InputSplit vs HDFS Block

What is the fundamental difference between a MapReduce InputSplit and an HDFS block?
Jun 1, 2018 in Big Data Hadoop by shams

1 answer to this question.

By definition

Block - A block is a chunk of a file's data as it is physically stored on disk. File systems in general store data as a collection of blocks; likewise, HDFS divides each file into blocks and distributes them across the nodes of the Hadoop cluster.
InputSplit - An InputSplit represents the data that an individual Mapper will process. Each split is further divided into records, and each record (a key-value pair) is processed by one call to the map function.
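
To make the "split divides into records" step concrete, here is a minimal sketch in plain Python (not Hadoop code). It assumes line-oriented text input where the key is the byte offset of a line and the value is the line itself; the function name `read_records` and the boundary rule (a split that does not start at offset 0 skips its first line, and a split reads through the line that starts at or before its end) are illustrative assumptions modelled on how line-based record readers are commonly described.

```python
def read_records(data: bytes, start: int, length: int):
    """Yield (byte_offset, line) records for the split [start, start + length).

    Assumption: records are newline-terminated lines. A line that crosses
    the end of the split is still read in full by this split, and the next
    split skips it, so every line is produced exactly once.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line: the previous split owns it.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end].decode("utf-8")
        pos = line_end + 1


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    # Three splits of 10, 10 and 6 bytes; boundaries fall mid-line on purpose.
    for start, length in [(0, 10), (10, 10), (20, 6)]:
        print((start, length), list(read_records(data, start, length)))
```

Each `(offset, line)` pair plays the role of one key-value record handed to the map function. Note that the split boundaries do not have to line up with record boundaries, which is one reason a split is a logical rather than a physical unit.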

Data representation

Block - It is the physical representation of the data.
InputSplit - It is the logical representation of the data, and it is what a MapReduce program (or another processing framework) works with during data processing. Importantly, an InputSplit does not contain the input data itself; it is only a reference to the data (for file-based input: a file path, a start offset, and a length).
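
The "reference, not data" idea can be sketched in a few lines of plain Python. This is not the Hadoop API: `SplitRef`, `make_splits`, and `read_split` are hypothetical names, and a local file stands in for a file in HDFS. The point is only that the split objects hold a path, an offset, and a length, and that bytes are read when a split is actually processed.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitRef:
    """Hypothetical stand-in for a file-based InputSplit: metadata only."""
    path: str    # which file the data lives in
    start: int   # byte offset where this split begins
    length: int  # number of bytes this split covers


def make_splits(path: str, split_size: int) -> list:
    """Describe a file as a list of references without reading its contents."""
    size = os.path.getsize(path)
    return [SplitRef(path, offset, min(split_size, size - offset))
            for offset in range(0, size, split_size)]


def read_split(split: SplitRef) -> bytes:
    """Fetch the bytes a split refers to; only here is the data touched."""
    with open(split.path, "rb") as f:
        f.seek(split.start)
        return f.read(split.length)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.write(fd, b"0123456789")
    os.close(fd)
    for split in make_splits(path, 4):
        print(split, read_split(split))
    os.remove(path)
```

Because the references are tiny and independent of the data size, they can be computed up front and handed out to workers, each of which then reads only its own byte range.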

Size

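Block - The default HDFS block size is 128 MB (64 MB in Hadoop 1.x), and it can be changed as required through the dfs.blocksize property. All blocks of a file are the same size except the last one, which can be the same size or smaller. With the default setting, a file is therefore divided into 128 MB blocks, which are then stored in HDFS.
InputSplit - By default, the split size is approximately equal to the block size. It can be tuned per job with the minimum and maximum split size settings (mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize).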
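
The size arithmetic can be worked through with a short sketch in plain Python. It assumes the commonly described rule for file input, split size = max(minSize, min(maxSize, blockSize)), with a minimum of 1 byte and an effectively unlimited maximum by default, and a 10% tolerance that lets the final split be slightly larger than the others. The function names and the `SPLIT_SLOP` constant here are illustrative, not a copy of Hadoop's source.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed tolerance: the last split may be up to 10% larger


def block_sizes(file_size, block_size=128 * MB):
    """Physical layout: full-size blocks plus a smaller last block, if any."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Assumed rule: clamp the block size between the min and max split size."""
    return max(min_size, min(max_size, block_size))


def split_sizes(file_size, block_size=128 * MB, min_size=1, max_size=2**63 - 1):
    """Logical layout: carve off splits while more than ~1.1 splits remain."""
    split_size = compute_split_size(block_size, min_size, max_size)
    sizes = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        sizes.append(split_size)
        remaining -= split_size
    if remaining:
        sizes.append(remaining)  # the tail becomes the final split
    return sizes


if __name__ == "__main__":
    for size_mb in (300, 260):
        blocks = [b // MB for b in block_sizes(size_mb * MB)]
        splits = [s // MB for s in split_sizes(size_mb * MB)]
        print(f"{size_mb} MB file: blocks={blocks} MB, splits={splits} MB")
```

Under these assumptions, a 300 MB file has three blocks and three splits of matching sizes, while a 260 MB file has three blocks (128, 128, and 4 MB) but only two splits (128 and 132 MB). That is why the split size is described as approximately, not exactly, equal to the block size.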

Hope it helps
answered Jun 1, 2018 by kurt_cobain
