InputSplit vs HDFS Block

What is the fundamental difference between a MapReduce InputSplit and HDFS block?
Jun 1, 2018 in Big Data Hadoop by shams

1 answer to this question.

By definition

Block - A block is a contiguous chunk of a file's data, and the unit in which HDFS stores that data on disk. File systems in general store data as a collection of blocks; in the same way, HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster.
InputSplit - An InputSplit represents the data that an individual Mapper will process. Each split is further divided into records, and each record (a key-value pair) is processed by the map function.
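To make the split-to-record step concrete, here is a minimal Python sketch (not Hadoop code). It assumes newline-delimited text and uses the byte offset of each line as the key and the line itself as the value, which is one common convention for text input. The names `read_records` and `mapper` are hypothetical.

```python
def read_records(data: bytes):
    """Yield (byte_offset, line) pairs from newline-delimited bytes."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        # Key: where the line starts; value: the line without its terminator.
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def mapper(key, value):
    """Hypothetical map function: emit a (word, 1) pair for every word."""
    return [(word, 1) for word in value.split()]


# Stand-in for the bytes covered by one split.
split_bytes = b"hello hadoop\nhello hdfs\n"
records = list(read_records(split_bytes))

for key, value in records:
    print(key, value, mapper(key, value))
```

Each record is handed to the map function on its own, so the mapper never needs to see the whole split at once.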
Data representation

Block - It is the physical representation of data.
InputSplit - It is the logical representation of data, and it is what a MapReduce program (or another processing framework) works with during data processing. Importantly, an InputSplit does not contain the input data itself; it is just a reference to the data.
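The "reference, not data" idea can be sketched in Python as follows. This is an illustration only, not Hadoop's API: the `SplitRef` class, its field names, the path, and the host name are all assumptions, loosely modeled on the kind of metadata a file-based split carries (file path, start offset, length, preferred hosts).

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitRef:
    """Hypothetical logical split: metadata only, no file contents."""
    path: str     # which file the split belongs to
    start: int    # byte offset where the split begins
    length: int   # number of bytes the split covers
    hosts: tuple  # nodes assumed to hold the data locally


def read_split(split: SplitRef, stream) -> bytes:
    """Resolve the reference: seek to the offset and read the covered bytes."""
    stream.seek(split.start)
    return stream.read(split.length)


# In-memory stand-in for the file as it sits on storage.
stored = io.BytesIO(b"0123456789" * 3)

# The split itself stays tiny no matter how many bytes it points at.
split = SplitRef(path="/data/input.txt", start=10, length=10, hosts=("node1",))
chunk = read_split(split, stored)
print(chunk)
```

The data is only touched when the reference is resolved, which is why splits are cheap to compute and to hand out to mappers.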
Size

Block - The default HDFS block size is 128 MB, and it can be configured as required (via the dfs.blocksize property). All blocks of a file are the same size except the last one, which can be the same size or smaller. By default, Hadoop splits each file into 128 MB blocks and then stores them in HDFS.
InputSplit - By default, the split size is approximately equal to the block size.
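The size arithmetic above can be worked through with a short Python sketch. The helper names are hypothetical. The split-size rule `max(min_size, min(max_size, block_size))` is modeled on how Hadoop's file-based input formats are generally described as choosing a split size; treat it as an assumption here, not as a copy of the library code.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # default HDFS block size mentioned above


def block_sizes(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Sizes of the blocks a file occupies; only the last may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """Assumed rule: clamp the block size between the min and max split size."""
    return max(min_size, min(max_size, block_size))


# A 300 MB file: two full 128 MB blocks plus a 44 MB last block.
blocks = block_sizes(300 * MB)
print([b // MB for b in blocks])

# With default limits, the split size equals the block size.
default_split = compute_split_size(BLOCK_SIZE)
print(default_split // MB)

# Raising the minimum above the block size gives larger splits.
larger_split = compute_split_size(BLOCK_SIZE, min_size=256 * MB)
print(larger_split // MB)
```

Under this rule the split size matches the block size unless a job sets a minimum above it or a maximum below it. That is why the two are only approximately equal by default and can diverge per job.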

Hope it helps
answered Jun 1, 2018 by kurt_cobain
