InputSplit vs HDFS Block

0 votes
What is the fundamental difference between a MapReduce InputSplit and HDFS block?
Jun 1, 2018 in Big Data Hadoop by shams
• 3,670 points
4,436 views

1 answer to this question.

0 votes
By definition

Block – Block is the continuous location on the hard drive where data HDFS store data. In general, FileSystem store data as a collection of blocks. In a similar way, HDFS stores each file as blocks, and distributes it across the Hadoop cluster.
InputSplit- InputSplit represents the data which individual Mapper will process. Further split divides into records. Each record (which is a key-value pair) will be processed by the map.
Data representation

Block- It is the physical representation of data.
InputSplit- It is the logical representation of data. Thus, during data processing in MapReduce program or other processing techniques use InputSplit. In MapReduce, important thing is that InputSplit does not contain the input data. Hence, it is just a reference to the data.
Size

Block- The default size of the HDFS block is 128 MB which is configured as per our requirement. All blocks of the file are of the same size except the last block. The last Block can be of same size or smaller. In Hadoop, the files split into 128 MB blocks and then stored into Hadoop Filesystem.
InputSplit- Split size is approximately equal to block size, by default.

Hope it helps
answered Jun 1, 2018 by kurt_cobain
• 9,350 points

Related Questions In Big Data Hadoop

0 votes
1 answer

Is a HDFS block sequential ?

It seems like you are confused between the ...READ MORE

answered May 21, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
1,622 views
0 votes
1 answer

How Hadoop distributes block writes into HDFS?

So, what happens is the slave node ...READ MORE

answered Aug 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
583 views
+1 vote
1 answer

what are the typicall block sizes in HDFS

HDFS is a block structured file system ...READ MORE

answered Apr 8, 2019 in Big Data Hadoop by Gitika
• 65,770 points
1,753 views
0 votes
1 answer
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
4,639 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,072 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
109,054 views
0 votes
1 answer
0 votes
1 answer

How does the HDFS Client knows the block size while writing?

HDFS is designed in a way where ...READ MORE

answered Mar 27, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
1,115 views
0 votes
1 answer

Block Scanner HDFS

Block scanner runs periodically on every DataNode ...READ MORE

answered Jul 31, 2018 in Big Data Hadoop by kurt_cobain
• 9,350 points
2,477 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP