InputSplit vs HDFS Block

What is the fundamental difference between a MapReduce InputSplit and an HDFS block?
Jun 1, 2018 in Big Data Hadoop by shams

1 answer to this question.

By definition

Block - A block is a contiguous location on the hard drive where HDFS stores data. In general, a file system stores data as a collection of blocks. In the same way, HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster.
InputSplit - An InputSplit represents the data that an individual Mapper will process. Each split is further divided into records, and each record (a key-value pair) is processed by the map function.
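
To make the "split divides into records" step concrete, here is a minimal Python sketch of a line-oriented record reader. It is not Hadoop code: the function name read_records and the sample data are made up for illustration, and it only mimics, as an assumption, the usual line-based behaviour, where the key is the byte offset of a line and the value is the line itself. A reader that does not start at offset 0 skips its first line, and every reader finishes the last line it started, even if that line runs past the end of its split.

```python
def read_records(data: bytes, start: int, length: int):
    """Yield (byte_offset, line) records for the split [start, start + length).

    Illustrative only: `data` stands in for the whole file.
    """
    end = start + length
    pos = start
    # A split that does not begin the file skips its first line, which may be
    # partial; the reader of the previous split is responsible for that line.
    if start != 0:
        newline = data.find(b"\n", start)
        pos = len(data) if newline == -1 else newline + 1
    # Read every line that starts at or before `end`, even if it ends after it.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        yield pos, data[pos:stop].decode()
        pos = stop + 1


data = b"alpha\nbravo\ncharlie\ndelta\n"       # 26 bytes, 4 lines
splits = [(0, 10), (10, 10), (20, 6)]           # (start, length), cuts mid-line
records = [list(read_records(data, s, n)) for s, n in splits]
for (s, n), recs in zip(splits, records):
    print(f"split start={s} length={n}: {recs}")
```

Although the split boundaries cut through the middle of lines, each line is emitted exactly once, which is why a split can be a purely logical boundary.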
Data representation

Block - It is the physical representation of data.
InputSplit - It is the logical representation of data. MapReduce programs and other processing frameworks work with InputSplits during data processing. The important point is that an InputSplit does not contain the input data itself; it is only a reference to the data.
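
The "just a reference" idea can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: SplitRef, FAKE_FS, read_split, the path and the host names are not Hadoop APIs. The fields are an assumption modelled on what a file-based split typically carries (file path, start offset, length, and the hosts that store the data).

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitRef:
    path: str      # which file the split belongs to
    start: int     # byte offset where the split begins
    length: int    # number of bytes covered by the split
    hosts: tuple   # nodes that store the underlying data (a scheduling hint)


# Stand-in for a file system: path -> file contents (hypothetical data).
FAKE_FS = {"/data/input.txt": b"0123456789" * 10}


def read_split(split: SplitRef) -> bytes:
    """Fetch the bytes a split points to; the split itself holds none of them."""
    stream = io.BytesIO(FAKE_FS[split.path])
    stream.seek(split.start)
    return stream.read(split.length)


split = SplitRef("/data/input.txt", start=40, length=20, hosts=("node-a", "node-b"))
print(split)               # only metadata, no file contents
print(read_split(split))   # the data is fetched only when the split is read
```

Because the split carries only metadata, it is cheap to create and ship to a worker, and the hosts field lets a scheduler try to run the task close to the data.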
Size

Block - The default HDFS block size is 128 MB, and it can be configured as required (through the dfs.blocksize property). All blocks of a file are the same size except the last one, which can be the same size or smaller. By default, Hadoop splits files into 128 MB blocks and then stores them in the Hadoop file system.
InputSplit - By default, the split size is approximately equal to the block size. It can be changed through the job configuration.
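
The size rules above can be worked through with a little arithmetic. The following Python sketch is only an illustration: the helper names block_sizes and split_size are made up. The split-size rule shown, max(min_size, min(max_size, block_size)), is assumed to match what file-based input formats in MapReduce use, with the minimum and maximum coming from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize.

```python
MB = 1024 * 1024


def block_sizes(file_size: int, block_size: int = 128 * MB) -> list:
    """Sizes of the blocks a file of `file_size` bytes is stored as."""
    full, rest = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


def split_size(block_size: int, min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """Split size derived from the block size and the configured bounds."""
    return max(min_size, min(max_size, block_size))


print([size // MB for size in block_sizes(300 * MB)])      # block sizes in MB
print(split_size(128 * MB) // MB)                          # default bounds
print(split_size(128 * MB, max_size=64 * MB) // MB)        # smaller splits
print(split_size(128 * MB, min_size=256 * MB) // MB)       # larger splits
```

With the default bounds the split size equals the block size. Lowering the maximum produces more, smaller splits (more mappers), while raising the minimum above the block size produces fewer, larger ones.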

Hope it helps
answered Jun 1, 2018 by kurt_cobain
