InputSplit vs HDFS Block

What is the fundamental difference between a MapReduce InputSplit and an HDFS block?
Jun 1, 2018 in Big Data Hadoop by shams

By definition

Block – A block is a contiguous chunk of a file's data that HDFS stores as a single unit. In general, a file system stores data as a collection of blocks; in a similar way, HDFS stores each file as a sequence of blocks and distributes them across the nodes of the Hadoop cluster.
InputSplit – An InputSplit represents the data that an individual Mapper will process. The split is further divided into records, and each record (a key-value pair) is processed by the map function.
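To contrast the two definitions, here is a minimal Python sketch over an in-memory byte string. `split_into_blocks` cuts the bytes at fixed offsets, as a block store does, while `read_split_records` reads line records for one split, loosely modeled on the line-oriented behavior of Hadoop's `TextInputFormat` (key = byte offset of the line, value = line text). Both function names, the toy 8-byte size, and the in-memory input are assumptions for illustration, not Hadoop APIs. The boundary rule is the key point: a reader whose split does not start at offset 0 skips its first line, and every reader may run past its split end to finish its last line, so each line is processed exactly once even when a block boundary cuts through it.

```python
from typing import List, Tuple

def split_into_blocks(data: bytes, block_size: int) -> List[Tuple[int, bytes]]:
    """Physical view: cut raw bytes into (offset, chunk) blocks; the last chunk may be smaller."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [(off, data[off:off + block_size]) for off in range(0, len(data), block_size)]

def read_split_records(data: bytes, start: int, length: int) -> List[Tuple[int, str]]:
    """Logical view: return the (byte_offset, line) records owned by split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # Skip the first (possibly partial) line; the previous split's reader handles it.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Keep reading while the next line starts at or before `end`; that line may run past `end`.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append((pos, data[pos:line_end].decode("utf-8")))
        pos = line_end + 1
    return records

if __name__ == "__main__":
    data = b"alpha\nbeta\ngamma\ndelta\n"  # 23 bytes
    size = 8  # toy size standing in for 128 MB
    for offset, chunk in split_into_blocks(data, size):
        print("block", offset, chunk)
        print("split", offset, read_split_records(data, offset, len(chunk)))
```

With 8-byte blocks, the line `beta` is cut across the first two blocks (`b"alpha\nbe"` and `b"ta\ngamma"`), yet the first split still returns it whole as `(6, "beta")`, and the second split returns only `(11, "gamma")`.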
Data representation

Block – It is the physical representation of data.
InputSplit – It is the logical representation of data. MapReduce programs, and other processing frameworks, work with InputSplits rather than blocks when processing data. Importantly, an InputSplit does not contain the input data; it is only a reference to it (for a file-based split, the file path, the starting byte offset, the length, and the hosts that hold the data).
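The "reference, not data" idea can be sketched in a few lines of Python. `SplitRef` and `open_split` are hypothetical names; the fields loosely mirror what Hadoop's `FileSplit` exposes (`getPath()`, `getStart()`, `getLength()`, `getLocations()`), and a plain dict stands in for HDFS. The split object only carries metadata; bytes are fetched when the reference is resolved.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class SplitRef:
    """Hypothetical split descriptor: where the data is, not the data itself."""
    path: str
    start: int
    length: int
    hosts: Tuple[str, ...]  # locality hints for the scheduler

def open_split(fs: Dict[str, bytes], split: SplitRef) -> bytes:
    """Resolve the reference against a toy in-memory 'file system'; bytes are fetched only here."""
    return fs[split.path][split.start:split.start + split.length]

if __name__ == "__main__":
    fs = {"/data/words.txt": b"alpha\nbeta\ngamma\ndelta\n"}
    split = SplitRef("/data/words.txt", 8, 8, ("dn2", "dn3"))
    print(split)                  # only metadata
    print(open_split(fs, split))  # b'ta\ngamma'
```

Because a split is just (path, start, length, hosts), the framework can create and ship thousands of them to the scheduler cheaply, and the `hosts` field lets it try to run each map task close to the data.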
Size

Block – The default HDFS block size is 128 MB (64 MB in Hadoop 1.x), and it can be configured as required. All blocks of a file are the same size except the last one, which can be the same size or smaller. With the default setting, Hadoop splits each file into 128 MB blocks and then stores them in HDFS.
InputSplit – By default, the split size is approximately equal to the block size, but it can be changed per job without affecting how the data is stored.
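The size rules above can be checked with a little arithmetic. The sketch below is a simplified Python re-implementation, modeled on the logic of Hadoop's `FileInputFormat` (`computeSplitSize`, plus the 10% `SPLIT_SLOP` check in `getSplits`); it is not the actual source and ignores compression, non-splittable files, and locality. `min_size` and `max_size` stand in for the `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize` job settings, and the function names are hypothetical.

```python
from typing import List, Tuple

MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # 134217728 bytes, the dfs.blocksize default since Hadoop 2.x
SPLIT_SLOP = 1.1               # a tail up to 10% larger than the split size stays in one split

def block_sizes(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> List[int]:
    """Physical layout: every block is full-size except possibly the last one."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def compute_split_size(block_size: int, min_size: int = 1, max_size: int = 2**63 - 1) -> int:
    """max(minSize, min(maxSize, blockSize)); with the defaults this is just block_size."""
    return max(min_size, min(max_size, block_size))

def plan_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Logical layout: (start, length) pairs covering the file."""
    splits = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_size - remaining, split_size))
        remaining -= split_size
    if remaining:
        splits.append((file_size - remaining, remaining))
    return splits

if __name__ == "__main__":
    for size_mb in (300, 140):
        size = size_mb * MB
        split_size = compute_split_size(DEFAULT_BLOCK_SIZE)
        print(size_mb, "MB file")
        print("  blocks (MB):", [b // MB for b in block_sizes(size)])
        print("  splits (MB):", [(s // MB, n // MB) for s, n in plan_splits(size, split_size)])
```

For a 300 MB file, blocks and splits line up (128 + 128 + 44 MB). For a 140 MB file, HDFS still stores two blocks (128 + 12 MB), but because 140/128 ≈ 1.09 is within the slop, the file becomes a single 140 MB split, so one mapper reads across the block boundary. This is one reason the split size is only approximately equal to the block size.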

Hope it helps
answered Jun 1, 2018 by kurt_cobain

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.