InputSplit vs HDFS Block

1 answer to this question.

By definition

Block - A block is a contiguous location on the hard drive where HDFS stores data. In general, a file system stores data as a collection of blocks. In the same way, HDFS stores each file as a set of blocks and distributes them across the Hadoop cluster.
InputSplit - An InputSplit represents the data that an individual Mapper will process. Each split is further divided into records, and each record (a key-value pair) is processed by the map function.
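
To make the split-to-records step concrete, here is a minimal sketch in plain Python (not Hadoop code). It assumes line-oriented text input where the key is the byte offset of a line and the value is the line itself; `to_records` and `map_fn` are hypothetical names used only for illustration.

```python
def to_records(split_text, start_offset=0):
    """Turn the text covered by one split into (key, value) records.

    Assumption: key = byte offset of the line, value = the line without
    its trailing newline, as in line-oriented text input.
    """
    records = []
    offset = start_offset
    for line in split_text.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))
        offset += len(line.encode("utf-8"))
    return records


def map_fn(key, value):
    """Hypothetical map function: emit (word, 1) for every word in a record."""
    return [(word, 1) for word in value.split()]


split_text = "hello hadoop\nhello hdfs\n"
records = to_records(split_text)
# The map function is called once per record of the split.
mapped = [pair for key, value in records for pair in map_fn(key, value)]

print(records)  # [(0, 'hello hadoop'), (13, 'hello hdfs')]
print(mapped)   # [('hello', 1), ('hadoop', 1), ('hello', 1), ('hdfs', 1)]
```
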
Data representation

Block - It is the physical representation of data.
InputSplit - It is the logical representation of data. MapReduce programs and other processing techniques therefore work with InputSplits during data processing. The important point is that an InputSplit does not contain the input data itself; it is only a reference to the data.
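
The "reference, not data" idea can be sketched as follows. This is plain Python, not the Hadoop API: `SplitRef` and `read_split` are hypothetical names, the field set (path, start offset, length, hosts) is an assumption about what a file-based split typically carries, and a local temporary file stands in for a file stored in HDFS.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SplitRef:
    """Hypothetical stand-in for a file-based split: metadata only, no data."""
    path: str
    start: int                    # byte offset where the split begins
    length: int                   # number of bytes the split covers
    hosts: Tuple[str, ...] = ()   # nodes assumed to hold the underlying blocks


def read_split(split: SplitRef) -> bytes:
    """Fetch the bytes a split points to; the split never stores them itself."""
    with open(split.path, "rb") as f:
        f.seek(split.start)
        return f.read(split.length)


# A small local file stands in for a file stored in HDFS.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789ABCDEFGHIJ")
    demo_path = tmp.name

split = SplitRef(path=demo_path, start=10, length=5, hosts=("node1",))
data = read_split(split)   # the data is read only when it is needed
os.remove(demo_path)

print(data)  # b'ABCDE'
```

The split object stays tiny no matter how large the region it points to, which is why it is cheap to create and hand to a Mapper.
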
Size

Block - The default size of an HDFS block is 128 MB (in Hadoop 2.x and later), and it can be configured as required. All blocks of a file are the same size except the last one, which can be the same size or smaller. Hadoop splits files into 128 MB blocks and then stores them in HDFS.
InputSplit - By default, the split size is approximately equal to the block size. The user can control the split size in a MapReduce program through the minimum and maximum split size settings.
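
The sizes above can be worked through with a little arithmetic. The sketch below is plain Python; `block_sizes` and `split_size` are hypothetical helper names, and the split rule `max(min_size, min(max_size, block_size))` reflects my understanding of how Hadoop's file-based input formats pick a split size, so treat it as an assumption rather than a specification.

```python
MB = 1024 * 1024


def block_sizes(file_size, block_size=128 * MB):
    """Sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    # Every block is full-sized except possibly the last one.
    return [block_size] * full + ([rest] if rest else [])


def split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Assumed rule: clamp the block size between the min and max split size."""
    return max(min_size, min(max_size, block_size))


# A 300 MB file becomes two full blocks plus a smaller last block.
blocks_mb = [size // MB for size in block_sizes(300 * MB)]
print(blocks_mb)  # [128, 128, 44]

# With default limits the split size equals the block size.
print(split_size(128 * MB) // MB)  # 128

# Lowering the maximum gives smaller splits (more Mappers);
# raising the minimum gives larger splits (fewer Mappers).
print(split_size(128 * MB, max_size=64 * MB) // MB)   # 64
print(split_size(128 * MB, min_size=256 * MB) // MB)  # 256
```
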

Hope it helps
