How are input splits handled when two blocks are spread across different nodes?

Say I have a large file that is broken into two HDFS blocks, and the blocks are physically stored on two different machines. Assume there is no node in the cluster that locally hosts both blocks. As I understand it, with TextInputFormat the split size is normally the same as the HDFS block size. Since there are two splits, two map tasks will be spawned on the two separate machines that locally hold the blocks. Now assume the HDFS text file was cut in the middle of a line when the blocks were formed. Would Hadoop copy block 2 from the second machine to the first machine, so that it can supply the beginning of block 2 (the second half of the broken line) to complete the last, broken line of block 1?
Dec 7, 2020 in Big Data Hadoop by Rajiv

1 answer to this question.

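Hadoop doesn't copy the blocks to the node running the map task; the data is streamed from the DataNode to the task node (using a modest transfer buffer, e.g. 4 KB). So in your example, the map task that processes the first block reads the entire first block and then streams just enough of the second block to reach the end-of-line character. The map task for the second split does the opposite: because its split does not start at offset 0, it discards everything up to and including the first newline, since that partial line has already been consumed by the previous task. So the read is "mostly" local: only the tail of the broken line crosses the network.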
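
To make the boundary rule concrete, here is a minimal Python sketch of a simplified line-oriented reader (not Hadoop's actual code). The helper name `split_records` and the tiny 16-byte "block" size are hypothetical and chosen only for illustration; real splits default to the HDFS block size, and Hadoop's reader also deals with custom delimiters and compression. The function returns the records a split emits and how many bytes it had to read past its own end, i.e. the part that would be streamed from the node holding the next block.

```python
def split_records(data: bytes, start: int, end: int) -> tuple[list[bytes], int]:
    """Return (records, bytes_read_past_end) for the split covering [start, end).

    Simplified model: a split that does not begin at offset 0 discards its
    first (possibly partial) line, and every split keeps reading whole lines
    as long as the next line starts at or before `end`.
    """
    pos = start
    if start != 0:
        # Not the first split: skip through the first newline, because the
        # previous split's reader has already consumed that line.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # The last line read may run past `end` into the next block.
    while pos < len(data) and pos <= end:
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        records.append(data[pos:stop].rstrip(b"\n"))
        pos = stop
    return records, max(0, pos - end)


# Hypothetical file cut into two "blocks" at offset 16: the line "charlie"
# occupies offsets 12-19, so it straddles the block boundary.
data = b"alpha\nbravo\ncharlie\ndelta\n"
first, first_overrun = split_records(data, 0, 16)
second, second_overrun = split_records(data, 16, len(data))
print(first, first_overrun)
print(second, second_overrun)
```

The first split emits `alpha`, `bravo` and the complete `charlie` line, reading 4 bytes past offset 16; the second split throws away the tail of `charlie` and emits only `delta`, so every line is processed exactly once.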

How much of the second block is read depends on how long the line is. It's entirely possible for a file split over three blocks to be processed by three map tasks, with the second map task processing essentially no records (while still reading all of block 2 and part of block 3) if a single line starts in block 1 and ends in block 3.
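
The three-block case can be checked with a small Python sketch. It is a simplified model, with a hypothetical `records_per_split` helper and an artificial 8-byte block size, that uses an ownership rule equivalent to "skip the first partial line, read one line past the end": a line belongs to the split that contains the newline just before it, and the file's first line belongs to split 0. A line that starts in block 1 and ends in block 3 therefore leaves split 2 with nothing to emit.

```python
def records_per_split(data: bytes, block_size: int) -> list[list[bytes]]:
    """Assign each newline-terminated line to exactly one split.

    Simplified model: a line is owned by the split containing the byte just
    before it (the previous line's newline); the first line goes to split 0.
    """
    n_splits = max(1, -(-len(data) // block_size))  # ceiling division
    owned: list[list[bytes]] = [[] for _ in range(n_splits)]
    line_start = 0
    while line_start < len(data):
        nl = data.find(b"\n", line_start)
        line_end = len(data) if nl == -1 else nl + 1
        owner = 0 if line_start == 0 else (line_start - 1) // block_size
        owned[owner].append(data[line_start:line_end].rstrip(b"\n"))
        line_start = line_end
    return owned


# Hypothetical 21-byte file in 8-byte blocks: the run of 14 "x" bytes starts
# at offset 3 (block 1) and its newline sits at offset 17 (block 3).
data = b"ab\n" + b"x" * 14 + b"\ncd\n"
owned = records_per_split(data, block_size=8)
for i, records in enumerate(owned):
    print(f"split {i}: {len(records)} record(s)")
```

Split 0 owns two lines, split 1 owns none and split 2 owns one. The middle split's reader would still scan all of block 2 (and two bytes of block 3) just to find the newline it has to skip.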

Hope this makes sense.

answered Dec 7, 2020 by Gitika

Related Questions In Big Data Hadoop

How can I get the respective Bitcoin value for an input in USD when using c#

Simply make call to server and parse ...

answered Mar 25, 2018 in Big Data Hadoop by charlie_brown

How to analyze block placement on datanodes and rebalancing data across Hadoop nodes?

HDFS provides a tool for administrators i.e. ...

answered Jun 21, 2018 in Big Data Hadoop by nitinrawat895

How are blocks created while a file is written in HDFS?

Suppose we want to write a 1 ...

answered Dec 21, 2018 in Big Data Hadoop by Omkar

How can Hadoop process the records that are split across the block boundaries?

First of all, Map Reduce algorithm is not programmed ...

answered Apr 15, 2019 in Big Data Hadoop by nitinrawat895

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...

answered Mar 16, 2018 in Data Analytics by nitinrawat895

hadoop fs -put command?

Hi, You can create one directory in HDFS ...

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895

Hadoop dfs -ls command?

In your case there is no difference ...

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain

What are different types of blocks in Hbase?

Hey, Block is a single smallest amount / ...

answered May 21, 2019 in Big Data Hadoop by Gitika

How data distribution is done in Hadoop?

To understand how or what are the process ...

answered Apr 4, 2019 in Big Data Hadoop by Gitika