How are input splits handled when 2 blocks are spread across different nodes?

0 votes
Say I have a large file that is broken into two HDFS blocks, and the blocks are physically stored on two different machines. Assume no node in the cluster locally hosts both blocks. As I understand it, with TextInputFormat the split size is normally the same as the HDFS block size. Since there are two splits, two map instances will be spawned on the two machines that locally hold the blocks. Now assume the text file was broken in the middle of a line when it was split into blocks. Would Hadoop copy block 2 from the second machine to the first machine, so it can take the first (broken) half of the line from block 2 and complete the last broken line of block 1?
Dec 7, 2020 in Big Data Hadoop by Rajiv
• 8,870 points
1,383 views

1 answer to this question.

0 votes

Hadoop doesn't copy the blocks to the node running the map task; the blocks are streamed from the datanode to the task node (in reasonably sized transfer chunks, e.g. a few KB at a time). So in your example, the map task that processes the first block reads the entire first block and then stream-reads from the second block until it finds the end-of-line character. The read is therefore 'mostly' local.
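To make that concrete, here is a minimal sketch of the boundary-handling idea behind Hadoop's LineRecordReader. It is simplified from the real implementation (the class name, readSplit method and the process step are illustrative), but LineReader, FSDataInputStream and the skip-the-first-line rule are the actual mechanics:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

// Sketch: read the lines belonging to one split [start, end) of a text file.
// HDFS streams bytes from whichever datanode holds each block, so reading a
// little past 'end' transparently pulls data from the next block's node.
public class SplitLineReaderSketch {
    public static void readSplit(Path file, long start, long end, Configuration conf)
            throws IOException {
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = fs.open(file);
        in.seek(start);
        LineReader reader = new LineReader(in, conf);
        Text line = new Text();
        long pos = start;

        // Unless the split starts at byte 0, the first (possibly partial)
        // line belongs to the previous split's mapper, so skip it.
        if (start != 0) {
            pos += reader.readLine(line);
        }

        // Read lines while they *start* at or before 'end'. The last line may
        // run past 'end' into the next block; HDFS simply streams those extra
        // bytes from the (possibly remote) datanode that holds them.
        while (pos <= end) {
            int bytesRead = reader.readLine(line);
            if (bytesRead == 0) break; // end of file
            pos += bytesRead;
            // process(line) would go here
        }
        reader.close();
    }
}

The matching rule on the next split (always skip the first line when start != 0) is what guarantees every line is processed exactly once, with no block copying at all.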

How much of the second block is read depends on how long the line is. It's entirely possible for a file split over 3 blocks to be processed by 3 map tasks, with the second map task processing essentially no records (while still reading all the data from block 2 and some of block 3) if a line starts in block 1 and ends in block 3.
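As a toy illustration of that case (made-up offsets: block/split size of 100 bytes, newlines only at offsets 40 and 250, so one long line spans all of block 2):

Split 1 [0, 100): reads the line at offset 0, then the line starting at 41, streaming bytes 100-250 from blocks 2 and 3 to finish it.
Split 2 [100, 200): skips its first line, which doesn't end until offset 250; it is then already past the split end, so it emits no records even though it read all of block 2 and half of block 3.
Split 3 [200, 300): skips the tail of that long line, then processes the line starting at offset 251.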

Hope this makes sense.

answered Dec 7, 2020 by Gitika
• 65,770 points
