How to implement data locality in Hadoop MapReduce

0 votes

Hadoop follows the principle of data locality, and I am trying to understand which component in Hadoop handles it.

From the tutorials I have read, I understand that when a job is submitted, the client asks the NameNode which DataNodes hold each block, and the framework then tries to run the map tasks on those nodes.

My question is: when a developer creates a custom input format, are they also responsible for implementing data locality?

Apr 20, 2018 in Big Data Hadoop by Shubham
• 13,490 points
705 views

1 answer to this question.

0 votes

You can use the getFileBlockLocations method of the FileSystem class. It returns an array of BlockLocation objects, each describing the hosts, offset, and length of one block within the requested range of the file.

BlockLocation[] blkLoc = fs.getFileBlockLocations(file, 0, length);

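This lookup happens on the client side during job submission, when the job client calls the input format's getSplits(). The resulting split metadata, including the host list for each split, is written to files in the job's staging directory. On Hadoop 2.x (YARN), the MapReduce ApplicationMaster reads that metadata while initializing the job and asks the ResourceManager for containers on or near those hosts.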

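When you extend FileInputFormat, getSplits() returns a list of InputSplit objects, and each split reports the hosts that hold its data through getLocations(). The default FileInputFormat.getSplits() already fills in those hosts from the block locations, so a subclass that only changes how records are read keeps locality without extra work. If you build the splits yourself, pass the hosts when constructing each one: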

public FileSplit(Path file, long start, long length, String[] hosts)
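As a rough illustration of where the hosts argument can come from, here is a Hadoop-free sketch that picks the hosts of the block containing a split's start offset. BlockInfo and hostsForSplit are hypothetical names standing in for BlockLocation and the lookup that an input format performs; the block sizes and node names are made up for the example.

```java
import java.util.Arrays;

final class SplitHostsSketch {
    // Hypothetical stand-in for org.apache.hadoop.fs.BlockLocation:
    // one block's byte range plus the hosts that store a replica of it.
    static final class BlockInfo {
        final long offset;
        final long length;
        final String[] hosts;

        BlockInfo(long offset, long length, String[] hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Returns the hosts of the block whose byte range contains splitStart.
    static String[] hostsForSplit(BlockInfo[] blocks, long splitStart) {
        for (BlockInfo b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException("Offset " + splitStart + " is outside the file");
    }

    public static void main(String[] args) {
        // Assumed layout: two 128-byte blocks with two replicas each.
        BlockInfo[] blocks = {
            new BlockInfo(0, 128, new String[] {"node1", "node2"}),
            new BlockInfo(128, 128, new String[] {"node2", "node3"}),
        };
        // A split starting at offset 128 falls in the second block.
        System.out.println(Arrays.toString(hostsForSplit(blocks, 128)));
    }
}
```

In a real input format the array returned here would be what you hand to the FileSplit constructor as hosts.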

You are not implementing data locality yourself; you are telling the framework where the input splits can be found, and the scheduler uses that information to place the map tasks.
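To make that division of labour concrete, here is a deliberately simplified model of how a scheduler could use the host hints. It is not Hadoop's actual scheduling code (which, among other things, also considers rack locality); pickSplit and the node names are hypothetical and only illustrate the idea of preferring a split whose data lives on the node that has a free slot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LocalitySchedulerSketch {
    // Picks the index of a pending split for freeNode: the first split whose
    // host list contains freeNode if one exists, otherwise the first pending
    // split. Returns -1 when nothing is pending.
    static int pickSplit(List<String[]> pendingSplitHosts, String freeNode) {
        for (int i = 0; i < pendingSplitHosts.size(); i++) {
            if (Arrays.asList(pendingSplitHosts.get(i)).contains(freeNode)) {
                return i; // data-local assignment
            }
        }
        return pendingSplitHosts.isEmpty() ? -1 : 0; // fall back to a non-local split
    }

    public static void main(String[] args) {
        List<String[]> pending = new ArrayList<>();
        pending.add(new String[] {"node1", "node2"});
        pending.add(new String[] {"node3", "node4"});

        System.out.println(pickSplit(pending, "node3")); // local match: prints 1
        System.out.println(pickSplit(pending, "node9")); // no local data: prints 0
    }
}
```

The input format only supplies the host lists; whether a task actually runs next to its data depends on which nodes have capacity when the task is scheduled.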

answered Apr 20, 2018 by kurt_cobain
• 9,390 points
