How to implement data locality in Hadoop MapReduce?


Hadoop follows the principle of data locality, so I am trying to understand which component in Hadoop is responsible for it.

When I went through some tutorials, I understood that while submitting a job, the client queries the NameNode to find out which blocks reside on which DataNodes, and the framework then tries to execute the map tasks on those nodes.

My question is: when a developer creates a custom input format, are they also responsible for implementing data locality?

Apr 20, 2018 in Big Data Hadoop by Shubham
• 12,890 points

1 answer to this question.


You can use the getFileBlockLocations method of the FileSystem class, which returns an array of BlockLocation objects describing the hostnames, offsets, and sizes of the portions of the given file:

BlockLocation[] blkLoc = fs.getFileBlockLocations(file, 0, length);

The job client calls this method (via the input format) when computing splits. The split metadata is written to a SequenceFile in the job's staging directory, and the component that initializes the job — the JobTracker in classic MapReduce, or the MapReduce ApplicationMaster under YARN — later reads that file to schedule map tasks close to the data.
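To make the first step concrete, here is a minimal sketch of querying block locations. It assumes Hadoop client libraries on the classpath and a reachable HDFS cluster; the path /data/input.txt is purely illustrative.

```java
// Sketch only: needs hadoop-client on the classpath and a running HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path path = new Path("/data/input.txt"); // illustrative path
        FileStatus file = fs.getFileStatus(path);

        // One BlockLocation per block overlapping the requested byte range.
        BlockLocation[] blkLoc =
            fs.getFileBlockLocations(file, 0, file.getLen());

        for (BlockLocation loc : blkLoc) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                loc.getOffset(), loc.getLength(),
                String.join(",", loc.getHosts()));
        }
    }
}
```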
 

When you extend FileInputFormat, your getSplits method returns a list of InputSplit objects. While creating the splits, it is important to specify where each split's data lives, which is what the hosts argument of the FileSplit constructor is for:

public FileSplit(Path file, long start, long length, String[] hosts)

So you are not implementing data locality yourself; you are telling the framework where each input split's data can be found, and the scheduler uses that information to place map tasks on (or near) those nodes.
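The idea behind filling in that hosts array can be shown without any Hadoop dependency. This is a Hadoop-free sketch of how one might rank hosts by how many bytes of a split they store locally — roughly what FileInputFormat does internally; the Block class and rankHosts method are illustrative names, not Hadoop API.

```java
import java.util.*;

// Hadoop-free sketch: given a file's block layout, rank hosts by how many
// bytes of a split [start, start+len) each one holds locally. Illustrative
// only -- these names are not part of the Hadoop API.
public class SplitHosts {
    // One HDFS block: its byte offset, length, and the hosts storing replicas.
    static final class Block {
        final long offset, length;
        final String[] hosts;
        Block(long offset, long length, String... hosts) {
            this.offset = offset; this.length = length; this.hosts = hosts;
        }
    }

    // Returns hosts sorted by local bytes of the split, most-local first.
    static List<String> rankHosts(List<Block> blocks, long start, long len) {
        Map<String, Long> bytesPerHost = new HashMap<>();
        long end = start + len;
        for (Block b : blocks) {
            // Overlap between the split range and this block's range.
            long overlap = Math.min(end, b.offset + b.length)
                         - Math.max(start, b.offset);
            if (overlap <= 0) continue;
            for (String h : b.hosts)
                bytesPerHost.merge(h, overlap, Long::sum);
        }
        List<String> ranked = new ArrayList<>(bytesPerHost.keySet());
        ranked.sort((a, b2) ->
            Long.compare(bytesPerHost.get(b2), bytesPerHost.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        // Two 128 MB blocks (sizes in MB for readability).
        List<Block> blocks = Arrays.asList(
            new Block(0, 128, "node1", "node2"),
            new Block(128, 128, "node2", "node3"));
        // A split covering the last 64 MB of block 1 and all of block 2:
        // node2 holds 192 MB of it, node3 128 MB, node1 64 MB.
        System.out.println(rankHosts(blocks, 64, 192));
    }
}
```

Passing the top-ranked hosts into the FileSplit constructor is all a custom input format needs to do; the scheduler takes it from there.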

answered Apr 20, 2018 by kurt_cobain
• 9,260 points

