How to implement data locality in Hadoop MapReduce?

0 votes

Hadoop follows data locality, so I am trying to understand which component in Hadoop actually handles it.

When I went through some tutorials, I understood that while submitting a job, the client queries the NameNode to find out which blocks reside on which DataNodes, and the framework then tries to execute the map tasks on those nodes.

My doubt is: when a developer creates a custom InputFormat, are they also responsible for implementing data locality?

Apr 20, 2018 in Big Data Hadoop by Shubham
• 13,290 points
52 views

1 answer to this question.

0 votes

You can use the getFileBlockLocations method of the FileSystem class, which returns an array of BlockLocation objects describing the hostnames, offsets, and sizes of the portions of the given file.

BlockLocation[] blkLoc = fs.getFileBlockLocations(file, 0, length);

This method is called on the client side while the job is being submitted. The split information is written to a file in the job's staging directory on HDFS, and the framework (the JobTracker in MRv1, or the MapReduce ApplicationMaster under YARN) reads that file while initializing the job and tries to schedule each map task on a node that holds its data.
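As a minimal sketch (the file path and configuration are placeholders, not anything from your cluster), this is how a client can ask the NameNode which DataNodes host each block of a file:

// Minimal sketch: list the block locations of an HDFS file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationLister {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // "/data/input.txt" is just an example path on HDFS.
        Path file = new Path("/data/input.txt");
        FileStatus status = fs.getFileStatus(file);

        // Ask the NameNode which DataNodes host each block of the file.
        BlockLocation[] blkLoc =
                fs.getFileBlockLocations(status, 0, status.getLen());

        for (BlockLocation loc : blkLoc) {
            System.out.println("offset=" + loc.getOffset()
                    + " length=" + loc.getLength()
                    + " hosts=" + String.join(",", loc.getHosts()));
        }
    }
}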
 

When you extend (inherit from) FileInputFormat, your getSplits() method returns a list of InputSplit objects. While creating those splits it is important to specify the hosts on which each split's data resides, for example through the FileSplit constructor:

public FileSplit(Path file, long start, long length, String[] hosts)

You are not implementing data locality yourself; you are only telling the framework where each input split's data can be found, and the scheduler uses that information to run map tasks as close to the data as possible. A sketch of such a custom InputFormat is shown below.
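Here is a hedged sketch of a custom FileInputFormat that builds one split per HDFS block and passes the block's hosts into each FileSplit (the class name MyInputFormat, the one-split-per-block policy, and the reuse of LineRecordReader are illustrative choices, not the stock implementation):

// Illustrative custom InputFormat: the key point is passing each block's
// hosts into the FileSplit so the scheduler can place map tasks near the data.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class MyInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public List<InputSplit> getSplits(JobContext job) throws IOException {
        List<InputSplit> splits = new ArrayList<>();
        for (FileStatus file : listStatus(job)) {
            Path path = file.getPath();
            FileSystem fs = path.getFileSystem(job.getConfiguration());

            // One split per HDFS block, tagged with the hosts storing that block.
            BlockLocation[] blocks =
                    fs.getFileBlockLocations(file, 0, file.getLen());
            for (BlockLocation block : blocks) {
                splits.add(new FileSplit(path, block.getOffset(),
                        block.getLength(), block.getHosts()));
            }
        }
        return splits;
    }

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        // Reuse the standard line reader; locality comes from the host lists above.
        return new LineRecordReader();
    }
}

The framework does not guarantee that a map task will run on one of the listed hosts; the hosts are hints that the scheduler uses to prefer node-local, then rack-local, placement.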

answered Apr 20, 2018 by kurt_cobain
• 9,240 points
