How to implement data locality in Hadoop MapReduce

Question

Hadoop follows the principle of data locality, so I am trying to understand which component in Hadoop handles it.

From the tutorials I went through, I understood that when a job is submitted, the NameNode is queried to find out which block resides on which DataNode, and the framework then tries to run the map tasks on those nodes.

My question is: when a developer creates a custom input format, are they also responsible for implementing data locality?

kurt_cobain · Answer 1 · Apr 20, 2018

You can use the getFileBlockLocations method of the FileSystem class, which returns an array of BlockLocation objects. Each one holds the hostnames, offset and length of one portion (block) of the given file.

BlockLocation[] blkLoc = fs.getFileBlockLocations(file, 0, length);

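This method is called on the client side, while the job is being submitted, as part of computing the input splits. The split information, including the hosts, is written to files in the job's staging directory. It is read back when the job is initialized (by the MapReduce ApplicationMaster on YARN), and the hosts are passed to the ResourceManager as placement preferences for the map tasks.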
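To make the role of the block locations more concrete, here is a minimal plain-Java sketch of how the hosts for a split can be derived from them: find the block that contains the split's start offset and take that block's hosts. The Block class, the hostsFor helper and the node names are hypothetical stand-ins made up for illustration (Block only mimics the offset, length and hosts that a BlockLocation carries); this is not Hadoop code.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch; Block is a hypothetical stand-in for Hadoop's BlockLocation.
final class SplitHosts {

    /** One block of a file: where it starts, how long it is, and which hosts hold a replica. */
    static final class Block {
        final long offset;
        final long length;
        final String[] hosts;

        Block(long offset, long length, String... hosts) {
            this.offset = offset;
            this.length = length;
            this.hosts = hosts;
        }
    }

    /** Returns the hosts of the block that contains the given byte offset. */
    static String[] hostsFor(List<Block> blocks, long splitStart) {
        for (Block b : blocks) {
            if (splitStart >= b.offset && splitStart < b.offset + b.length) {
                return b.hosts;
            }
        }
        throw new IllegalArgumentException("offset " + splitStart + " is outside the file");
    }

    public static void main(String[] args) {
        // Made-up layout: a 200-byte file stored as two blocks of 128 and 72 bytes.
        List<Block> blocks = Arrays.asList(
                new Block(0, 128, "node1", "node2"),
                new Block(128, 72, "node2", "node3"));

        // A split starting at byte 128 falls in the second block.
        System.out.println(Arrays.toString(hostsFor(blocks, 128))); // prints [node2, node3]
    }
}
```

In a real input format the hosts found this way would be the ones handed to the split as its locations.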

When you extend FileInputFormat, your getSplits method returns a list of InputSplit objects. It is important that each split specifies its location, that is, the hosts that store its data, so that this information is available when the job is initialized.

public FileSplit(Path file, long start, long length, String[] hosts)

You are not implementing data locality yourself, but you are telling the framework where the input splits can be found.
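As a rough illustration of what the framework can then do with those hosts, here is a simplified plain-Java sketch of locality-aware placement: prefer a free node that holds the split's data, otherwise fall back to any free node. The class name, the pickNode method and the node names are all hypothetical, and the real scheduler is more involved (for example, it also considers rack-local placement), so treat this as a sketch of the idea rather than Hadoop's actual logic.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Plain-Java sketch of locality-aware placement; not Hadoop's actual scheduler code.
final class LocalityPicker {

    /**
     * Picks a node for a map task: a free node that holds the split's data if
     * there is one, otherwise the first free node (the data is then read remotely).
     */
    static String pickNode(String[] splitHosts, Set<String> freeNodes) {
        for (String host : splitHosts) {
            if (freeNodes.contains(host)) {
                return host; // node-local: the task runs where the data is stored
            }
        }
        if (freeNodes.isEmpty()) {
            throw new IllegalStateException("no free nodes");
        }
        return freeNodes.iterator().next(); // fallback: any free node
    }

    public static void main(String[] args) {
        String[] splitHosts = {"node2", "node3"}; // hosts reported by the input split
        Set<String> free = new LinkedHashSet<>(Arrays.asList("node1", "node3"));

        System.out.println(pickNode(splitHosts, free)); // prints node3 (holds the data)

        free.remove("node3");
        System.out.println(pickNode(splitHosts, free)); // prints node1 (no local node is free)
    }
}
```

The point of the sketch is the division of labour: the input format only supplies the hosts, and the placement decision is made elsewhere.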

