How to work with distributed cache in Hadoop

Question

I am trying to use the distributed cache in my MapReduce program. In the main method, I add the cache file:

Configuration conf = new Configuration();
Job job = new Job(conf, "example");
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);

The file /user/vinay/card.txt exists in my HDFS.

I am referring to this file in the setup method:

public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
}

The cacheFiles array is always null. I first ran the program on a single-node Hadoop cluster, but then read somewhere that this setup prevents the distributed cache from working. I then ran the same code in pseudo-distributed mode, but it still does not work.

kurt_cobain · Answer 1 · Apr 20, 2018

The problem with your code is the order of the calls. You create the conf object, then create the job by passing conf to the Job constructor, and only afterwards add the file to the distributed cache. The Job constructor makes its own copy of the Configuration, so changes made to conf after the job exists are not reflected in the job.

Instead, create the conf object first, then add the file to the distributed cache, and only then create the job:

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);
Job job = new Job(conf, "example");
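To see why the order matters, here is a minimal plain-Java model of the behaviour described above: the Job constructor takes a private copy of the Configuration it is given, so later changes to the caller's conf never reach the job. The classes Conf and Job below are hypothetical stand-ins written for this sketch, not the Hadoop classes, and the property name "cache.files" is likewise made up for the model. The third case corresponds to the other common fix in Hadoop, passing job.getConfiguration() instead of conf to DistributedCache.addCacheFile after the job has been created.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of why the call order matters. Conf and Job are
// hypothetical stand-ins, NOT the Hadoop classes; the only behaviour
// modelled is that the Job constructor snapshots the conf it receives.
public class ConfCopyDemo {

    // Stand-in for org.apache.hadoop.conf.Configuration.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        // Copy constructor: copies every property set so far.
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Stand-in for org.apache.hadoop.mapreduce.Job.
    static class Job {
        private final Conf conf;

        Job(Conf conf) {
            this.conf = new Conf(conf); // private copy taken at construction
        }

        Conf getConfiguration() { return conf; }
    }

    // Hypothetical property name, used only in this model.
    static final String CACHE_KEY = "cache.files";
    static final String FILE = "/user/vinay/card.txt";

    // Question's order: create the job, then register the file on conf.
    static String jobFirst() {
        Conf conf = new Conf();
        Job job = new Job(conf);
        conf.set(CACHE_KEY, FILE); // changes the caller's conf only
        return job.getConfiguration().get(CACHE_KEY);
    }

    // Answer's order: register the file, then create the job.
    static String cacheFirst() {
        Conf conf = new Conf();
        conf.set(CACHE_KEY, FILE);
        Job job = new Job(conf); // the snapshot already contains the file
        return job.getConfiguration().get(CACHE_KEY);
    }

    // Alternative: change the job's own configuration after creating it.
    static String viaJobConf() {
        Conf conf = new Conf();
        Job job = new Job(conf);
        job.getConfiguration().set(CACHE_KEY, FILE);
        return job.getConfiguration().get(CACHE_KEY);
    }

    public static void main(String[] args) {
        System.out.println("job first:    " + jobFirst());
        System.out.println("cache first:  " + cacheFirst());
        System.out.println("via job conf: " + viaJobConf());
    }
}
```

Running main prints null for the question's order and the file path for the other two, which mirrors why getLocalCacheFiles returns null in the setup method.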


Related Questions In Big Data Hadoop

Hadoop: How to get the column name along with the output in Hive?

How to run Nutch in Hadoop installed in pseudo-distributed mode

How to get started with Hadoop?

How to run Hadoop in Docker containers?

What is the function of getLocalCacheArchives method?

Hadoop Mapreduce word count Program

hadoop fs -put command?

Hadoop dfs -ls command?

How to practice programming with Hadoop?

How to retrieve the list of sql (Hive QL) commands that has been executed in a hadoop cluster?
