How to work with the distributed cache in Hadoop?

I am trying to use the distributed cache in my MapReduce program. In the main method, I add the cache file:

Configuration conf = new Configuration();
Job job = new Job(conf, "example");
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);

The file /user/vinay/card.txt exists in my HDFS.

I am referring to this file in the setup method:

public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
}

The cacheFiles array is always null. I first ran the job on a single-node Hadoop cluster, but then read somewhere that this setup prevents the distributed cache from working. I then ran the same code in pseudo-distributed mode, and it still does not work.

Apr 20, 2018 in Big Data Hadoop by Shubham
• 12,150 points
296 views

1 answer to this question.


The problem with your code is the order of operations: you create the conf object, then create the job and pass conf to it, and only afterwards add the file to the distributed cache. The Job constructor takes its own copy of the configuration, so a change made to conf after the job has been created is not reflected in the job.

Instead, create the conf object first, then add the file to the distributed cache, and create the job last:

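// Register the cache file on conf before the job exists
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);
// The job is created from a conf that already contains the cache entry
Job job = new Job(conf, "example");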
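
The effect of the ordering can be reproduced without a Hadoop installation. The following is only a minimal sketch: FakeConf and FakeJob are hypothetical stand-ins for Configuration and Job, and the property name demo.cache.files is made up for the demo. The one assumption it models is that the job keeps its own copy of the configuration it is constructed with.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for a Hadoop Configuration: a bag of string properties. */
    static class FakeConf {
        private final Map<String, String> props = new HashMap<>();

        FakeConf() {}

        /** Copy constructor: takes a snapshot of the other object's properties. */
        FakeConf(FakeConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for Job: keeps a private copy of the conf it is given. */
    static class FakeJob {
        private final FakeConf conf;

        FakeJob(FakeConf conf) { this.conf = new FakeConf(conf); }

        FakeConf getConfiguration() { return conf; }
    }

    // Made-up property name, used only by this demo.
    static final String CACHE_KEY = "demo.cache.files";

    /** Returns true if the job's own configuration contains the cache entry. */
    static boolean jobSeesCacheFile(boolean addBeforeJobCreation) {
        FakeConf conf = new FakeConf();
        if (addBeforeJobCreation) {
            conf.set(CACHE_KEY, "/user/vinay/card.txt");
        }
        FakeJob job = new FakeJob(conf);
        if (!addBeforeJobCreation) {
            // Too late: the job already holds its own snapshot of conf.
            conf.set(CACHE_KEY, "/user/vinay/card.txt");
        }
        return job.getConfiguration().get(CACHE_KEY) != null;
    }

    public static void main(String[] args) {
        System.out.println("added after job creation:  " + jobSeesCacheFile(false));
        System.out.println("added before job creation: " + jobSeesCacheFile(true));
    }
}
```

Running main prints false for the first case and true for the second, which corresponds to the null array in the question and to the corrected ordering above.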

answered Apr 20, 2018 by kurt_cobain
• 9,260 points


© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.