How to work with distributed cache in Hadoop

0 votes

I am trying to implement distributed cache in my MapReduce program. In the main method I am adding the cache files.

Configuration conf = new Configuration();

Job job = new Job(conf, "example");

DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);

The file /user/vinay/card.txt exists in my HDFS.

I am referring to this file in the setup method:

public void setup(Context context) throws IOException, InterruptedException{

    Configuration conf = context.getConfiguration();

    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

}

The cacheFiles array is always null. First, I tried running the program on a single-node Hadoop cluster, but then I read somewhere that this prevents the distributed cache from working. I then tried running the code in pseudo-distributed mode, but it still does not work.

Apr 20, 2018 in Big Data Hadoop by Shubham
• 13,490 points
1,172 views

1 answer to this question.

0 votes

The problem is the order of operations: you create the conf object, create the job with conf as a parameter, and only afterwards add the file to the distributed cache. Job takes its own copy of the Configuration when it is constructed, so changes made to conf after that point are not reflected in the job.

Instead, create the conf object first, then add the file to the distributed cache, and create the job last:

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);
Job job = new Job(conf, "example");
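To see why the order matters, here is a minimal sketch that runs without Hadoop. ToyConf, ToyJob, and the key "toy.cache.files" are hypothetical stand-ins invented for this illustration; they are not Hadoop classes or property names. The sketch assumes only that the job copies the configuration it is given at construction time, as described above.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for a configuration: plain string key/value pairs. */
    static class ToyConf {
        private final Map<String, String> props = new HashMap<>();

        ToyConf() {}

        /** Copy constructor: takes a snapshot of the other configuration. */
        ToyConf(ToyConf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) {
            props.put(key, value);
        }

        String get(String key) {
            return props.get(key);
        }
    }

    /** Hypothetical stand-in for a job that copies its configuration when constructed. */
    static class ToyJob {
        private final ToyConf conf;

        ToyJob(ToyConf conf) {
            this.conf = new ToyConf(conf); // snapshot; later changes to the caller's conf are not seen
        }

        ToyConf getConfiguration() {
            return conf;
        }
    }

    /** Mirrors the order in the question: create the job, then register the cache file. */
    public static String cacheFileAddedAfterJob() {
        ToyConf conf = new ToyConf();
        ToyJob job = new ToyJob(conf);
        conf.set("toy.cache.files", "/user/vinay/card.txt"); // too late: the job already copied conf
        return job.getConfiguration().get("toy.cache.files");
    }

    /** Mirrors the order in the answer: register the cache file, then create the job. */
    public static String cacheFileAddedBeforeJob() {
        ToyConf conf = new ToyConf();
        conf.set("toy.cache.files", "/user/vinay/card.txt");
        ToyJob job = new ToyJob(conf);
        return job.getConfiguration().get("toy.cache.files");
    }

    public static void main(String[] args) {
        System.out.println("added after job:  " + cacheFileAddedAfterJob());
        System.out.println("added before job: " + cacheFileAddedBeforeJob());
    }
}
```

In this toy model the first call returns null and the second returns the file path, which matches the null cacheFiles array seen in the question.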
answered Apr 20, 2018 by kurt_cobain
• 9,390 points
