How to work with distributed cache in Hadoop

I am trying to use the distributed cache in my MapReduce program. In the main method, I add the cache file:

Configuration conf = new Configuration();

Job job = new Job(conf, "example");

DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);

The file /user/vinay/card.txt exists in my HDFS.

I am referring to this file in the setup method:

public void setup(Context context) throws IOException, InterruptedException{

    Configuration conf = context.getConfiguration();

    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);

}

The cacheFiles array is always null. I first ran this on a single-node Hadoop cluster, but then read somewhere that this setup prevents the distributed cache from working. I then ran the same code in pseudo-distributed mode, but it still does not work.

Apr 20, 2018 in Big Data Hadoop by Shubham

1 answer to this question.

The problem is the order of operations: you create the conf object, then create the Job from it, and only afterwards add the file to the distributed cache. The Job makes its own copy of the Configuration when it is constructed, so the cache file you add to conf later is not reflected in the job.

Instead, create the conf object first, then add the file to the distributed cache, and only then create the job:

Configuration conf = new Configuration();
// Register the cache file before the Job is created from conf.
DistributedCache.addCacheFile(new URI("/user/vinay/card.txt"), conf);
Job job = new Job(conf, "example");
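
To see why the order matters, here is a minimal stand-alone sketch. It does not use Hadoop at all: MiniConf, MiniJob and cacheFilesSeenByJob are hypothetical stand-ins written only for this illustration. It assumes the job takes its own copy of the configuration at construction time, which is the behaviour the Hadoop Javadoc for Job describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hadoop classes.
class CacheOrderDemo {

    // Toy configuration: holds the list of registered cache files.
    static class MiniConf {
        final List<String> cacheFiles = new ArrayList<>();

        MiniConf() {}

        // Copy constructor: the new object gets its own independent list.
        MiniConf(MiniConf other) {
            cacheFiles.addAll(other.cacheFiles);
        }
    }

    // Toy job: copies the configuration it is given (assumed to mirror
    // what Hadoop's Job does with its Configuration argument).
    static class MiniJob {
        final MiniConf conf;

        MiniJob(MiniConf conf) {
            this.conf = new MiniConf(conf);
        }
    }

    // Returns the cache files the job sees for a given ordering.
    static List<String> cacheFilesSeenByJob(boolean addBeforeJob) {
        MiniConf conf = new MiniConf();
        if (addBeforeJob) {
            conf.cacheFiles.add("/user/vinay/card.txt");
        }
        MiniJob job = new MiniJob(conf);
        if (!addBeforeJob) {
            // Too late: only the caller's conf changes, not the job's copy.
            conf.cacheFiles.add("/user/vinay/card.txt");
        }
        return job.conf.cacheFiles;
    }

    public static void main(String[] args) {
        System.out.println("added after job:  " + cacheFilesSeenByJob(false));
        System.out.println("added before job: " + cacheFilesSeenByJob(true));
    }
}
```

Running main shows an empty list for the "added after job" case and the card.txt path for the "added before job" case. In the real program, the reordering above is one fix. Another option, not part of the original answer, is to keep the original order and pass job.getConfiguration() instead of conf to DistributedCache.addCacheFile, so the file is registered on the job's own copy.
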
answered Apr 20, 2018 by kurt_cobain