Reading the file and populating the associative array

I want to populate an associative array in order to perform a map-side join. I have put the lookup data in a text file and placed that file in the DistributedCache, and I want to read it in the Mapper before any records are processed.
Which method in the Mapper should I use to read the file and populate the associative array?

Can someone help?

Thanks in advance!
Aug 2, 2018 in Big Data Hadoop by Meci Matt

1 answer to this question.


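You can use the configure method to read the file and populate the associative array. In the old API (org.apache.hadoop.mapred), configure(JobConf) is called once per task, before any call to map(). In the newer API (org.apache.hadoop.mapreduce), the equivalent hook is setup(Context).

Below is an example of how to use the DistributedCache, for reference: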

// Setting up the cache for the application

1. Copy the requisite files to the FileSystem:

$ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat
$ bin/hadoop fs -copyFromLocal map.zip /myapp/map.zip
$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar
$ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar
$ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz
$ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz
 
2. Set up the application's JobConf:

JobConf job = new JobConf();
DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"), job);
DistributedCache.addCacheArchive(new URI("/myapp/map.zip"), job);
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job);
DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job);
 
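3. Use the cached files in the Mapper or Reducer:

public static class MapClass extends MapReduceBase implements Mapper<K, V, K, V> {

  private Path[] localArchives;
  private Path[] localFiles;

  // Called once per task, before the first call to map()
  public void configure(JobConf job) {
    try {
      // Get the cached archives/files
      localArchives = DistributedCache.getLocalCacheArchives(job);
      localFiles = DistributedCache.getLocalCacheFiles(job);
    } catch (IOException e) {
      // Both calls above can throw IOException, which configure() cannot declare
      throw new RuntimeException("Could not locate the cached files", e);
    }
  }

  public void map(K key, V value, OutputCollector<K, V> output, Reporter reporter) throws IOException {
    // Use data from the cached archives/files here
    // ...
    // ... (k and v stand for the output key and value computed above)
    output.collect(k, v);
  }
}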
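The example above stops once the local paths are known. Reading the file into the associative array could look roughly like the sketch below. It is plain Java with no Hadoop dependency, and several things in it are assumptions rather than part of the original answer: the class name LookupLoader and its methods are hypothetical, and the lookup file is assumed to hold one tab-separated key/value pair per line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop. Assumes the lookup file contains
// one "key<TAB>value" pair per line.
final class LookupLoader {

    private LookupLoader() {}

    // Parse "key<TAB>value" lines from any reader into an in-memory map.
    static Map<String, String> parse(Reader source) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // skip blank lines and lines without a tab
                }
                // Everything after the first tab is treated as the value.
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }

    // Load the map from a file on the local disk of the task node.
    static Map<String, String> load(String localPath) throws IOException {
        return parse(Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8));
    }

    // Map-side join step: append the looked-up value to the record,
    // or return null when the key has no match (inner-join behaviour).
    static String join(String key, String record, Map<String, String> lookup) {
        String match = lookup.get(key);
        return match == null ? null : record + "\t" + match;
    }
}
```

Under these assumptions, configure() would call something like LookupLoader.load(localFiles[0].toString()) once and keep the result in a field, and map() would call join(...) for each record and skip records where it returns null.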

Hope this helps.
 

answered Aug 2, 2018 by nitinrawat895
