Reading the file and populating the associative array

I want to populate an associative array in order to perform a map-side join. I have put the data in a text file and placed that file in the DistributedCache, and I want to read it in the Mapper before any records are processed.
Which method in the Mapper should I use to implement code for reading the file and populating the associative array?

Aug 2, 2018 in Big Data Hadoop by Meci Matt
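Use the configure method. In the old org.apache.hadoop.mapred API it is called once per task, before any call to map, so it is the right place to read the file and populate the associative array. (In the newer org.apache.hadoop.mapreduce API the equivalent hook is setup.)

Below is an example of how to use the DistributedCache: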

// Setting up the cache for the application 
1. Copy the requisite files to the FileSystem: 
$ bin/hadoop fs -copyFromLocal lookup.dat /myapp/lookup.dat  

$ bin/hadoop fs -copyFromLocal /myapp/  

$ bin/hadoop fs -copyFromLocal mylib.jar /myapp/mylib.jar 

$ bin/hadoop fs -copyFromLocal mytar.tar /myapp/mytar.tar 

$ bin/hadoop fs -copyFromLocal mytgz.tgz /myapp/mytgz.tgz 

$ bin/hadoop fs -copyFromLocal mytargz.tar.gz /myapp/mytargz.tar.gz 
2. Set up the application's JobConf: 
JobConf job = new JobConf(); 

DistributedCache.addCacheFile(new URI("/myapp/lookup.dat#lookup.dat"),  job); 

DistributedCache.addCacheArchive(new URI("/myapp/"), job); 

DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job); 

DistributedCache.addCacheArchive(new URI("/myapp/mytar.tar"), job); 

DistributedCache.addCacheArchive(new URI("/myapp/mytgz.tgz"), job); 

DistributedCache.addCacheArchive(new URI("/myapp/mytargz.tar.gz"), job); 
3. Use the cached files in the Mapper or Reducer: 
public static class MapClass extends MapReduceBase implements Mapper<K, V, K, V> { 
  private Path[] localArchives; 
  private Path[] localFiles; 

  // Called once per task, before the first call to map() 
  public void configure(JobConf job) { 
    try { 
      // Get the cached archives/files 
      localArchives = DistributedCache.getLocalCacheArchives(job); 
      localFiles = DistributedCache.getLocalCacheFiles(job); 
    } catch (IOException e) { 
      // configure() cannot throw a checked exception, so rethrow unchecked 
      throw new RuntimeException("Could not read the distributed cache", e); 
    } 
  } 

  public void map(K key, V value, OutputCollector<K, V> output, Reporter reporter) throws IOException { 
    // Use data from the cached archives/files here 
    // ... 
    // ... 
    output.collect(k, v); 
  } 
} 
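
The example above stops at obtaining the local paths, so here is a minimal sketch of the part the question actually asks about: reading the cached file into an associative array. It uses only the Java standard library. The class name LookupLoader and the "key<TAB>value" line format are assumptions made for illustration; adjust the parsing to whatever your lookup.dat really contains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
public class LookupLoader {

    // Reads "key<TAB>value" lines (assumed format) into a map.
    // Lines without a tab are skipped; a repeated key keeps its last value.
    public static Map<String, String> load(String localPath) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue; // blank or malformed line
                }
                // Split on the first tab only, so values may contain tabs
                lookup.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return lookup;
    }
}
```

Inside configure you would then call something like LookupLoader.load(localFiles[0].toString()), assuming lookup.dat is the first cached file and its Path prints as a plain local path, and keep the returned map in a field so that map can look up the join key for each record.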

Hope this helps.

answered Aug 2, 2018 by nitinrawat895
