Hadoop: method to send output to multiple directories

0 votes

I am using MultipleOutputs class to create 12 outputs using the following function in the driver:

public void createOutputs(){
    Calendar c = Calendar.getInstance();
    String monthStr, pathStr;

    // Create multiple outputs for last 12 months
    // TODO make 12 configurable
    for(int i = 0; i < 12; ++i ){
        //Get month and add 1 as month is 0 based index
        int month = c.get(Calendar.MONTH)+1; 
        //Add leading 0
        monthStr = month > 10 ? "" + month : "0" + month ;  
        // Generate path string in the format 2013/03/etl
        pathStr = c.get(Calendar.YEAR) + "" + monthStr + "etl";
        // Add the named output
        MultipleOutputs.addNamedOutput(config, pathStr );  
        // Move to previous month
        c.add(Calendar.MONTH, -1); 
    }
}

In the reducer, I added a cleanup function to move the generated output to appropriate directories.

protected void cleanup(Context context) throws IOException, InterruptedException {
        // Custom function to recursively process data
        moveFiles (FileSystem.get(new Configuration()), new Path("/MyOutputPath"));
}
Nov 13, 2018 in Big Data Hadoop by digger
• 26,550 points
102 views

1 answer to this question.

0 votes

setup:

MultipleOutputs.addNamedOutput(job, "Output", TextOutputFormat.class, Text.class, Text.class);

setup of reducer:

mout = new MultipleOutputs<Text, Text>(context);

calling mout in reducer:

String key; //set to whatever output key will be
String value; //set to whatever output value will be
String outputFileName; //set to absolute path to file where this should write

mout.write("Output",new Text(key),new Text(value),outputFileName);

you can have a piece of code determine the directory while coding. For example say you want to specify directory by month and year:

int year;//extract year from data
int month;//extract month from data
String baseFileName; //parent directory to all outputs from this job
String outputFileName = baseFileName + "/" + year + "/" + month;

mout.write("Output",new Text(key),new Text(value),outputFileName);

Hope this helps.

answered Nov 13, 2018 by Omkar
• 67,660 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to sync Hadoop configuration files to multiple nodes?

For syncing Hadoop configuration files, you have ...READ MORE

answered Jun 21, 2018 in Big Data Hadoop by HackTheCode
197 views
0 votes
1 answer

How can we send data from MongoDB to Hadoop?

The MongoDB Connector for Hadoop reads data ...READ MORE

answered Mar 26, 2018 in Big Data Hadoop by nitinrawat895
• 10,710 points
152 views
0 votes
1 answer

Is there any way to setup Hadoop nodes (data nodes/namenodes) to use multiple volumes/disks?

Datanodes can store blocks in multiple directories ...READ MORE

answered Jun 20, 2018 in Big Data Hadoop by nitinrawat895
• 10,710 points
177 views
0 votes
1 answer

How to format the output being written by MapReduce in Hadoop?

Here is a simple code demonstrate the ...READ MORE

answered Sep 5, 2018 in Big Data Hadoop by Frankie
• 9,810 points
136 views
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,710 points
3,303 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,710 points
391 views
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
16,285 views
0 votes
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,260 points
1,186 views
0 votes
3 answers

Hadoop Spark: How to iterate hdfs directories?

import org.apache.hadoop.fs.{FileSystem,Path} FileSystem.get( sc.hadoopConfiguration ).listStatus( new Path("hdfs:///tmp")).foreach( ...READ MORE

answered Dec 4, 2018 in Big Data Hadoop by Komal
2,521 views
0 votes
1 answer

Hadoop: How to Group mongodb - mapReduce output?

db.order.mapReduce(function() { emit (this.customer,{count:1,orderDate:this.orderDate.interval_start}) }, function(key,values){ var category; ...READ MORE

answered Oct 31, 2018 in Big Data Hadoop by Omkar
• 67,660 points
93 views