Multiple Output format in Hadoop

0 votes

I'm new to Hadoop. 

I am trying the Wordcount program in Hadoop

Now to try out multiple output files, I use MultipleOutputFormat. this link helped me in doing it. 

I have the following code in my driver class.

    MultipleOutputs.addNamedOutput(conf, "even",
            org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
            IntWritable.class);

    MultipleOutputs.addNamedOutput(conf, "odd",
            org.apache.hadoop.mapred.TextOutputFormat.class, Text.class,
            IntWritable.class);`

The Reducer class is as follows

public static class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {
    MultipleOutputs mos = null;

    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        if (sum % 2 == 0) {
            mos.getCollector("even", reporter).collect(key, new IntWritable(sum));
        }else {
            mos.getCollector("odd", reporter).collect(key, new IntWritable(sum));
        }
        //output.collect(key, new IntWritable(sum));
    }
    @Override
    public void close() throws IOException {
        // TODO Auto-generated method stub
    mos.close();
    }
}

It  worked, but I get LOT of files, (one odd and one even for every map-reduce)

My Query is, How can I have just 2 output files so that every odd output of every map-reduce gets written into that odd file and same for even.

Jul 26, 2019 in Big Data Hadoop by nitinrawat895
• 11,380 points
1,739 views

1 answer to this question.

0 votes
Each reducer uses an OutputFormat to write records. So you are getting a set of odd and even files per reducer. This is by design so that each reducer can perform writes in parallel.

If you want just a single odd and single even file, you'll need to set mapred.reduce.tasks to 1. But performance will suffer because all the mappers will be feeding into a single reducer.

Another option is to change the process the reads these files to accept multiple input files or write a separate process that merges these files together.
answered Jul 26, 2019 by ravikiran
• 4,620 points

Related Questions In Big Data Hadoop

0 votes
1 answer

How to format the output being written by MapReduce in Hadoop?

Here is a simple code demonstrate the ...READ MORE

answered Sep 5, 2018 in Big Data Hadoop by Frankie
• 9,830 points
2,274 views
0 votes
1 answer

Getting error in Hadoop: Output file already exist

When you executed your code earlier, you ...READ MORE

answered Apr 19, 2018 in Big Data Hadoop by Shubham
• 13,490 points
8,052 views
0 votes
1 answer

What is the Data format and database choices in Hadoop and Spark?

Use Parquet. I'm not sure about CSV ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,830 points
723 views
0 votes
1 answer

Hadoop: method to send output to multiple directories

setup: MultipleOutputs.addNamedOutput(job, "Output", TextOutputFormat.class, Text.class, Text.class); setup of reducer: mout ...READ MORE

answered Nov 13, 2018 in Big Data Hadoop by Omkar
• 69,210 points
740 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
10,604 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,208 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
104,783 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
4,286 views
0 votes
1 answer

Mutliple Output Format in Hadoop

Each reducer uses an OutputFormat to write ...READ MORE

answered Jul 19, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
441 views
0 votes
1 answer

How does Hadoop process data which is split across multiple boundaries in an HDFS?

I found some comments: from the Hadoop ...READ MORE

answered Jul 1, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
731 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP