How to format the output being written by MapReduce in Hadoop?


I am trying to reverse each word in a file. The program runs fine, but the output I get looks like this:

1   dwp
2   seviG
3   eht
4   tnerruc
5   gnikdrow
6   yrotcerid
7   ridkm
8   desU
9   ot
10  etaerc

I want the output to look like this:

dwp seviG eht tnerruc gnikdrow yrotcerid ridkm desU
ot etaerc

The code I am working with:

    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class Reproduce {

    public static int temp =0;
    public static class ReproduceMap extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text>{
        private Text word = new Text();
        @Override
        public void map(LongWritable arg0, Text value,
                OutputCollector<IntWritable, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString().concat("\n");
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(new StringBuffer(tokenizer.nextToken()).reverse().toString());
                temp++;
                output.collect(new IntWritable(temp),word);
              }

        }

    }

    public static class ReproduceReduce extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text>{

        @Override
        public void reduce(IntWritable arg0, Iterator<Text> arg1,
                OutputCollector<IntWritable, Text> arg2, Reporter arg3)
                throws IOException {
            String word = arg1.next().toString();
            Text word1 = new Text();
            word1.set(word);
            arg2.collect(arg0, word1);

        }

    }

    public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(Reproduce.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(ReproduceMap.class);
    conf.setReducerClass(ReproduceReduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);

  }
}

How can I change the output format without writing another Java program to do it?

Sep 5, 2018 in Big Data Hadoop by Neha

1 answer to this question.


Here is a simple example that demonstrates the use of a custom FileOutputFormat:

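import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyTextOutputFormat extends FileOutputFormat<Text, List<IntWritable>> {
    @Override
    public RecordWriter<Text, List<IntWritable>> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // get the job's output directory
        Path path = FileOutputFormat.getOutputPath(context);
        // build the full path: output directory plus our file name
        // (every task uses the same name, so this is only safe with a single reducer)
        Path fullPath = new Path(path, "result.txt");
        // create the file in the file system
        FileSystem fs = path.getFileSystem(context.getConfiguration());
        FSDataOutputStream fileOut = fs.create(fullPath, context);

        // create our record writer with the new file
        return new MyCustomRecordWriter(fileOut);
    }
}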

import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class MyCustomRecordWriter extends RecordWriter<Text, List<IntWritable>> {
    private final DataOutputStream out;

    public MyCustomRecordWriter(DataOutputStream stream) throws IOException {
        out = stream;
        // write a header line once; let a failure propagate instead of swallowing it
        out.writeBytes("results:\r\n");
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // close our file
        out.close();
    }

    @Override
    public void write(Text key, List<IntWritable> values) throws IOException, InterruptedException {
        // write out our key
        out.writeBytes(key.toString() + ": ");
        // loop through all values associated with our key and write them with commas between
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.writeBytes(",");
            }
            out.writeBytes(String.valueOf(values.get(i)));
        }
        out.writeBytes("\r\n");
    }
}
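
To see what one record looks like on disk, the formatting done in the write method can be tried without a cluster. This is only a rough sketch under some assumptions: it uses String and List<Integer> as stand-ins for Text and List<IntWritable>, writes to an in-memory buffer instead of HDFS, and the class name RecordFormatDemo is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the record writer, using plain Java types.
final class RecordFormatDemo {

    // Same layout as the write method above: "key: v1,v2,...\r\n".
    static void writeRecord(DataOutputStream out, String key, List<Integer> values)
            throws IOException {
        out.writeBytes(key + ": ");
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                out.writeBytes(",");
            }
            out.writeBytes(String.valueOf(values.get(i)));
        }
        out.writeBytes("\r\n");
    }

    // Formats one record into an in-memory buffer and returns it as a string.
    static String format(String key, List<Integer> values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeRecord(out, key, values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Prints a single formatted record for a sample key and three values.
        System.out.print(format("hadoop", Arrays.asList(1, 2, 3)));
    }
}
```

One thing to keep in mind: DataOutputStream.writeBytes keeps only the low eight bits of each character, so keys or values containing non-ASCII text would need to be encoded explicitly (for example as UTF-8 bytes) before writing.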

Finally, we need to tell our job about our output format and the path before running it.

job.setOutputKeyClass(Text.class);
job.setOutputValueClass(ArrayList.class);
job.setOutputFormatClass(MyTextOutputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/home/hadoop/out"));
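
For the exact layout asked about in the question (reversed words kept on their original line, with no number in front), a custom output format may not be necessary. A minimal sketch, assuming the mapper builds one string per input line instead of emitting one record per word: the helper below shows that per-line step in plain Java so it can be run without Hadoop. The class name LineReverser and the method reverseWords are hypothetical, not part of any Hadoop API.

```java
import java.util.StringJoiner;
import java.util.StringTokenizer;

// Hypothetical helper: the per-line work a mapper could do before
// emitting a single record for the whole line.
final class LineReverser {

    // Reverses each whitespace-separated word but keeps the words in their
    // original order, joined by single spaces.
    static String reverseWords(String line) {
        StringJoiner joined = new StringJoiner(" ");
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            joined.add(new StringBuilder(tokenizer.nextToken()).reverse().toString());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Prints the sample line with every word reversed in place.
        System.out.println(reverseWords("pwd Gives the current working directory"));
    }
}
```

In the mapper from the question, the result could then be emitted once per line with something like output.collect(NullWritable.get(), new Text(reverseWords(value.toString()))), with the output key class set to NullWritable. TextOutputFormat writes only the value when the key is a NullWritable, so the number and the tab should disappear. Running the job map-only (conf.setNumReduceTasks(0)) should also keep the lines of each input split in their original order, at the cost of one output file per map task.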
answered Sep 5, 2018 by Frankie
