Hadoop MapReduce word count program

I am unable to run the word count program using MapReduce.
Need help.
Mar 15, 2018 in Data Analytics by kurt_cobain

1 answer to this question.


First, you need to understand the concept of MapReduce. The difference between the traditional approach and the MapReduce approach is easiest to see in the two images below:

Traditional Way:

[image]

MapReduce Way:

[image]
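In the traditional way, a single machine reads the whole input and counts words sequentially. In the MapReduce way, the input is split across nodes, each mapper emits a (word, 1) pair for every word it sees, the framework shuffles the pairs so that all pairs with the same key reach the same reducer, and each reducer sums them. As a rough trace of the program below on the single input line apple,banana,apple:

Map output:     (APPLE, 1), (BANANA, 1), (APPLE, 1)
After shuffle:  APPLE -> [1, 1], BANANA -> [1]
Reduce output:  (APPLE, 2), (BANANA, 1)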

Program:
package name_of_your_package; // replace with your own package name

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Separate the generic Hadoop options from the remaining arguments (input and output paths)
        String[] files = new GenericOptionsParser(conf, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MapForWordCount.class);
        job.setReducerClass(ReduceForWordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        // Submit the job and exit with 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
            String line = value.toString();
            // Input here is comma-separated; use line.split("\\s+") instead for plain whitespace-separated text
            String[] words = line.split(",");
            for (String word : words) {
                // Emit (WORD, 1) for every word in the line
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException {
            int sum = 0;
            // Add up all the 1s emitted by the mappers for this word
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
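To actually run it, the usual steps look roughly like the following. The jar name wordcount.jar, the package name, and the HDFS paths are only placeholders, so adjust them to your setup, and note that the output directory must not exist before the job runs:

mkdir classes
javac -classpath $(hadoop classpath) -d classes WordCount.java
jar cf wordcount.jar -C classes/ .
hdfs dfs -put input.txt /user/yourname/input.txt
hadoop jar wordcount.jar name_of_your_package.WordCount /user/yourname/input.txt /user/yourname/output
hdfs dfs -cat /user/yourname/output/part-r-00000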

answered Mar 16, 2018 by nitinrawat895
