Hadoop MapReduce word count program

+1 vote
I am unable to run the word count program using MapReduce.
I need help.
Mar 16, 2018 in Data Analytics by kurt_cobain
• 9,390 points
10,521 views

1 answer to this question.

0 votes

First, you need to understand the concept of MapReduce. The following images make it easy to see:

Traditional Way:

image

MapReduce Way:

image
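
In case the images do not load, here is a minimal plain-Java sketch of the same idea, with no Hadoop involved. It assumes comma-separated input and upper-cases each word, like the program below; the class name MapReducePhasesSketch and the sample lines are made up for illustration. The map step emits (word, 1) pairs, the shuffle step groups the pairs by word, and the reduce step sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the three MapReduce phases.
public class MapReducePhasesSketch {

    // Map phase: turn one input line into (WORD, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split(",")) {
            pairs.add(Map.entry(token.toUpperCase().trim(), 1));
        }
        return pairs;
    }

    // Shuffle phase: collect all values that share a key, sorted by key.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Run the three phases in order over all input lines.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(allPairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {BEAR=2, CAR=2, RIVER=1}
        System.out.println(run(List.of("car,bear,car", "river,bear")));
    }
}
```

In a real cluster the map and reduce calls run on different machines and the framework does the shuffle for you; this sketch only shows the data flow.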

Program:
package name_of_your_package; // replace with your own package name

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options; what remains is the input path and the output path.
        String[] files = new GenericOptionsParser(conf, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MapForWordCount.class);
        job.setReducerClass(ReduceForWordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        // Exit with 0 on success and 1 on failure.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: emits (WORD, 1) for every comma-separated token in a line.
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(","); // the input is expected to be comma-separated
            for (String word : words) {
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sums the 1s emitted for each word and writes (WORD, total).
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}
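
The question does not say which error you get, so this is only a guess: the program reads files[0] and files[1] without checking them, so starting it without both an input path and an output path fails with an ArrayIndexOutOfBoundsException. Below is a small sketch of a guard you could call before building the job. The class WordCountArgs and its check method are hypothetical helpers, not part of Hadoop.

```java
// Hypothetical helper: validates the arguments left over after GenericOptionsParser.
public class WordCountArgs {

    // Returns null when the arguments are usable, otherwise a message describing the problem.
    public static String check(String[] remainingArgs) {
        if (remainingArgs == null || remainingArgs.length != 2) {
            return "Usage: WordCount <input path> <output path>";
        }
        if (remainingArgs[0].trim().isEmpty() || remainingArgs[1].trim().isEmpty()) {
            return "Input and output paths must not be empty";
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(2);
        }
        System.out.println("input=" + args[0] + ", output=" + args[1]);
    }
}
```

Also make sure the output directory does not exist yet: FileOutputFormat refuses to write into an existing directory, so delete it or pick a new path before running the job again.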

answered Mar 16, 2018 by nitinrawat895
• 11,380 points
