Hadoop MapReduce word count program

I am unable to run the word count program using MapReduce.
I need help.
Mar 15, 2018 in Data Analytics by kurt_cobain

1 answer to this question.

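First, you need to understand the concept of MapReduce. In short, the map phase emits a (word, 1) pair for every word it reads, and the reduce phase adds up the values for each distinct word. The following images illustrate the idea: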

[Image: Traditional Way]

[Image: MapReduce Way]
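
Since the images are not reproduced here, the same flow can be sketched in plain Java without a Hadoop cluster. This is only an illustrative simulation: the class name MapReduceSketch and its helper methods are hypothetical, the sample lines are made up, and the comma delimiter is assumed to match the program in this answer.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the map -> shuffle -> reduce flow.
// It does not use Hadoop; it only mimics what the framework does for word count.
class MapReduceSketch {

    // Map phase: emit one (WORD, 1) pair for every comma-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(",")) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word.toUpperCase().trim(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (Hadoop does this between map and reduce).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs all three phases over the given input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input: two comma-separated lines.
        List<String> lines = Arrays.asList("car,bus,car", "train,bus,car");
        System.out.println(wordCount(lines)); // prints {BUS=2, CAR=3, TRAIN=1}
    }
}
```

In a real job, Hadoop runs the map step on many input splits in parallel and performs the grouping itself, so you only write the map and reduce parts.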

Program:
package name_of_your_package;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
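
public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (for example -D key=value) and keep the job's own arguments.
        String[] files = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (files.length < 2) {
            System.err.println("Usage: hadoop jar <your-jar> name_of_your_package.WordCount <input path> <output path>");
            System.exit(2);
        }
        Path input = new Path(files[0]);
        Path output = new Path(files[1]); // the output directory must not already exist

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(MapForWordCount.class);
        job.setReducerClass(ReduceForWordCount.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }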
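
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Reused for every record to avoid allocating new objects per token.
        private static final IntWritable ONE = new IntWritable(1);
        private final Text outputKey = new Text();

        @Override
        public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException {
            // The input is assumed to be comma-separated; change the delimiter for other formats.
            String[] words = value.toString().split(",");
            for (String word : words) {
                String token = word.toUpperCase().trim();
                if (token.isEmpty()) {
                    continue; // skip blank tokens, e.g. from empty lines
                }
                outputKey.set(token);
                con.write(outputKey, ONE); // emit (WORD, 1)
            }
        }
    }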

    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con) throws IOException, InterruptedException {
            // Add up every 1 emitted by the mappers for this word.
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum)); // emit (WORD, total)
        }
    }
}
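
One thing worth checking if the job runs but the counts look wrong: the mapper splits each line on commas, so it expects comma-separated input. The sketch below is a plain-Java illustration with no Hadoop dependency. The class TokenizeSketch and its tokens helper are hypothetical and only mirror the mapper's split, upper-case, and trim steps. Splitting on the whitespace regex \s+ is shown as an assumed alternative for ordinary text, not something the program above does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that shows which keys the mapper would emit for one line.
class TokenizeSketch {

    // Split on the given delimiter regex, then upper-case and trim each token.
    static List<String> tokens(String line, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        for (String word : line.split(delimiterRegex)) {
            result.add(word.toUpperCase().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Comma-separated input: three separate keys.
        System.out.println(tokens("apple, Banana ,apple", ","));  // prints [APPLE, BANANA, APPLE]
        // Space-separated input split on commas: the whole line becomes one key.
        System.out.println(tokens("apple banana apple", ","));    // prints [APPLE BANANA APPLE]
        // The same line split on whitespace instead.
        System.out.println(tokens("apple banana apple", "\\s+")); // prints [APPLE, BANANA, APPLE]
    }
}
```

If your input is ordinary text rather than comma-separated values, changing the delimiter passed to split in the mapper is the likely fix.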

answered Mar 16, 2018 by nitinrawat895
