Explanation of OutputCollector vs Context


Hi Team,

I need to understand the difference between writing mapper code in the following two ways.

Code 1 (old API):

public static class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)

Code 2 (new API):

public static class WordCountMapper extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, Context context)

I want to understand the concepts of OutputCollector and Context, and when to use which syntax.

Jul 26, 2019 in Big Data Hadoop by Jai

1 answer to this question.


The two snippets use different MapReduce APIs: OutputCollector belongs to MRv1 (the old API, org.apache.hadoop.mapred) and Context belongs to MRv2 (the new API, org.apache.hadoop.mapreduce).

The Java MapReduce API 1, also known as MRv1, shipped with the initial Hadoop versions. Its main flaw was that the MapReduce framework performed both data processing and cluster resource management.

MapReduce 2, or Next Generation MapReduce, was a long-awaited and much-needed upgrade to scheduling, resource management, and execution in Hadoop. Fundamentally, it separates cluster resource management from MapReduce-specific logic; this separation of processing and resource management was achieved by introducing YARN in later versions of Hadoop.

MRv1 uses OutputCollector and Reporter to communicate with the MapReduce system: output records go through the OutputCollector, while progress and status updates go through the separate Reporter.
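To make the old-API shape concrete, here is a minimal sketch of the word-count map logic in that style. Note the assumptions: plain Java types (Long, String, Integer) stand in for Hadoop's LongWritable, Text, and IntWritable, and the OutputCollector and Reporter interfaces below are simplified stand-ins defined locally so the sketch compiles without a Hadoop dependency; the real types live in the Hadoop jars.

```java
// Simplified stand-ins for the old-API (org.apache.hadoop.mapred)
// OutputCollector and Reporter types; NOT the real Hadoop classes.
interface OutputCollector<K, V> {
    void collect(K key, V value);
}

interface Reporter {
    void progress();
}

// Old-API style: output is written through the OutputCollector argument,
// and liveness/status reporting goes through the separate Reporter argument.
class OldApiStyleWordCountMapper {
    public void map(Long key, String value,
                    OutputCollector<String, Integer> output, Reporter reporter) {
        for (String word : value.split("\\s+")) {
            if (!word.isEmpty()) {
                output.collect(word, 1);  // emit (word, 1) for each token
            }
        }
        reporter.progress();              // tell the framework we are still alive
    }
}
```

The point to notice is that the mapper receives two separate framework objects, one for output and one for reporting, which the new API later folds into a single Context.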

MRv2's API makes extensive use of context objects that allow user code to communicate with the MapReduce system. (The roles of the JobConf, the OutputCollector, and the Reporter from the old API are unified by the Context object in MRv2.)
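The same word-count map logic in the new-API style looks like this. As before, this is only a sketch: plain Java types stand in for the Writable types, and the MapContext interface below is a simplified local stand-in for Hadoop's real Context, defined here only so the example compiles without a Hadoop dependency.

```java
// Simplified stand-in for the new-API (org.apache.hadoop.mapreduce)
// Context; NOT the real Hadoop class.
interface MapContext<K, V> {
    void write(K key, V value);             // replaces OutputCollector.collect

    default void setStatus(String msg) { }  // replaces Reporter-style status updates
}

// New-API style: the mapper talks to the framework through a single
// context object instead of separate OutputCollector and Reporter arguments.
class NewApiStyleWordCountMapper {
    public void map(Long key, String value, MapContext<String, Integer> context) {
        context.setStatus("mapping record at offset " + key);
        for (String word : value.split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);     // emit (word, 1) for each token
            }
        }
    }
}
```

Comparing the two sketches, the only structural change is the method signature: everything the mapper needs from the framework now arrives through one object, which is what makes the new API easier to extend without breaking user code.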

You should use MapReduce 2 (MRv2). Hadoop 2's biggest advantages over Hadoop 1 are:

  1. There are no JobTrackers and TaskTrackers in the Hadoop 2 architecture; instead there are the YARN ResourceManager and NodeManagers. This lets Hadoop 2 support execution models other than the MapReduce framework and helps overcome the high-latency problems associated with MapReduce.
  2. Hadoop 2 supports non-batch processing alongside traditional batch operations.
  3. HDFS federation is introduced in Hadoop 2. It allows multiple NameNodes to manage a Hadoop cluster, which helps address the single-point-of-failure problem of the single NameNode in Hadoop 1.
answered Jul 26, 2019 by Rasheed
