Explanation of OutputCollector vs. Context

Hi Team,

I need to understand the difference between writing mapper code in the following two ways:


public static class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException { ... }
}


public static class WordCountMapper extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { ... }
}

I want to understand the concepts behind OutputCollector and Context. When should I use which syntax?

Jul 26, 2019 in Big Data Hadoop by Jai

1 answer to this question.

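The two snippets use different MapReduce APIs. OutputCollector belongs to the old API (package org.apache.hadoop.mapred), usually referred to as MRv1, and Context belongs to the new API (package org.apache.hadoop.mapreduce), usually referred to as MRv2.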

The Java MapReduce API 1, also known as MRv1, shipped with the initial Hadoop versions. The flaw in those versions was that the MapReduce framework performed both data processing and cluster resource management.

MapReduce 2, or Next Generation MapReduce, was a long-awaited and much-needed upgrade to scheduling, resource management, and job execution in Hadoop. Fundamentally, it separates cluster resource management from MapReduce-specific logic; this separation of processing and resource management was achieved by introducing YARN in later versions of Hadoop.

MRv1 uses OutputCollector and Reporter to communicate with the MapReduce system.
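For illustration, here is a minimal word-count mapper sketch written against the old API, assuming the Hadoop client libraries (org.apache.hadoop.mapred) are on the classpath. The class name WordCountOldApi and the counter group/name "WordCount"/"TOKENS" are hypothetical. Note that emitting output and reporting progress go through two separate objects:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical example class; old (MRv1-style) API from org.apache.hadoop.mapred.
public class WordCountOldApi {

    // The mapper extends MapReduceBase and implements the Mapper *interface*.
    public static class TokenizerMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                // OutputCollector emits the intermediate (word, 1) pair.
                output.collect(word, ONE);
                // Reporter is a separate object for counters and status.
                // Counter group/name are hypothetical.
                reporter.incrCounter("WordCount", "TOKENS", 1);
            }
        }
    }
}
```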

The MRv2 API makes extensive use of context objects that allow user code to communicate with the MapReduce system. (In MRv2, Context objects unify the roles that JobConf, OutputCollector, and Reporter played in the old API.)
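The same mapper written against the new API is sketched below, again assuming the Hadoop client libraries (this time org.apache.hadoop.mapreduce) are on the classpath. The class name WordCountNewApi, the configuration key "wordcount.lowercase", and the counter names are hypothetical. A single Context object is used to read the job configuration, emit key/value pairs, and update counters, and the driver uses Job instead of JobConf:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Hypothetical example class; new (MRv2-style) API from org.apache.hadoop.mapreduce.
public class WordCountNewApi {

    // The mapper extends the Mapper *class*; no MapReduceBase needed.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        private boolean lowerCase;

        @Override
        protected void setup(Context context) {
            // Job configuration (the old JobConf role) is read via Context.
            // "wordcount.lowercase" is a hypothetical key.
            lowerCase = context.getConfiguration().getBoolean("wordcount.lowercase", false);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                String token = tokens.nextToken();
                word.set(lowerCase ? token.toLowerCase() : token);
                // Context replaces OutputCollector for emitting pairs...
                context.write(word, ONE);
                // ...and Reporter for counters (names are hypothetical).
                context.getCounter("WordCount", "TOKENS").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // The driver uses Job (plus Configuration) where old-API code used JobConf.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountNewApi.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the class packaged into a jar (jar name hypothetical), the job could be submitted with: hadoop jar wordcount.jar WordCountNewApi <input dir> <output dir>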

You should use MapReduce 2 (MRv2). Here are Hadoop 2's biggest advantages over Hadoop 1:

  1. There is no JobTracker or TaskTracker in the Hadoop 2 architecture; YARN's ResourceManager and NodeManagers take their place. This lets Hadoop 2 run processing models other than MapReduce, which helps overcome the high-latency problems associated with MapReduce.
  2. Hadoop 2 supports non-batch processing alongside traditional batch operations.
  3. HDFS federation is introduced in Hadoop 2. It allows multiple NameNodes, each managing part of the namespace, to serve a single cluster, which improves namespace scalability. The NameNode single point of failure is addressed separately by NameNode High Availability, also introduced in Hadoop 2.
answered Jul 26, 2019 by Rasheed

