Published on Feb 12, 2015

This post contains an ebook on how to do word count using the MRv2 classes and API. The ebook is a complete, detailed guide to writing your first MRv2 program and running it on YARN in Hadoop 2.0. For more information, refer to our earlier posts on Hadoop 2.0 and YARN and on Setting up clusters in Hadoop 2.0.

Let’s start off with a brief look at the changes made to MapReduce in this version.

Changes in MRv2

MapReduce underwent a complete overhaul in Hadoop 0.23 and is now called MRv2. MRv2, also known as Hadoop YARN (Yet Another Resource Negotiator), is the next-generation MapReduce in Apache Hadoop. The major change is that there is no longer a JobTracker or any TaskTrackers. Previously, the JobTracker organized MapReduce jobs across the cluster and scheduled and monitored them, while the TaskTrackers carried out the tasks. With the advent of YARN inside Apache Hadoop, neither is needed.

The fundamental idea of MRv2 is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM).

The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework. The NodeManager is the per-machine framework agent responsible for containers, monitoring their resource usage, and reporting it to the ResourceManager/Scheduler. MRv2 has one ResourceManager per cluster, and each data node runs a NodeManager. For each job, one slave node hosts the ApplicationMaster, which requests resources and monitors that job's tasks.

The per-application ApplicationMaster is a framework-specific library tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
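To make the division of labor concrete, here is a minimal plain-Java sketch of the three roles described above. It is only a toy model: the class names mirror the YARN daemons, but none of this is the real Hadoop API, and the slot counts, host names, and task names are made up for illustration. It also ignores details such as the ApplicationMaster itself occupying a container.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the YARN division of labor; these are NOT the real Hadoop classes.
class YarnSketch {

    // Per-machine agent: owns a fixed number of container slots and runs tasks in them.
    static class NodeManager {
        final String host;
        int freeContainers;
        final List<String> runningTasks = new ArrayList<>();

        NodeManager(String host, int capacity) {
            this.host = host;
            this.freeContainers = capacity;
        }

        void launch(String task) {
            runningTasks.add(task);
        }
    }

    // Cluster-wide arbiter: hands out containers but knows nothing about individual tasks.
    static class ResourceManager {
        final List<NodeManager> nodes = new ArrayList<>();

        void register(NodeManager nm) {
            nodes.add(nm);
        }

        // Grant up to `wanted` containers, taking them from nodes that still have free slots.
        List<NodeManager> allocate(int wanted) {
            List<NodeManager> granted = new ArrayList<>();
            for (NodeManager nm : nodes) {
                while (nm.freeContainers > 0 && granted.size() < wanted) {
                    nm.freeContainers--;
                    granted.add(nm);
                }
            }
            return granted;
        }
    }

    // Per-job coordinator: negotiates containers, then starts tasks through the NodeManagers.
    static class ApplicationMaster {
        final String jobName;

        ApplicationMaster(String jobName) {
            this.jobName = jobName;
        }

        // Returns how many of the requested tasks could be started.
        int run(ResourceManager rm, List<String> tasks) {
            List<NodeManager> containers = rm.allocate(tasks.size());
            for (int i = 0; i < containers.size(); i++) {
                containers.get(i).launch(jobName + ":" + tasks.get(i));
            }
            return containers.size();
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        NodeManager a = new NodeManager("node-a", 2); // hypothetical hosts
        NodeManager b = new NodeManager("node-b", 2);
        rm.register(a);
        rm.register(b);

        ApplicationMaster am = new ApplicationMaster("wordcount");
        int started = am.run(rm, Arrays.asList("map-0", "map-1", "reduce-0"));

        System.out.println(started + " tasks started");
        System.out.println(a.host + " " + a.runningTasks);
        System.out.println(b.host + " " + b.runningTasks);
    }
}
```

The point of the sketch is that the ResourceManager only counts containers, while everything job-specific (which task goes where) lives in the ApplicationMaster. That is the split the JobTracker used to handle on its own.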

MRv2 maintains API compatibility with the previous stable release (hadoop-0.20.205), meaning that all MapReduce jobs should still run unchanged on top of MRv2 with just a recompile.

Here is a guide on “Word Count using MRv2 Classes and API - A Guide to Understand the Execution of Word Count in Hadoop 2.0”

Word count using MRv2
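Before opening the guide, it may help to see the word count data flow on its own. The sketch below is plain Java with no Hadoop dependencies, so it is not the MRv2 program from the ebook. It only imitates the three phases a MapReduce word count goes through (map, shuffle, reduce) on an in-memory list of lines. The class and method names are made up for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of the word count data flow; no Hadoop classes are used.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts collected for one word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs the three phases in sequence over the input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop yarn hadoop", "yarn mapreduce");
        System.out.println(countWords(lines)); // prints {hadoop=2, mapreduce=1, yarn=2}
    }
}
```

In a real MRv2 job the map and reduce steps would live in Mapper and Reducer subclasses and the framework would perform the shuffle across the cluster. Here everything runs in one process purely to show what each phase contributes.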

Got a question for us? Mention it in the comments section and we will get back to you.
