How is a Hadoop MapReduce job submitted to worker nodes?

0 votes

I started learning Hadoop two weeks ago, but I am still struggling with many concepts in Hadoop MapReduce. The one I find most difficult is how a job gets distributed from the master node to the worker nodes. Suppose I have a Hadoop cluster with 1 master and 3 slave nodes. When a client submits a job, how does it get passed to the master and then on to the slave nodes?

Mar 29, 2018 in Big Data Hadoop by Damon Salvatore
• 5,510 points
1,823 views

1 answer to this question.

0 votes

Alright, I think you are basically looking for the entire workflow of a MapReduce job. Here is how things happen, step by step:


  • First, you have your MapReduce code, i.e. the jar of the job, which you submit from the client node using the hadoop jar command. Here you pass all the details such as the driver class name, the input path and the output path (see the driver sketch after this list).

  • Once the job is submitted, the ResourceManager assigns a new application ID to it, which is passed back to the client.

  • The client then copies the jar file and the other job resources to HDFS and submits the job to the ResourceManager.

  • The ResourceManager, running on the master node, allocates the resources the job needs and keeps track of cluster utilization. It also launches an ApplicationMaster for each job, which is responsible for coordinating the job's execution.

  • The ApplicationMaster gets the block metadata from the NameNode to determine where the input splits are located, and then asks the respective NodeManagers to launch the tasks on those nodes.

  • Finally, the ApplicationMaster creates one map task per input split, plus the configured number of reduce tasks.

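To make the first steps concrete, here is a minimal driver sketch in Java of the kind of code you would package into the job jar. The class names WordCountDriver, WordCountMapper and WordCountReducer are hypothetical placeholders; the Job API calls are the standard org.apache.hadoop.mapreduce ones.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountDriver.class);     // tells the framework which jar to ship to the worker nodes
        job.setMapperClass(WordCountMapper.class);    // hypothetical mapper class in the same jar
        job.setReducerClass(WordCountReducer.class);  // hypothetical reducer class in the same jar
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path

        // waitForCompletion() submits the job to the ResourceManager and blocks
        // until it finishes; behind the scenes the jar and job resources are
        // copied to HDFS and an application ID is obtained, as described above.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You would launch it with something like hadoop jar wordcount.jar WordCountDriver /input /output (jar name and paths here are only placeholders); from that point on the ResourceManager, ApplicationMaster and NodeManagers take over exactly as outlined in the steps above.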
answered Mar 29, 2018 by Ashish
• 2,630 points
