Hadoop Job Tracker
The Job Tracker is the master daemon responsible for both resource management and the scheduling and monitoring of jobs. It acts as a liaison between Hadoop and your application.
Before submitting a job through the client, the user first copies the input files into the Distributed File System (DFS). The client then looks at these input files and computes the input splits (or blocks) from them. The client can create the splits in whatever manner it prefers, as there are certain considerations behind how the data should be divided. If the analysis runs over the complete data set, the data is divided into splits so that the pieces can be processed in parallel. Note that the files are not copied into the DFS through this client; they are loaded using Flume, Sqoop, or some other external client.
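To make the idea of dividing input into splits concrete, here is a minimal sketch in Python. It is not Hadoop's actual split logic, which has further rules this sketch ignores. The function name compute_splits and the 200 MB file with a 64 MB split size are assumptions chosen purely for illustration.

```python
from typing import List, Tuple


def compute_splits(file_size: int, split_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that together cover file_size bytes.

    Hypothetical helper for illustration only; not part of any Hadoop API.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < file_size:
        # The last split may be shorter than split_size.
        length = min(split_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits


MB = 1024 * 1024
# Assumed example: a 200 MB file cut into 64 MB splits.
for offset, length in compute_splits(200 * MB, 64 * MB):
    print(offset // MB, length // MB)
# prints:
# 0 64
# 64 64
# 128 64
# 192 8
```

Each (offset, length) pair would correspond to one map task, so this example file would yield four map tasks.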
Once the files are in the DFS and the client has interacted with the DFS to work out the splits, a MapReduce job is run over those splits. The job is submitted to the job tracker. The job tracker is the master daemon: it runs on the master node and dispatches the work of these jobs to the data nodes. The data itself is spread across various data nodes, and it is the job tracker's responsibility to take that into account when it schedules the work.
After the client submits the job to the job tracker, the job is initialized on the job queue and the job tracker creates the map and reduce tasks. It creates them from the program contained in the map function and the reduce function. Both kinds of task operate on the input splits: map tasks read them directly, and reduce tasks consume the map output. Note: taken together, the input splits created by the client cover the whole input data.
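The relationship between splits, map tasks, and reduce tasks can be sketched with a tiny in-memory word count. This is a simplified, single-process illustration and not Hadoop code. The function names (map_task, shuffle, reduce_task, run_job) and the sample splits are assumptions made up for this example.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_task(split: str) -> List[Tuple[str, int]]:
    """One map task per input split: emit (word, 1) for every word."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(map_outputs: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group all map output values by key before the reduce phase."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for key, value in output:
            grouped[key].append(value)
    return grouped


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """One reduce call per key: sum the counts."""
    return key, sum(values)


def run_job(splits: List[str]) -> Dict[str, int]:
    # Here the map tasks run one after another; on a cluster they run in parallel.
    map_outputs = [map_task(split) for split in splits]
    grouped = shuffle(map_outputs)
    return dict(reduce_task(key, values) for key, values in grouped.items())


# Assumed sample input: two splits of a small text.
splits = ["the job tracker", "the task tracker runs the task"]
print(run_job(splits))
# prints {'the': 3, 'job': 1, 'tracker': 2, 'task': 2, 'runs': 1}
```

The point of the sketch is the shape of the job: the number of map tasks follows the number of splits, while the reduce step works on the combined map output.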
Each input split has a map task running on it, and the output of the map tasks goes into the reduce tasks. The job tracker runs each task against a particular piece of data. Because that data can be replicated on several nodes, the job tracker picks a node that holds a local copy and runs the task on that node's task tracker. The task tracker is the one that actually runs the task on the data node: the job tracker passes the task information to the task tracker, and the task tracker runs the task on the data node.
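The preference for local data can be illustrated with a small scheduling sketch. It assumes a much simpler model than a real cluster: each node has a number of free task slots, each split has a list of nodes holding a replica, and rack awareness is ignored. The function assign_task and the node names are hypothetical.

```python
from typing import Dict, List, Optional


def assign_task(replica_nodes: List[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Pick a node for one map task, preferring nodes that hold the split's data.

    Hypothetical helper for illustration; mutates free_slots when a slot is taken.
    """
    # First choice: a node that stores a replica and has a free slot.
    for node in replica_nodes:
        if free_slots.get(node, 0) > 0:
            free_slots[node] -= 1
            return node
    # Fallback: any node with a free slot; the data is then read over the network.
    for node, slots in free_slots.items():
        if slots > 0:
            free_slots[node] -= 1
            return node
    # No capacity anywhere: the task has to wait.
    return None


# Assumed cluster state: node1 is busy, node2 and node3 each have one free slot.
free_slots = {"node1": 0, "node2": 1, "node3": 1}
replicas = ["node1", "node2"]           # nodes holding a copy of the split
print(assign_task(replicas, free_slots))  # prints node2 (local copy, free slot)
print(assign_task(replicas, free_slots))  # prints node3 (no local slot left)
print(assign_task(replicas, free_slots))  # prints None (cluster is full)
```

Running the task where a replica already lives avoids moving the split across the network, which is the reason for trying the replica nodes first.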
Once tasks have been assigned, each task tracker periodically sends a heartbeat to the job tracker. These signals let the job tracker find out whether the task tracker, and the data node it runs on, is still alive. The two are kept in sync this way because a node can drop out at any time; when heartbeats stop arriving, the job tracker can reschedule that node's tasks elsewhere.
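A heartbeat check of this kind can be sketched as a table of last-seen timestamps plus a timeout. The class below is an illustrative model only, not the job tracker's real implementation. The name HeartbeatMonitor, the tracker ids, and the 10-second timeout are assumptions, and timestamps are passed in explicitly so the example is deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each task tracker (hypothetical model)."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, tracker_id: str, now: float) -> None:
        # Called whenever a task tracker reports in.
        self.last_seen[tracker_id] = now

    def dead_trackers(self, now: float) -> List[str]:
        # A tracker counts as dead once it has been silent longer than the timeout.
        return sorted(
            tracker_id
            for tracker_id, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Assumed scenario: tt2 stops reporting after t=0, tt1 keeps reporting.
monitor = HeartbeatMonitor(timeout_seconds=10.0)
monitor.record_heartbeat("tt1", now=0.0)
monitor.record_heartbeat("tt2", now=0.0)
monitor.record_heartbeat("tt1", now=8.0)
print(monitor.dead_trackers(now=12.0))  # prints ['tt2']
```

In a real cluster the master would react to a dead tracker by handing its unfinished tasks to another node; the sketch only covers the detection step.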