Setting the Number of Map Tasks and Reduce Tasks Not Working

Question

I am trying to run a MapReduce job with a specific number of map tasks, which I pass as an argument. This is the command I am running:

hadoop jar test.jar Test /sample/input/ /sample/output \ -D mapred.map.tasks = 10

Here I have set the number of map tasks to 10, but the job still runs with more map tasks than I specified. I have checked the command and it looks fine to me. What am I doing wrong?

Ashish · Answer 1 · May 4, 2018

There is nothing wrong with passing mapred.map.tasks. Your MR job runs with more map tasks than you asked for because the number of map tasks is governed by the number of input splits, not by that setting. Hadoop does not let you fix the number of map tasks directly; the value you pass through the mapred.map.tasks parameter is only a hint to the framework about the number of maps. Regardless of the value specified, one map task is spawned for each input split, so the number of map tasks always ends up equal to the number of input splits.
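To make the link between input size and map count concrete, here is a rough sketch in plain Java with no Hadoop dependency. It assumes the split-size rule used by the newer FileInputFormat, max(minSize, min(maxSize, blockSize)), and a single splittable file. The class name, method names, and the 2 GB / 128 MB figures are made up for illustration, and the estimate ignores edge cases such as the small slop Hadoop allows on the last split and empty files. If you really need fewer maps, the usual lever is a larger minimum split size (mapreduce.input.fileinputformat.split.minsize in Hadoop 2 and later), which is what the second call mimics.

```java
public class SplitEstimate {

    // Assumed rule: the split size is the block size, clamped
    // between the configured minimum and maximum split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate split count for one splittable file: ceil(fileLength / splitSize).
    // Each split is handled by one map task.
    static long estimateSplits(long fileLength, long splitSize) {
        return (fileLength + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;    // hypothetical HDFS block size
        long fileLength = 2048 * mb;  // hypothetical 2 GB input file

        // No split-size overrides: the split size equals the block size.
        long defaultSplit = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(estimateSplits(fileLength, defaultSplit)); // 16

        // Raising the minimum split size to 256 MB halves the number of splits.
        long largerSplit = computeSplitSize(blockSize, 256 * mb, Long.MAX_VALUE);
        System.out.println(estimateSplits(fileLength, largerSplit)); // 8
    }
}
```

With these made-up numbers, a request for 10 maps would still produce about 16 map tasks, because the 2 GB file breaks into 16 block-sized splits.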