When do reduce tasks start in Hadoop?

Question

In Hadoop, when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?

Frankie · Answer

The reduce phase has three steps: shuffle, sort, and reduce. Shuffle is where each reducer collects data from the mappers. Because it is only a data transfer, it can happen while the mappers are still generating output. Sort and reduce, on the other hand, can only start once all the mappers are done. You can tell which step MapReduce is in by looking at the reducer completion percentage: 0-33% means it's doing shuffle, 34-66% is sort, and 67-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%: they are waiting for the mappers to finish.

The reduce phase can therefore start long before a reducer is actually called. As soon as a single mapper finishes, the data it generated undergoes sorting and shuffling (which includes calls to the combiner and partitioner). The reducer "phase" kicks in the moment this post-mapper processing starts. As that processing proceeds, you will see the reducer percentage advance, even though none of the reducers has been called yet.

You can customize when the reducers start up by changing the value of mapred.reduce.slowstart.completed.maps in mapred-site.xml (the default is 0.05). A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis. In newer versions of Hadoop (at least 2.4.1) the parameter is called mapreduce.job.reduce.slowstart.completedmaps.
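The percentage ranges and the slowstart threshold described above can be sketched in plain Java. This is only an illustrative model, not Hadoop code: the class and method names are hypothetical, and rounding the threshold up to a whole number of map tasks is an assumption about how the fraction is applied.

```java
public class ReduceStartSketch {

    // Maps the reducer completion percentage to the step named in the answer:
    // 0-33 is shuffle, 34-66 is sort, 67-100 is reduce.
    static String phaseFor(int reducePercent) {
        if (reducePercent < 0 || reducePercent > 100) {
            throw new IllegalArgumentException("percentage must be between 0 and 100");
        }
        if (reducePercent <= 33) {
            return "shuffle";
        }
        if (reducePercent <= 66) {
            return "sort";
        }
        return "reduce";
    }

    // Number of map tasks that must complete before reducers are started,
    // given the slowstart fraction (0.0 to 1.0).
    // Assumption: the fraction is rounded up to a whole number of maps.
    static int mapsNeededBeforeReducersStart(double slowstart, int totalMaps) {
        if (slowstart < 0.0 || slowstart > 1.0) {
            throw new IllegalArgumentException("slowstart must be between 0.0 and 1.0");
        }
        return (int) Math.ceil(slowstart * totalMaps);
    }

    // True once enough maps have completed for reducers to begin shuffling.
    static boolean reducersMayStart(double slowstart, int totalMaps, int completedMaps) {
        return completedMaps >= mapsNeededBeforeReducersStart(slowstart, totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 10;
        for (double slowstart : new double[] {0.0, 0.5, 1.0}) {
            System.out.println("slowstart=" + slowstart + " -> reducers start after "
                    + mapsNeededBeforeReducersStart(slowstart, totalMaps)
                    + " of " + totalMaps + " maps");
        }
        System.out.println("reducer at 33% is in phase: " + phaseFor(33));
    }
}
```

With 10 mappers, a value of 0.5 means reducers may begin shuffling after 5 maps complete, but they still cannot get past the shuffle step until all 10 are done.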
