When do reduce tasks start in Hadoop

0 votes
In Hadoop when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?
Jul 27, 2018 in Big Data Hadoop by Frankie
• 9,830 points
815 views

1 answer to this question.

0 votes

The reduce phase has 3 steps: shuffle, sort, reduce. Shuffle is where the data is collected by the reducer from each mapper. This can happen while mappers are generating data since it is only a data transfer. On the other hand, sort and reduce can only start once all the mappers are done. You can tell which one MapReduce is doing by looking at the reducer completion percentage: 0-33% means its doing shuffle, 34-66% is sort, 67%-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%-- it's waiting for mappers to finish.

The reduce phase can start long before a reducer is called. As soon as "a" mapper finishes the job, the generated data undergoes some sorting and shuffling (which includes call to combiner and partitioner). The reducer "phase" kicks in the moment post mapper data processing is started. As these processing is done, you will see progress in reducers percentage. However, none of the reducers have been called in yet.

You can customize when the reducers startup by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis. In new versions of Hadoop (at least 2.4.1) the parameter is called is mapreduce.job.reduce.slowstart.completedmaps.

I hope this answer helps :)

answered Jul 27, 2018 by Frankie
• 9,830 points

Related Questions In Big Data Hadoop

0 votes
1 answer

When do Reduce tasks start in Hadoop?

As much I understand Reduce phase start ...READ MORE

answered Aug 9, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
508 views
0 votes
1 answer

When is the reduce tasks start in Hadoop?

Let me explain you the whole scenario. ...READ MORE

answered May 22, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
3,390 views
0 votes
1 answer

Not able to start Job History Server in Hadoop 2.8.1

You have to start JobHistoryServer process specifically ...READ MORE

answered Mar 30, 2018 in Big Data Hadoop by Ashish
• 2,650 points
2,336 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
10,617 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,215 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
104,928 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
4,293 views
0 votes
1 answer

How do I print hadoop properties in command line?

You can dump Hadoop config by running: $ ...READ MORE

answered Aug 23, 2018 in Big Data Hadoop by Frankie
• 9,830 points
2,250 views
0 votes
1 answer

What is the best functional language to do Hadoop Map-Reduce?

down voteacceptedBoth Clojure and Haskell are definitely ...READ MORE

answered Sep 4, 2018 in Big Data Hadoop by Frankie
• 9,830 points
669 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP