When do reduce tasks start in Hadoop?

0 votes
Can someone tell me when reduce tasks start in Hadoop? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?
May 22, 2018 in Big Data Hadoop by kurt_cobain
• 9,240 points
412 views

1 answer to this question.

0 votes

Let me explain the whole scenario. The reduce phase has 3 steps: shuffle, sort, reduce. Shuffle is where the reducer collects data from each mapper. This can happen while mappers are still generating data, since it is only a data transfer. Sort and reduce, on the other hand, can only start once all the mappers are done. You can tell which step MapReduce is doing by looking at the reducer completion percentage: 0-33% means it's doing shuffle, 34-66% is sort, and 67-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%: they are waiting for the mappers to finish.
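To make the percentage ranges concrete, here is a minimal sketch that maps a reducer's displayed completion percentage to the step it is in. The function name `reducer_phase` is made up for illustration and is not part of Hadoop; the cut-offs simply follow the ranges quoted above and assume whole-number percentages as shown in the job UI.

```python
def reducer_phase(progress_percent: int) -> str:
    """Return the reduce-side step for a displayed completion percentage.

    Hypothetical helper: the boundaries mirror the 0-33 / 34-66 / 67-100
    ranges from the answer, not any Hadoop API.
    """
    if not 0 <= progress_percent <= 100:
        raise ValueError("progress must be between 0 and 100")
    if progress_percent <= 33:
        return "shuffle"  # still copying map output; may be waiting on mappers
    if progress_percent <= 66:
        return "sort"     # all map output fetched, merging it
    return "reduce"       # running the user's reduce function
```

For example, `reducer_phase(33)` returns `"shuffle"`, which matches the "stuck at 33%" case: the reducer has fetched everything available so far and is waiting for the remaining mappers.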

Reducers start shuffling once a configurable fraction of the mappers has finished, so the threshold is not fixed. You can change this parameter to make reducers start sooner or later.

You can customize when the reducers start up by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis. In newer versions of Hadoop (at least 2.4.1) the parameter is called mapreduce.job.reduce.slowstart.completedmaps, and its default is 0.05, so reducers begin shuffling once 5% of the mappers have finished.
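The effect of the threshold is easy to work out by hand. The sketch below is not Hadoop code: `maps_needed_before_reducers` is a hypothetical helper, and rounding the product up to a whole number of maps is an assumption made here for illustration.

```python
import math


def maps_needed_before_reducers(total_maps: int, slowstart: float) -> int:
    """Number of map tasks that must finish before reducers are started.

    `slowstart` stands in for the slowstart fraction described above
    (0.0 to 1.0). Rounding up is an assumption, not taken from the answer.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


# The three settings discussed above, for a job with 100 map tasks.
print(maps_needed_before_reducers(100, 1.0))  # 100: wait for every mapper
print(maps_needed_before_reducers(100, 0.5))  # 50: start at the halfway point
print(maps_needed_before_reducers(100, 0.0))  # 0: start right away
```

A higher value keeps reduce slots free for other work while the maps run; a lower value lets the shuffle overlap with the map phase, at the cost of reducers sitting idle while they wait for the remaining map output.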

Hope this answers your query to some extent.

answered May 22, 2018 by nitinrawat895
• 10,670 points
