What is Hadoop Performance Tuning?

0 votes
I increased the input split size from 128MB to 256MB, and the job's execution time decreased by a minute.

But I don't understand this behavior. Why does it happen? In what scenarios should we tune the input split size?
Oct 24, 2018 in Big Data Hadoop by Neha
• 6,300 points
758 views

1 answer to this question.

0 votes
Is this result consistent, or is it a one-off reading? Is this on your local Hadoop installation or on a cluster?

I would suggest recording the number of mappers for both the 128MB and the 256MB split size over several runs. That may hint at why the execution time decreased by a minute.

The number of input splits determines the number of mappers needed to process the input. If this number is higher than the number of map slots available on your cluster, the job has to wait for one set of mappers to finish before it can run the remaining ones. With a larger split size (e.g. 256MB in your case) there are fewer input splits, and therefore fewer map tasks than before. If the number of map tasks is less than or equal to the number of map slots on your cluster, all of your map tasks can run simultaneously, which may improve your job's execution time.
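To make the reasoning about splits and map slots concrete, here is a rough sketch in Python. The 10 GB input size and the 40 map slots are made-up numbers for illustration, not values from the question, and the split count is only an approximation (it treats the input as a single file and ignores details of how Hadoop computes splits).

```python
MB = 1024 * 1024

def num_splits(input_bytes, split_bytes):
    """Approximate number of input splits (one map task per split).

    Integer ceiling division; assumes a single splittable input file.
    """
    return -(-input_bytes // split_bytes)

def num_waves(map_tasks, map_slots):
    """Number of rounds needed to run all map tasks on the available slots."""
    return -(-map_tasks // map_slots)

# Hypothetical values, chosen only to illustrate the effect.
input_size = 10 * 1024 * MB   # assumed 10 GB of input
map_slots = 40                # assumed map slots available on the cluster

for split_mb in (128, 256):
    tasks = num_splits(input_size, split_mb * MB)
    waves = num_waves(tasks, map_slots)
    print(f"split={split_mb}MB -> map tasks={tasks}, waves={waves}")
```

With these assumed numbers, the 128MB split size gives 80 map tasks, which need two waves on 40 slots, while the 256MB split size gives 40 map tasks that fit in a single wave. Whether your job behaves this way depends on your actual input size and cluster capacity, which is why comparing the mapper counts from your runs is worthwhile.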
answered Oct 24, 2018 by Neha
• 6,300 points
