What is Hadoop Performance Tuning?

I increased the input split size from 128MB to 256MB, and the execution time of the job decreased by a minute.

But I do not understand this behavior. Why is it happening? In which scenarios should we tune the input split size?
Oct 23, 2018 in Big Data Hadoop by Neha
• 6,280 points
85 views

1 answer to this question.

Is this a consistent result or a one-off reading? Is this on your local Hadoop installation or on a cluster?

I would suggest recording the number of mappers over several runs, once with an input split size of 128MB and once with 256MB. That may give a hint as to why the execution time decreased by a minute.

The number of input splits determines the number of mappers needed to process the input. If this number is higher than the number of map slots available on your cluster, the job has to wait for one set of mappers to finish before it can process the remaining ones. With a larger split size (e.g. 256MB in your case) there are fewer input splits, so fewer map tasks need to run than before. If the number of map tasks is less than or equal to the number of map slots on your cluster, all of your map tasks may be able to run simultaneously, which can improve your job's execution time.
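To make the reasoning above concrete, here is a rough Python sketch of the arithmetic. It is only an illustration: the 10 GB input size and the 40 map slots are made-up numbers, the helper names are hypothetical, and the calculation treats the input as one splittable file, ignoring per-file boundaries and other details of how Hadoop actually computes splits.

```python
MB = 1024 * 1024


def num_splits(input_bytes: int, split_bytes: int) -> int:
    """Approximate number of input splits, i.e. map tasks (ceiling division)."""
    return -(-input_bytes // split_bytes)


def map_waves(splits: int, map_slots: int) -> int:
    """Number of rounds needed to run all map tasks on the available slots."""
    return -(-splits // map_slots)


# Hypothetical figures, chosen only for illustration.
input_size = 10 * 1024 * MB  # 10 GB of input
slots = 40                   # map slots available on the cluster

for split_mb in (128, 256):
    splits = num_splits(input_size, split_mb * MB)
    waves = map_waves(splits, slots)
    print(f"{split_mb}MB splits: {splits} map tasks, {waves} wave(s)")
```

With these assumed numbers, 128MB splits give 80 map tasks that need two waves on 40 slots, while 256MB splits give 40 map tasks that fit in a single wave. Whether your job behaves this way depends on your real input size and cluster capacity, which is why comparing the mapper counts across runs is worth doing.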
answered Oct 23, 2018 by Neha
• 6,280 points
