How to tune Spark jobs and optimize their performance?

Can anyone help me optimize a Spark job that is deployed on a YARN cluster?

I want to know what changes can be made at the configuration level. What approaches can be taken to optimize Spark Streaming and Spark SQL jobs?
Apr 18, 2018 in Big Data Hadoop by Shubham

1 answer to this question.

You need a good understanding of the cluster on which you are deploying the jobs. Here are some important approaches that will help you optimize your job.

First, find out the default block size configured in the cluster and the typical size of the files that will be stored in it. This will help you decide whether to change the default block size.
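As a rough illustration of why block size matters, the sketch below estimates how many blocks (and so, roughly, how many input partitions) a single splittable file produces. The function name is made up for this example, the 128 MB default is the usual HDFS default in Hadoop 2.x and later (your cluster may differ), and the one-partition-per-block assumption only holds approximately.

```python
MB = 1024 * 1024
GB = 1024 * MB

def estimate_input_partitions(file_size_bytes, block_size_bytes=128 * MB):
    """Approximate block count for one splittable file (hypothetical helper)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partial last block still counts as a block.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

# A 10 GB file with 128 MB blocks versus 256 MB blocks.
print(estimate_input_partitions(10 * GB))
print(estimate_input_partitions(10 * GB, 256 * MB))
```

Fewer, larger blocks mean fewer tasks with more data each; many small blocks mean more scheduling overhead. Which side is better depends on your files and cluster.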

Also check the maximum memory limit configured for your executors, and the vcores that are allocated to your cluster.
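One way to turn those memory and vcore numbers into executor settings is sketched below. It follows a commonly quoted rule of thumb, not an official formula: reserve 1 vcore and 1 GB per node for the OS and Hadoop daemons, use about 5 cores per executor, keep one executor slot for the YARN ApplicationMaster, and assume memory overhead of roughly 10% of the heap. The function name and the cluster figures are assumptions for illustration; the configuration keys are standard Spark properties.

```python
def size_executors(nodes, vcores_per_node, mem_gb_per_node, cores_per_executor=5):
    """Rule-of-thumb executor sizing (hypothetical helper, not a Spark API)."""
    # Assumption: leave 1 vcore and 1 GB per node for the OS and daemons.
    usable_cores = vcores_per_node - 1
    usable_mem_gb = mem_gb_per_node - 1

    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough vcores per node for one executor")

    # Assumption: one executor slot is given up for the ApplicationMaster.
    total_executors = nodes * executors_per_node - 1

    # Container memory = heap + overhead, with overhead taken as ~10% of heap.
    container_gb = usable_mem_gb / executors_per_node
    heap_gb = int(container_gb / 1.10)

    return {
        "spark.executor.instances": str(total_executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{heap_gb}g",
    }

# Example cluster (made-up figures): 10 nodes, 16 vcores and 64 GB each.
print(size_executors(nodes=10, vcores_per_node=16, mem_gb_per_node=64))
```

The resulting values can be passed with `--conf key=value` on `spark-submit`. Treat them as a starting point and confirm that the container size fits within the YARN maximum allocation on your cluster.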

The incoming data rate also needs to be checked and optimized for streaming jobs (in your case, Spark Streaming).
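For the data rate, a minimal sketch is shown below. It assumes a Kafka direct stream, which the question does not state; for receiver-based streams the comparable setting is `spark.streaming.receiver.maxRate`. The helper name and the numbers are illustrative only.

```python
def max_records_per_batch(max_rate_per_partition, partitions, batch_interval_s):
    """Upper bound on records read in one batch (hypothetical helper)."""
    return max_rate_per_partition * partitions * batch_interval_s

# Standard Spark Streaming properties; the values are placeholders to tune.
streaming_conf = {
    # Let Spark adapt the ingestion rate to the observed processing speed.
    "spark.streaming.backpressure.enabled": "true",
    # Cap on records per second read from each Kafka partition.
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}

# 1000 records/s per partition, 12 partitions, 5 second batches.
cap = max_records_per_batch(1000, partitions=12, batch_interval_s=5)
print(cap)
```

If a batch of that size cannot be processed within one batch interval, batches queue up, so the cap, the batch interval, or the executor resources need adjusting.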

The garbage collector should also be tuned.
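As a starting point for garbage collector tuning, the sketch below builds the executor JVM options string. It assumes a Java 8 JVM (the GC logging flags shown were replaced by `-Xlog:gc*` in later Java versions) and that G1 suits your workload, which you should verify from the GC logs. The helper name is made up; `spark.executor.extraJavaOptions` is a standard Spark property.

```python
def gc_conf(use_g1=True, log_gc=True):
    """Build executor JVM options for GC tuning (hypothetical helper)."""
    opts = []
    if use_g1:
        # Switch executors to the G1 collector.
        opts.append("-XX:+UseG1GC")
    if log_gc:
        # Java 8 style GC logging, so collection frequency and pauses show up in executor logs.
        opts += ["-verbose:gc", "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"]
    return {"spark.executor.extraJavaOptions": " ".join(opts)}

def to_submit_args(conf):
    """Turn a config dict into spark-submit style --conf arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args

print(to_submit_args(gc_conf()))
```

Enabling logging first and changing the collector only after reading the logs is usually the safer order.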

I would also say that code-level optimizations are very necessary and should always be considered.

The most important part is improving your cluster performance through experience. For this, you first have to estimate the number of records that you'll be processing and the requirements of your application. Then tweak your configuration several times and check the throughput that you get each time.
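The tweak-and-measure loop can be kept honest with a small comparison like the sketch below. The helper names are made up, and the record counts and durations are placeholders, not real measurements; substitute the figures from your own runs.

```python
def throughput(records, seconds):
    """Records processed per second for one run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return records / seconds

def best_trial(trials):
    """Return the name of the trial with the highest throughput."""
    return max(trials, key=lambda name: throughput(*trials[name]))

# Placeholder numbers only: (records processed, wall-clock seconds).
trials = {
    "shuffle_partitions=200": (1_000_000, 50.0),
    "shuffle_partitions=400": (1_000_000, 40.0),
}
print(best_trial(trials))
```

Changing one setting per trial makes it easier to tell which change actually moved the throughput.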
answered Apr 18, 2018 by coldcode
