How to tune Spark jobs to optimize performance?

Can anyone help me out with optimizing a Spark job that is deployed on a YARN cluster?

I want to know what changes to make at the configuration level. What approaches can be taken to optimize Spark Streaming & Spark SQL jobs?
Apr 18, 2018 in Big Data Hadoop by Shubham

1 answer to this question.

You need to know the cluster you are deploying the jobs on well. Here are some important approaches that will help you optimize your job.

First, understand the default block size configured in the cluster, and also the size of the files that will be stored in it. This will help you decide whether to change the default block size and how to size your read partitions, as sketched below.
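A minimal sketch of aligning Spark's read partition size with the HDFS block size so each task processes roughly one block. The path and the 128 MB value are assumptions; use your cluster's actual dfs.blocksize.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("block-size-alignment")
  // 134217728 bytes = 128 MB, matching a typical dfs.blocksize; adjust to your cluster
  .config("spark.sql.files.maxPartitionBytes", "134217728")
  .getOrCreate()

// hypothetical input path
val df = spark.read.parquet("hdfs:///data/events")
println(s"Input partitions: ${df.rdd.getNumPartitions}")
```

If the partition count is far larger than the number of available cores (many tiny files) or far smaller (a few huge splits), that is usually the first thing to fix.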

Also check the maximum memory limit configured for your executors, and the vCores allocated to your cluster, so that your executor sizing fits inside the YARN container limits.
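A sketch of executor sizing with illustrative numbers; derive real values from yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores on your cluster. In yarn-cluster mode these settings are usually passed to spark-submit instead of being set in code.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("resource-sizing")
  .config("spark.executor.memory", "4g")           // heap per executor; must fit the YARN container limit
  .config("spark.executor.memoryOverhead", "512m") // off-heap overhead YARN adds on top of the heap
  .config("spark.executor.cores", "4")             // vcores per executor
  .config("spark.executor.instances", "10")        // or enable dynamic allocation instead
  .getOrCreate()
```

Executor memory plus overhead times the number of executors per node must stay below what the NodeManager can offer, otherwise containers will be killed.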

The rate of incoming data also needs to be checked and optimized for streaming jobs (in your case, Spark Streaming), so that each batch finishes within its batch interval.
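A sketch of rate control for Spark Streaming; the batch interval and the per-partition rate cap are placeholder values, and the Kafka setting only applies if you ingest from Kafka. Measure your actual ingest and processing rates in the Streaming tab of the Spark UI first.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("rate-controlled-stream")
  .set("spark.streaming.backpressure.enabled", "true")      // adapt intake rate to processing speed
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")  // cap records/sec per Kafka partition

// keep the batch interval at or above the observed processing time per batch
val ssc = new StreamingContext(conf, Seconds(10))
```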

The garbage collector should also be tuned, since long GC pauses on the executors show up directly as task latency.
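A sketch of common GC-related settings; the values are assumptions, and extraJavaOptions is normally supplied at submit time rather than in code. Compare the GC time column in the Spark UI before and after changing these.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("gc-tuning")
  // G1GC usually gives more predictable pauses for large executor heaps;
  // the GC logging flags make the pauses visible in the executor logs
  .config("spark.executor.extraJavaOptions",
          "-XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps")
  // leave enough execution/storage headroom so caching does not trigger constant GC
  .config("spark.memory.fraction", "0.6")
  .getOrCreate()
```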

I would also say that code-level optimizations are very important and should always be considered, for example caching reused data and avoiding unnecessary shuffles.
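An illustrative example of two common code-level optimizations; the paths and the join key are hypothetical. The small side of the join is broadcast so the large table is not shuffled, and the joined result is cached because it is reused by more than one action.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder().appName("code-level-opts").getOrCreate()

val events = spark.read.parquet("hdfs:///data/events")  // large fact table
val lookup = spark.read.parquet("hdfs:///data/lookup")  // small dimension table

val enriched = events.join(broadcast(lookup), "id")     // map-side join, no shuffle of events
enriched.cache()                                        // reuse across multiple actions

println(enriched.count())
```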

The most important part is improving your cluster's performance through experience. For this, you first have to estimate the number of records you will be processing and the requirements of your application. Then tweak your configurations over several runs and check the throughput you get each time.
answered Apr 18, 2018 by coldcode
