Minimizing Data Transfers in Spark

Is there any way to minimize the number of data transfers in Apache Spark?

Jun 19, 2018 in Apache Spark by shams
• 3,670 points • 2,868 views

1 answer to this question.

Minimizing data transfers and avoiding shuffles helps you write Spark programs that run fast and reliably. Data transfers can be minimized in the following ways:

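Using broadcast variables - A broadcast variable ships a read-only copy of a small dataset to each executor once, which makes joins between small and large RDDs more efficient because the large RDD does not have to be shuffled.
Using accumulators - Accumulators let tasks update a shared value (such as a counter or a sum) in parallel while executing; only the updates are sent back to the driver, not the data itself.
Avoiding shuffles - The most common way is to avoid ByKey operations (for example groupByKey), repartition, and any other operations that trigger shuffles.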
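
To make the first and third points concrete, here is a minimal sketch in plain Python (not the Spark API) that models partitions as lists and counts how many records cross a partition boundary for a shuffle join versus a broadcast-style join. The function names, the toy `key % n` partitioner, and the sample data are all assumptions made up for illustration. The broadcast count is also pessimistic, since Spark ships one copy per executor rather than one per partition.

```python
from collections import defaultdict


def shuffle_join(large_parts, small_parts):
    """Hash-partition both sides on the key, then join inside each partition."""
    n = len(large_parts)
    large_buckets = defaultdict(list)
    small_buckets = defaultdict(dict)
    moved = 0
    for src, part in enumerate(large_parts):
        for key, value in part:
            dest = key % n            # toy partitioner for integer keys
            moved += dest != src      # record leaves its original partition
            large_buckets[dest].append((key, value))
    for src, part in enumerate(small_parts):
        for key, value in part:
            dest = key % n
            moved += dest != src
            small_buckets[dest][key] = value
    rows = [
        (key, value, small_buckets[dest][key])
        for dest in range(n)
        for key, value in large_buckets[dest]
        if key in small_buckets[dest]
    ]
    return rows, moved


def broadcast_join(large_parts, small_parts):
    """Copy the small side to every partition; the large side never moves."""
    lookup = {key: value for part in small_parts for key, value in part}
    # Pessimistic model: one full copy of the small side per partition.
    moved = len(lookup) * len(large_parts)
    rows = [
        (key, value, lookup[key])
        for part in large_parts
        for key, value in part
        if key in lookup
    ]
    return rows, moved


# Hypothetical data: 4 partitions of 250 orders each, and 4 customers.
large_parts = [[(i % 4, f"order-{p}-{i}") for i in range(250)] for p in range(4)]
small_parts = [[(0, "alice"), (1, "bob"), (2, "carol"), (3, "dave")], [], [], []]

shuffle_rows, shuffle_moved = shuffle_join(large_parts, small_parts)
broadcast_rows, broadcast_moved = broadcast_join(large_parts, small_parts)

print(shuffle_moved, broadcast_moved)  # prints: 753 16
```

Both joins produce the same rows, but the broadcast version only moves the small side. In an actual Spark job the equivalent would be to wrap the small lookup table with SparkContext.broadcast and read it inside a map over the large RDD, instead of calling join on the two RDDs.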

answered Jun 19, 2018 by Data_Nerd
• 2,390 points

