What is the benefit of repartition(1) and coalesce(1)?

What is the benefit of repartition(1) and coalesce(1)? When we save data we use df.repartition(1), so how many partitions will it create?
Jul 26, 2019 in Apache Spark by kumar Ma
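df.repartition(1) leaves the DataFrame with exactly one partition, so saving it produces a single output file (or, if you also use partitionBy when writing, a single file in each partition directory). coalesce(1) also ends with one partition. The main benefit is that fewer, larger files are usually faster to read than many small ones. However, if that file grows to around 1 GB or more, it is better to use more partitions, for example .repartition(2).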
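
To make the file-count point concrete, here is a toy model in plain Python. It is not Spark code: the helper write_file_counts and the row layout are hypothetical. It mimics the rule that each task (one per partition) writes one file into every output directory it has rows for. Under that rule, a single partition gives one file overall, or one file per directory when the write uses partitionBy. It ignores write options such as maxRecordsPerFile that can split files further.

```python
from collections import defaultdict

# Toy model, not Spark: a "DataFrame" is a list of partitions,
# and each partition is a list of row dicts.
def write_file_counts(partitions, partition_by=None):
    """Return {output_directory: number_of_files} for a simulated write.

    Each partition is written by one task, and a task writes one file
    into every output directory it has at least one row for.
    """
    files = defaultdict(int)
    for part in partitions:
        if partition_by is None:
            dirs = {"."} if part else set()  # empty partitions write nothing here
        else:
            dirs = {f"{partition_by}={row[partition_by]}" for row in part}
        for d in dirs:
            files[d] += 1
    return dict(sorted(files.items()))

rows = [{"id": i, "country": c}
        for i, c in enumerate(["IN", "US", "IN", "UK", "US", "IN"])]

four_parts = [rows[0:2], rows[2:4], rows[4:5], rows[5:6]]
one_part = [rows]  # the state after repartition(1) or coalesce(1)

print(write_file_counts(four_parts))                          # {'.': 4}
print(write_file_counts(one_part))                            # {'.': 1}
print(write_file_counts(four_parts, partition_by="country"))  # {'country=IN': 3, 'country=UK': 1, 'country=US': 2}
print(write_file_counts(one_part, partition_by="country"))    # {'country=IN': 1, 'country=UK': 1, 'country=US': 1}
```

With four partitions, the "IN" directory receives three files because three different tasks hold "IN" rows. After collapsing to one partition, every directory receives exactly one.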

With repartition, all of the data is reshuffled, and the resulting partitions (and therefore the files written under each partition) end up roughly the same size.
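
A toy sketch of why repartition balances sizes at the cost of moving everything. The assumption here is that repartition(n) without columns behaves like a round-robin deal of every row into n new partitions. The helper toy_repartition is hypothetical, and Spark's real implementation differs in details, so real sizes are roughly rather than exactly equal.

```python
# Toy model, not Spark: repartition(n) is modeled as a round-robin deal
# of every row into n new partitions, i.e. a full shuffle.
def toy_repartition(partitions, n):
    """Return (new_partitions, rows_shuffled) for a simulated repartition(n)."""
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    rows_shuffled = i  # every row goes through the shuffle, even if it lands "back home"
    return out, rows_shuffled

# A skewed input: one big partition and two small ones.
skewed = [list(range(0, 90)), list(range(90, 95)), list(range(95, 100))]
new_parts, moved = toy_repartition(skewed, 4)
print([len(p) for p in new_parts])  # [25, 25, 25, 25]
print(moved)                        # 100
```

The skew disappears, but all 100 rows had to pass through the shuffle to get there.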

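With coalesce, Spark merges existing partitions instead of doing a full shuffle, so much less data has to move. The trade-off is that the merged partitions can be uneven in size. Also, a drastic coalesce such as coalesce(1) can push the upstream computation onto fewer nodes (a single one for coalesce(1)). In that case Spark's documentation suggests repartition, which adds a shuffle but keeps the earlier stages running in parallel.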
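
For contrast, a toy sketch of coalesce under the simplifying assumption that it concatenates groups of neighbouring existing partitions. Spark actually groups parent partitions with locality in mind. The helper toy_coalesce is hypothetical. It shows that no rows are shuffled, that skew is carried over when n > 1, and that requesting more partitions than exist leaves the count unchanged.

```python
# Toy model, not Spark: coalesce(n) is modeled as merging groups of
# neighbouring existing partitions; no row is sent through a shuffle.
def toy_coalesce(partitions, n):
    """Return (new_partitions, rows_shuffled) for a simulated coalesce(n)."""
    n = max(1, min(n, len(partitions)))  # coalesce can only keep or reduce the count
    size, extra = divmod(len(partitions), n)
    out, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        merged = [row for part in partitions[start:end] for row in part]
        out.append(merged)
        start = end
    return out, 0  # existing partitions are only concatenated, nothing is shuffled

skewed = [list(range(0, 90)), list(range(90, 95)), list(range(95, 100))]
two, moved = toy_coalesce(skewed, 2)
print([len(p) for p in two], moved)  # [95, 5] 0
one, _ = toy_coalesce(skewed, 1)
print([len(p) for p in one])         # [100]
```

Compared with the round-robin repartition model, coalesce moves no data but keeps the 90-row partition's skew. When the target is a single partition, both end in the same place, and the choice is between a shuffle and reduced upstream parallelism.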
