What is the benefit of repartition(1) and coalesce(1)? When we save data using df.repartition(1), how many partitions will it create?

Jul 26, 2019 in Apache Spark by kumar Ma
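If you use .repartition(1), the DataFrame is reduced to exactly one partition, so the save writes a single file (one file per partition directory if the output is also partitioned by a column). The main benefit is that the fewer files there are per partition, the faster the data can generally be read back, because there is less per-file overhead. The cost is that a single task has to write all of the data. If the file size approaches or exceeds about 1 GB, it is better to use more partitions, for example .repartition(2).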

With repartition, all of the data is reshuffled, so the resulting partitions, and therefore the files written from them, are almost the same size.

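With coalesce, existing partitions are merged instead of being fully reshuffled, so it reduces the amount of data being moved. The resulting partitions can be uneven in size, which does not matter when the target is a single partition.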
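To make the difference concrete, here is a toy model in plain Python. It is only a sketch and not Spark's implementation: the function names toy_repartition and toy_coalesce and the sample data are hypothetical, and partitions are modelled as simple lists of rows. It assumes that a full shuffle deals every row out round-robin, and that coalesce merges neighbouring partitions as whole units.

```python
from typing import List

Partition = List[int]


def toy_repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy full shuffle: every row is dealt round-robin into n new partitions."""
    rows = [row for part in partitions for row in part]
    out: List[Partition] = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


def toy_coalesce(partitions: List[Partition], n: int) -> List[Partition]:
    """Toy coalesce: whole existing partitions are merged into at most n groups."""
    n = min(n, len(partitions))  # merging can only reduce the partition count
    out: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Neighbouring partitions land in the same group; rows are not split up.
        out[i * n // len(partitions)].extend(part)
    return out


# Four unevenly sized input partitions (sizes 6, 1, 2, 1).
parts = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

print([len(p) for p in toy_repartition(parts, 2)])  # [5, 5]  evenly sized
print([len(p) for p in toy_coalesce(parts, 2)])     # [7, 3]  uneven
print(len(toy_repartition(parts, 1)))               # 1
print(len(toy_coalesce(parts, 1)))                  # 1
```

In both cases a target of 1 leaves exactly one partition, which is why the save then writes a single file. In Spark itself, the partition count of a DataFrame can be checked with df.rdd.getNumPartitions().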

