What is the benefit of repartition(1) and coalesce(1)? When we save data with df.repartition(1), how many partitions will it create?

What is the benefit of repartition(1) and coalesce(1)? When we save data we use df.repartition(1), so how many partitions will it create?

Jul 26, 2019 in Apache Spark by kumar Ma
• 5,719 views

If you use .repartition(1), Spark puts all the data into a single partition, so the write produces a single data file (one per output directory, if the output is also partitioned on disk). The main benefit is that fewer, larger files are generally faster to read than many small ones. However, if that file approaches or exceeds about 1 GB, it is better to use more partitions, for example .repartition(2).
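The size rule of thumb above can be turned into a small calculation. This is only a sketch in plain Python: the function name `suggest_partitions` and the default target of 1 GB per file are assumptions made for illustration, and the total data size has to be estimated by you.

```python
import math

GIB = 1024 ** 3  # bytes in one gibibyte


def suggest_partitions(total_bytes: int, target_file_bytes: int = GIB) -> int:
    """Return a partition count that keeps each output file at or under the target size.

    Hypothetical helper, not part of Spark. The 1 GiB default mirrors the
    rule of thumb in the text and should be tuned for your storage and readers.
    """
    if total_bytes <= 0:
        return 1  # always write at least one partition
    return max(1, math.ceil(total_bytes / target_file_bytes))


if __name__ == "__main__":
    for size in (500 * 1024 ** 2, GIB, 2 * GIB, 5 * GIB + 1):
        print(size, "bytes ->", suggest_partitions(size), "partition(s)")
```

The result would then be passed as the argument to df.repartition(...) before writing.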

With repartition, all the data is reshuffled, and the resulting partitions (and therefore the files written from them) are roughly equal in size.

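With coalesce, you reduce the number of partitions by merging existing ones instead of doing a full shuffle, so much less data is moved. The trade-off is that the resulting partitions can be uneven in size.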
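The difference between the two can be illustrated with a toy model in plain Python. This is not Spark code and not how Spark is implemented: partitions are modelled as lists, the full shuffle as a round-robin redistribution, and the merge as simple concatenation of whole partitions (Spark chooses which partitions to merge differently). The function names only mirror the Spark methods for readability.

```python
from typing import List


def repartition(partitions: List[list], n: int) -> List[list]:
    """Toy full shuffle: every record is redistributed round-robin over n new partitions."""
    records = [r for p in partitions for r in p]
    out: List[list] = [[] for _ in range(n)]
    for i, record in enumerate(records):
        out[i % n].append(record)
    return out


def coalesce(partitions: List[list], n: int) -> List[list]:
    """Toy merge: whole existing partitions are concatenated into at most n groups."""
    n = min(n, len(partitions))  # merging alone cannot increase the partition count
    out: List[list] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i % n].extend(part)  # records from one input partition stay together
    return out


if __name__ == "__main__":
    skewed = [[1, 2, 3, 4, 5, 6], [7], [8, 9]]
    print([len(p) for p in repartition(skewed, 2)])  # sizes after the toy shuffle
    print([len(p) for p in coalesce(skewed, 2)])     # sizes after the toy merge
    print(len(repartition(skewed, 1)), len(coalesce(skewed, 1)))
```

With a target of 1, both functions end up with a single partition holding all records, which matches the single output file discussed above; the difference only shows in how the data gets there and, for larger targets, in how evenly it is spread.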

commented Apr 10, 2020 by Ankur
