I don't understand the reason behind Spark RDD being immutable.

0 votes
Jul 26, 2018 in Apache Spark by shams
• 3,580 points
1,878 views

3 answers to this question.

0 votes
Following are the reasons:
- Immutable data is always safe to share across multiple processes as well as multiple threads.
- Since an RDD is immutable, it can be recreated at any time from its lineage graph.
- If a computation is time-consuming, we can cache the RDD, which results in a performance improvement (see the sketch below).
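
As an illustration of the caching point, here is a minimal sketch in Scala (the app name and dataset are made up for the example) showing how the result of an expensive transformation can be cached so that later actions reuse it instead of recomputing the whole lineage:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RddCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RddCacheSketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Transformations only record lineage; nothing executes yet.
    val numbers   = sc.parallelize(1L to 1000000L)
    val expensive = numbers.map(n => math.pow(n.toDouble, 2))

    // Because the RDD is immutable, its cached copy can never go stale.
    expensive.persist(StorageLevel.MEMORY_ONLY)

    println(expensive.sum())   // first action: computes and caches
    println(expensive.count()) // second action: served from the cache

    spark.stop()
  }
}
```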

I hope this helps you!
answered Jul 26, 2018 by zombie
• 3,690 points
0 votes
  • Apache Spark, whether running on HDFS, Mesos, or in local mode, distributes and stores transformed data in the form of RDDs (Resilient Distributed Datasets).
  • RDDs are not just immutable but a deterministic function of their input, which means an RDD can be recreated at any time. This helps in taking advantage of caching, sharing, and replication. An RDD isn't really a collection of data, but just a recipe for making data from other data (see the sketch below).
  • Immutability rules out a large class of potential problems caused by updates from multiple threads at once. Immutable data is definitely safe to share across processes.
  • Immutable data can live in memory as easily as on disk. This makes it reasonable to move operations that hit disk to instead use data in memory, and adding memory is much easier than adding I/O bandwidth.
  • These are significant design wins for RDDs, at the cost of having to copy data rather than mutate it in place. Generally, that's a decent tradeoff: the fault tolerance and correctness gained with no developer effort are worth the extra disk, memory, and CPU.
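
To make the "recipe" point concrete, here is a minimal sketch (assuming an existing SparkContext named sc): each transformation returns a new RDD, and the parent is never modified, so every RDD in the chain stays valid and independently recomputable:

```scala
// Each transformation returns a NEW RDD; the parent is never modified.
val base     = sc.parallelize(Seq("spark", "rdd", "immutable"))
val upper    = base.map(_.toUpperCase)          // new RDD derived from base
val filtered = upper.filter(_.startsWith("S"))  // yet another new RDD

// base is untouched; any of the three can be rebuilt from its lineage
// if a partition is lost, because each is a deterministic function of its input.
println(base.collect().mkString(", "))      // spark, rdd, immutable
println(filtered.collect().mkString(", "))  // SPARK
```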
answered Aug 23, 2018 by samarth295
• 2,190 points
0 votes
There are a few reasons for keeping RDDs immutable:

1- Immutable data can be shared easily.

2- An RDD can be recreated at any point in time from its lineage (see the sketch below).

3- Immutable data can live in memory as easily as on disk.
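
As a quick illustration of point 2, the lineage Spark would use to recreate an RDD can be inspected with toDebugString (a small sketch, assuming an existing SparkContext named sc):

```scala
val words  = sc.parallelize(Seq("a", "b", "a", "c"))
val counts = words.map((_, 1)).reduceByKey(_ + _)

// Prints the lineage graph Spark uses to recreate the RDD
// if a partition is lost or evicted from the cache.
println(counts.toDebugString)
```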

Hope the answer is helpful.
answered Apr 18 by santlal561987@gmail.com
