I don't understand the reason behind Spark RDD being immutable.

0 votes
Jul 26, 2018 in Apache Spark by shams
• 3,580 points
1,219 views

3 answers to this question.

Your answer

Your name to display (optional):
Privacy: Your email address will only be used for sending these notifications.
0 votes
Following are the reasons:
- Immutable data is always safe to share across multiple processes as well as multiple threads.
- Since RDD is immutable we can recreate the RDD any time. (From lineage graph).
- If the computation is time-consuming, in that we can cache the RDD which result in performance improvement.

I hope this helps you !!
answered Jul 26, 2018 by zombie
• 3,690 points
0 votes
  • Apache Spark on HDFS, MESOS or Local mode distributes and store transformation data in the form of RDD (Resilient Distributed DataSets).
  • RDDs are not just immutable but a deterministic function of their input. That means RDD can be recreated at any time.This helps in taking advantage of caching, sharing and replication. RDD isn't really a collection of data, but just a recipe for making data from other data.
  • Immutability rules out a big set of potential problems due to updates from multiple threads at once. Immutable data is definitely safe to share across processes.
  • Immutable data can as easily live in memory as on disk. This makes it reasonable to easily move operations that hit disk to instead use data in memory, and again, adding memory is much easier than adding I/O bandwidth.
  •  RDD significant design wins, at cost of having to copy data rather than mutate it in place. Generally, that's a decent tradeoff to make: gaining the fault tolerance and correctness with no developer effort worth spending disk memory and CPU on.
answered Aug 23, 2018 by samarth295
• 2,190 points
0 votes
There are few reasons for keeping RDD immutable as follows:

1- Immutable data can be shared easily.

2- It can be created at any point of time.

3- Immutable data can easily live on memory as on disk.

Hope the answer will helpful.
answered Apr 18 by santlal561987@gmail.com

Related Questions In Apache Spark

0 votes
1 answer

In a Spark DataFrame how can I flatten the struct?

You can go ahead and use the ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 12,030 points
317 views
0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...READ MORE

answered May 24, 2018 in Apache Spark by Shubham
• 12,030 points
136 views
0 votes
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 12,030 points
626 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE

answered May 29, 2018 in Apache Spark by Shubham
• 12,030 points
952 views
0 votes
1 answer

Convert the given Spar rdd object to Spark DataFrame.

You can create a DataFrame from the ...READ MORE

answered Jun 5, 2018 in Apache Spark by Shubham
• 12,030 points
87 views
0 votes
1 answer

How RDD persist the data in Spark?

There are two methods to persist the ...READ MORE

answered Jun 18, 2018 in Apache Spark by nitinrawat895
• 9,030 points
88 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

answered Aug 27, 2018 in Apache Spark by shams
• 3,580 points
7,039 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...READ MORE

answered Apr 19, 2018 in Apache Spark by nitinrawat895
• 9,030 points
465 views
0 votes
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

answered Jul 9, 2018 in Apache Spark by zombie
• 3,690 points
216 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Simple and easy: line.foreach(println) READ MORE

answered Dec 10, 2018 in Apache Spark by Kuber
4,769 views

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.
"PMP®","PMI®", "PMI-ACP®" and "PMBOK®" are registered marks of the Project Management Institute, Inc. MongoDB®, Mongo and the leaf logo are the registered trademarks of MongoDB, Inc.