How is an RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Jul 26, 2018 in Apache Spark by shams

1 answer to this question.


Some of the key differences between an RDD and Distributed Storage are as follows:

  • A Resilient Distributed Dataset (RDD) is the primary data abstraction of the Apache Spark framework.
  • Distributed storage is simply a file system that runs across multiple nodes (HDFS, for example).
  • An RDD is evaluated lazily and holds its data in executor memory only while it is explicitly cached or persisted.
  • Distributed storage writes data to persistent media such as disk.
  • If partitions of an RDD are lost through a failure, Spark can recompute them from the RDD's lineage, as the sketch below illustrates.
  • If data is lost from a distributed storage system, it is gone forever unless the system replicates it internally.
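
Here is a minimal Scala sketch of that contrast, assuming a spark-shell session where the SparkContext is available as sc; the HDFS output path is only a placeholder for a location of your own:

    // Build an RDD; nothing is computed yet (lazy evaluation)
    val numbers = sc.parallelize(1 to 1000000)
    val squares = numbers.map(n => n.toLong * n)

    // cache() keeps the computed partitions in executor memory for reuse
    squares.cache()
    println(squares.count())   // first action: triggers computation and caching

    // If an executor dies, Spark rebuilds the lost partitions from the
    // lineage (parallelize -> map) rather than reading back a saved copy.

    // Writing to distributed storage persists the data beyond the job.
    squares.saveAsTextFile("hdfs:///tmp/squares-demo")

Once the files are written to HDFS, durability comes from the file system's own replication, not from Spark.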
I hope this helps you!
answered Jul 26, 2018 by zombie
