How is an RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

0 votes
Jul 26, 2018 in Apache Spark by shams
• 3,580 points
163 views

1 answer to this question.

0 votes

Some of the key differences between an RDD and Distributed Storage are as follows:

  • A Resilient Distributed Dataset (RDD) is the primary data abstraction of the Apache Spark framework.
  • Distributed Storage is simply a file system that works across multiple nodes (HDFS, for example).
  • An RDD keeps its data in memory only when it is explicitly cached or persisted; otherwise its partitions are recomputed on demand.
  • Distributed Storage writes data to persistent storage on disk.
  • An RDD can recompute itself in the case of failure or data loss, using its lineage (the recorded chain of transformations that produced it).
  • If data is lost from a Distributed Storage system, it is gone forever unless the system replicates it internally (as HDFS does with block replication).
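To make the caching and lineage points concrete, here is a minimal spark-shell (Scala) sketch. It assumes a spark-shell session where `sc`, the SparkContext, is already in scope; the variable names are just for illustration:

```scala
// Build an RDD from an in-memory collection; nothing is computed yet (lazy evaluation).
val nums = sc.parallelize(1 to 1000000)

// A transformation is only recorded in the lineage, not executed.
val squares = nums.map(n => n.toLong * n)

// cache() asks Spark to keep the partitions in memory after their first computation.
squares.cache()

// An action (reduce) finally triggers the computation.
val total = squares.reduce(_ + _)

// If an executor holding cached partitions fails, Spark replays the lineage
// (parallelize -> map) to recompute only the lost partitions -- the data
// does not need to exist on disk to be recovered.
```

Note that `cache()` is a hint, not a guarantee: if memory runs short, Spark may evict cached partitions and recompute them later from lineage.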
I hope this helps!
answered Jul 26, 2018 by zombie
• 3,690 points
