How is an RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Jul 26, 2018 in Apache Spark by shams

1 answer to this question.


Some of the key differences between an RDD and Distributed Storage are as follows:

  • Resilient Distributed Dataset (RDD) is the primary data abstraction of the Apache Spark framework.
  • Distributed Storage is simply a file system that spans multiple nodes.
  • RDDs keep data in memory when they are explicitly cached or persisted; otherwise, partitions are computed on demand and discarded.
  • Distributed Storage writes data to persistent storage (disk).
  • An RDD can recompute lost partitions from its lineage (the recorded chain of transformations) in the case of failure or data loss.
  • If data is lost from a Distributed Storage system, it is gone forever (unless there is an internal replication mechanism).
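The lineage point above can be sketched with a toy example. This is plain Python, not Spark's actual API; `ToyRDD`, `map`, and `get` are hypothetical names used only to illustrate the idea that a dataset which records its transformations can rebuild a lost in-memory partition from the source, whereas plain file storage cannot:

```python
# Toy illustration (NOT Spark's real implementation): a dataset that
# records its lineage -- the source partitions plus the chain of
# transformations -- and recomputes any partition lost from memory.

class ToyRDD:
    def __init__(self, source, transforms=None):
        self.source = source                 # original input partitions
        self.transforms = transforms or []   # recorded lineage
        self.cache = {}                      # in-memory partition cache

    def map(self, fn):
        # Transformations are lazy: only the lineage is extended.
        return ToyRDD(self.source, self.transforms + [fn])

    def compute(self, i):
        # Rebuild partition i from the source by replaying the lineage.
        part = self.source[i]
        for fn in self.transforms:
            part = [fn(x) for x in part]
        return part

    def get(self, i):
        if i not in self.cache:              # lost, or never computed?
            self.cache[i] = self.compute(i)  # recompute from lineage
        return self.cache[i]

rdd = ToyRDD([[1, 2], [3, 4]]).map(lambda x: x * 10)
print(rdd.get(0))   # [10, 20]
del rdd.cache[0]    # simulate losing the in-memory partition
print(rdd.get(0))   # rebuilt from lineage: [10, 20]
```

The key design point mirrored here is that fault tolerance comes from recomputation, not replication: nothing needs to be written to disk for the data to be recoverable.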
I hope this helps!
answered Jul 26, 2018 by zombie
