How is an RDD in Spark different from distributed storage management? Can anyone help me with this?

0 votes
Jul 26, 2018 in Apache Spark by shams
• 3,670 points
1,561 views

1 answer to this question.

0 votes

Some of the key differences between an RDD and Distributed Storage are as follows:

  • A Resilient Distributed Dataset (RDD) is the primary abstraction of data for the Apache Spark framework.
  • Distributed Storage is simply a file system that runs across multiple nodes.
  • RDDs are evaluated lazily and processed in memory; their partitions are retained across actions only when explicitly cached or persisted.
  • Distributed Storage stores data in persistent storage.
  • An RDD can recompute its lost partitions from its lineage (the recorded chain of transformations that produced it) in the case of failure or data loss.
  • If data is lost from the Distributed Storage system, it is gone for good unless the system replicates it internally (as HDFS does, for example).
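
To make the last two points concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: `ToyRDD` and `ToyStore` are hypothetical classes invented for illustration, and a real RDD tracks lineage per partition across a cluster. The idea it tries to show is that an RDD keeps the recipe for its data, while a storage system keeps only the data itself.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD: it remembers how to build its data."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute                   # lineage: the recipe for the data
        self._cached: Optional[List[int]] = None  # filled only after cache()
        self._want_cache = False
        self.computations = 0                     # how often the recipe was run

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation only records a new recipe; nothing runs yet.
        return ToyRDD(lambda: [fn(x) for x in self.collect()])

    def cache(self) -> "ToyRDD":
        self._want_cache = True
        return self

    def collect(self) -> List[int]:
        if self._cached is not None:
            return self._cached
        self.computations += 1
        data = self._compute()
        if self._want_cache:
            self._cached = data
        return data

    def lose_cached_data(self) -> None:
        # Simulates losing the in-memory copy, e.g. a worker failure.
        self._cached = None


class ToyStore:
    """Hypothetical stand-in for distributed storage: it holds data on nodes."""

    def __init__(self, replicas: int):
        self.nodes: List[Dict[str, List[int]]] = [{} for _ in range(replicas)]

    def write(self, key: str, value: List[int]) -> None:
        for node in self.nodes:
            node[key] = value

    def fail_node(self, index: int) -> None:
        self.nodes[index].clear()

    def read(self, key: str) -> List[int]:
        for node in self.nodes:
            if key in node:
                return node[key]
        raise KeyError(key)


# RDD side: the cached copy is lost, but the lineage rebuilds it.
doubled = ToyRDD(lambda: [1, 2, 3]).map(lambda x: x * 2).cache()
first = doubled.collect()
doubled.lose_cached_data()
second = doubled.collect()

# Storage side: without replication the data is gone; with it, it survives.
single = ToyStore(replicas=1)
single.write("part-0", [2, 4, 6])
single.fail_node(0)
try:
    single.read("part-0")
    lost = False
except KeyError:
    lost = True

replicated = ToyStore(replicas=2)
replicated.write("part-0", [2, 4, 6])
replicated.fail_node(0)
survivor = replicated.read("part-0")
```

In this sketch the second `collect()` reruns the recipe instead of reading a stored copy, which is the behaviour the lineage point describes. The storage side has no recipe to rerun, so it depends entirely on a surviving replica.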

I hope this helps!

answered Jul 26, 2018 by zombie
• 3,790 points
