How is an RDD in Spark different from Distributed Storage Management?

Can anyone help me with this?

Jul 26, 2018 in Apache Spark by shams

1 answer to this question.

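Some of the key differences between an RDD and distributed storage are as follows:

  • A Resilient Distributed Dataset (RDD) is the primary data abstraction in the Apache Spark framework: an immutable, partitioned collection of records that is processed in parallel.
  • Distributed storage is simply a file system that spans multiple nodes, such as HDFS.
  • An RDD is evaluated lazily and is kept in memory across operations only when it is explicitly cached or persisted; otherwise its partitions are computed when an action needs them.
  • Distributed storage keeps data in persistent storage, typically on disk.
  • An RDD can recompute its lost partitions after a failure or data loss, because it records the lineage of transformations used to build it.
  • If data is lost from a distributed storage system, it is gone for good unless the system replicates it (HDFS, for example, keeps multiple replicas of each block by default).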
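
To make the caching and recomputation points more concrete, here is a minimal sketch in plain Python. It is a toy model only, not Spark's API: the class `ToyRDD` and its `lose_cache` method and `computations` counter are hypothetical names invented for illustration. The idea it tries to show is that an RDD-like object remembers how to build its data (its lineage), so an in-memory copy that disappears can be rebuilt from the source.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    It stores the recipe for the data (parent + function), not the data itself.
    """

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        self._source = source          # stands in for durable input, e.g. a file in distributed storage
        self._parent = parent          # lineage: the dataset this one was derived from
        self._fn = fn                  # lineage: the transformation applied to the parent
        self._keep = False             # set to True by cache()
        self._cache: Optional[List[int]] = None
        self.computations = 0          # how many times the data was actually computed

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Lazy: nothing is computed here, only the lineage is recorded.
        return ToyRDD(parent=self, fn=fn)

    def cache(self) -> "ToyRDD":
        # Ask for the result to be kept in memory after the first computation.
        self._keep = True
        return self

    def collect(self) -> List[int]:
        # Serve from memory if a cached copy exists.
        if self._cache is not None:
            return list(self._cache)
        # Otherwise rebuild the data by walking the lineage back to the source.
        self.computations += 1
        if self._parent is None:
            data = list(self._source or [])
        else:
            data = [self._fn(x) for x in self._parent.collect()]
        if self._keep:
            self._cache = data
        return list(data)

    def lose_cache(self) -> None:
        # Simulate a failure that wipes the in-memory copy.
        self._cache = None


base = ToyRDD(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2).cache()

first = doubled.collect()    # computed from the lineage, then kept in memory
second = doubled.collect()   # served from the cached copy
doubled.lose_cache()         # the in-memory copy is lost
third = doubled.collect()    # rebuilt from the lineage, same result as before
```

In this sketch the three results are identical and `doubled.computations` ends at 2: one computation for the first `collect()` and one for the rebuild after the cache is lost. A plain storage system has no such recipe to fall back on, which is why it has to rely on replication instead.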

I hope this helps!

answered Jul 26, 2018 by zombie
