Can anyone explain what an RDD is in Spark?

I am new to Apache Spark. While reading about RDDs, I came across the statement that an RDD is an immutable, distributed collection of objects, and this confuses me. I also read that an RDD can be created in two ways: by loading data from an external data source, or by distributing a collection of objects from the driver program.

Can anyone explain to me what an RDD is?

May 24, 2018 in Apache Spark by coder_jazz

1 answer to this question.

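An RDD (Resilient Distributed Dataset) is the fundamental data structure of Spark. It is immutable: a transformation such as map never changes an existing RDD but returns a new one. Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster; this is what makes the collection distributed. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.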
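To make "immutable" and "divided into partitions" concrete, here is a toy model in plain Python. It is not Spark's API: ToyRDD and its methods are hypothetical names invented for this illustration, and everything runs in a single process rather than on a cluster.

```python
from typing import Callable, Iterable, List, Tuple


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not a Spark class)."""

    def __init__(self, partitions: Iterable[Iterable]):
        # Tuples cannot be modified after creation, which models immutability.
        self._partitions: Tuple[tuple, ...] = tuple(tuple(p) for p in partitions)

    @classmethod
    def from_collection(cls, data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)
        # Ceiling division gives the chunk size; at least 1 to avoid a zero step.
        size = max(1, -(-len(items) // num_partitions))
        # Contiguous chunks; this yields at most num_partitions partitions.
        return cls(items[i:i + size] for i in range(0, len(items), size))

    @property
    def partitions(self) -> Tuple[tuple, ...]:
        return self._partitions

    def map(self, fn: Callable) -> "ToyRDD":
        # Each partition is processed on its own (in Spark, possibly on a
        # different node). The result is a NEW ToyRDD; self is left untouched.
        return ToyRDD(tuple(fn(x) for x in part) for part in self._partitions)

    def collect(self) -> List:
        # Gather all partitions back into one list, as the driver would.
        return [x for part in self._partitions for x in part]


numbers = ToyRDD.from_collection(range(1, 7), num_partitions=3)
doubled = numbers.map(lambda x: x * 2)
```

After this runs, numbers still holds 1 through 6 in three partitions, while doubled is a separate object holding the doubled values. Real RDDs behave the same way from the user's point of view, with the partitions spread across the cluster.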

There are two ways to create an RDD: parallelizing an existing collection in your driver program, or referencing a dataset in an external storage system such as a shared file system, HDFS, HBase, or any data source offering a Hadoop InputFormat.

answered May 24, 2018 by Shubham
