What is RDD in Apache Spark?

I am new to Spark and came across the term RDD, but I could not understand what it is. Can anyone explain?

Jul 1, 2019 in Apache Spark by Lila

recategorized Jul 4, 2019 by Gitika • 2,150 views

1 answer to this question.

Hi,

RDD in Spark stands for Resilient Distributed Dataset. It is considered the backbone of Spark and is one of its fundamental data structures. It is also described as a schema-less structure that can handle both structured and unstructured data.

In Spark, everything we do revolves around RDDs. When you read data in Spark, it is loaded into an RDD. When you transform that data, the transformation is applied to the existing RDD and produces a new one. Finally, you perform an action on the RDD and store the data it holds in persistent storage.
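The read, transform, action flow can be sketched with a small toy model in plain Python. This is not Spark's API: `ToyRDD` is a hypothetical class written only for illustration, and the sample lines are made up. It borrows the method names `map`, `filter` and `collect` from the real RDD API, but it runs eagerly on a single machine, whereas Spark distributes the data across a cluster and defers transformations until an action is called.

```python
from typing import Callable, Iterable, List


class ToyRDD:
    """Plain-Python stand-in for an RDD (hypothetical, not Spark's class)."""

    def __init__(self, records: Iterable):
        # Keep the records in a tuple so no method can modify them in place.
        self._records = tuple(records)

    def map(self, fn: Callable) -> "ToyRDD":
        # Transformation: builds a new ToyRDD and leaves this one untouched.
        return ToyRDD(fn(r) for r in self._records)

    def filter(self, pred: Callable) -> "ToyRDD":
        # Transformation: keeps only the records for which pred is true.
        return ToyRDD(r for r in self._records if pred(r))

    def collect(self) -> List:
        # Action: hands the records back to the caller as a list.
        return list(self._records)


# 1. Read the data into an RDD (here, an in-memory list of made-up lines).
lines = ToyRDD(["spark is fast", "", "rdd is the core of spark"])

# 2. Transform: each step produces a new RDD from the old one.
non_empty = lines.filter(lambda line: line != "")
word_counts = non_empty.map(lambda line: len(line.split()))

# 3. Action: pull the result out so it can be printed or stored.
result = word_counts.collect()
print(result)
```

Note that `lines` still holds all three original records after the transformations, which mirrors the point above that a transformation creates a new RDD rather than changing the old one.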

answered Jul 1, 2019 by Gitika
• 65,730 points

