How is fault tolerance achieved in Apache Spark?

Hi,

Hadoop uses a replication factor to achieve fault tolerance. How is fault tolerance achieved in Apache Spark?

Jul 22, 2019 in Apache Spark by Kesha
• 9,262 views

1 answer to this question.

Hey,

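In Apache Spark, the core data abstraction is the RDD (Resilient Distributed Dataset).

RDDs achieve fault tolerance through lineage rather than replication.
Every RDD keeps a record of how it was built from other datasets, that is, the chain of transformations applied to its parents.
If a partition of an RDD is lost due to a failure, Spark uses the lineage to rebuild only that lost partition; the partitions that survived are not recomputed.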
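
To make the idea concrete, here is a small toy model in plain Python. It is not Spark's API or implementation: the class `ToyRDD` and its methods (`from_source`, `lose_partition`, the `recomputed` log) are made-up names for illustration, and the source data is assumed to live in storage that is still readable after the failure. It only shows that a dataset which remembers how each partition is derived can rebuild a single lost partition on demand.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark code).

    It keeps a recipe for building each partition (the lineage)
    instead of keeping replicas of the data.
    """

    def __init__(self, num_partitions, compute):
        self.num_partitions = num_partitions
        self._compute = compute   # lineage: how to build partition i
        self._cache = {}          # partitions currently held in memory
        self.recomputed = []      # log of partition indexes that were built

    @classmethod
    def from_source(cls, chunks):
        # Assumption: the source chunks stay readable after a failure.
        return cls(len(chunks), lambda i: list(chunks[i]))

    def map(self, fn):
        parent = self
        # The child only records its parent and the function to apply.
        return ToyRDD(
            parent.num_partitions,
            lambda i: [fn(x) for x in parent.partition(i)],
        )

    def partition(self, i):
        # Build the partition from lineage only if it is missing.
        if i not in self._cache:
            self._cache[i] = self._compute(i)
            self.recomputed.append(i)
        return self._cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i):
        # Simulate a node failure that drops one in-memory partition.
        self._cache.pop(i, None)


source = ToyRDD.from_source([[1, 2], [3, 4], [5, 6]])
squared = source.map(lambda x: x * x)

first = squared.collect()      # builds all three partitions
squared.recomputed.clear()     # reset the log before the simulated failure

squared.lose_partition(1)      # partition 1 is lost
second = squared.collect()     # only partition 1 is rebuilt

print(first == second)         # same result before and after the failure
print(squared.recomputed)      # indexes rebuilt after the failure
```

In real Spark the same information is tracked internally for you; on an actual RDD you can inspect the recorded lineage with `toDebugString`.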

answered Jul 22, 2019 by Gitika
• 65,730 points
