Important Apache Spark Interview Questions Bank

+2 votes

With questions and answers spanning Spark Core, Spark Streaming, Spark SQL, GraphX, and MLlib, among others, it can be hard to know where to start preparing for your next Spark job interview. For a quick overview of the most frequently asked questions, see this link: http://bit.ly/2BxbJCi
If you were asked a question that this blog does not cover, please share it below. I'll get it added to the blog so that future interviewees can use it.

Aug 22, 2018 in Career Counselling by Priyaj
• 58,020 points
3,262 views

3 answers to this question.

+1 vote

What is RDD?

RDDs (Resilient Distributed Datasets) are the basic abstraction in Apache Spark. They represent the data coming into the system as collections of objects and are used for in-memory computation on large clusters in a fault-tolerant manner. RDDs are read-only, partitioned collections of records that are:

  • Immutable – RDDs cannot be altered; a transformation produces a new RDD.
  • Resilient – If a node holding a partition fails, the partition is rebuilt on another node.
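
To make the "immutable" property concrete, here is a minimal sketch in plain Python. It does not use Spark at all: `ToyRDD` is a hypothetical stand-in written only for illustration, and its `map`, `filter`, and `collect` methods merely mirror the names of the corresponding RDD operations. The point it shows is that a transformation returns a new dataset and leaves the original untouched.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: a read-only, partitioned collection."""

    def __init__(self, partitions):
        # Store partitions as tuples so they cannot be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    def map(self, fn):
        # A transformation never edits this dataset; it builds a new one.
        return ToyRDD([[fn(x) for x in p] for p in self._partitions])

    def filter(self, pred):
        # Same idea: keep matching records in a brand-new dataset.
        return ToyRDD([[x for x in p if pred(x)] for p in self._partitions])

    def collect(self):
        # Flatten all partitions into one list, in partition order.
        return [x for p in self._partitions for x in p]


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])   # two partitions
doubled = numbers.map(lambda x: x * 2)     # new dataset
large = doubled.filter(lambda x: x > 4)    # another new dataset

print(numbers.collect())  # the original is unchanged
print(large.collect())
```

In real Spark the partitions would live on different nodes and the transformations would be evaluated lazily; this toy version evaluates eagerly on one machine to keep the example short.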
answered Aug 22, 2018 by findingbugs
• 4,780 points
Thank you findingbugs
+1 vote
Hello,
I wanted to know what the different cluster managers in Apache Spark are.
answered Aug 22, 2018 by eatcodesleeprepeat
• 4,710 points
Put simply, the three cluster managers supported in Apache Spark are:

    YARN - Hadoop's resource manager.
    Apache Mesos - Has rich resource scheduling capabilities and is well suited to running Spark alongside other applications. It is advantageous when several users run interactive shells, because it scales down the CPU allocation between commands.
    Standalone deployments - Well suited for new deployments that only run Spark; they are easy to set up.

Newer Spark releases also support Kubernetes as a cluster manager.
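
As a rough illustration of how the choice of cluster manager shows up in practice, the sketch below builds the `--master` value passed to `spark-submit` for each of the three managers listed above. The master URL formats (`yarn`, `mesos://HOST:PORT`, `spark://HOST:PORT`) are Spark's own; the helper functions, the host names, and `app.py` are hypothetical placeholders, and nothing here actually launches a job.

```python
# Master URL formats for the three cluster managers discussed above.
MASTER_URL_FORMATS = {
    "yarn": "yarn",                         # cluster location comes from the Hadoop config
    "mesos": "mesos://{host}:{port}",       # Mesos master, commonly port 5050
    "standalone": "spark://{host}:{port}",  # standalone master, commonly port 7077
}


def master_url(manager, host="", port=0):
    """Return the --master value for the given cluster manager (hypothetical helper)."""
    return MASTER_URL_FORMATS[manager].format(host=host, port=port)


def spark_submit_args(manager, app, host="", port=0):
    """Build an argument list for spark-submit without running it (hypothetical helper)."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]


# Placeholder host names, for illustration only.
print(spark_submit_args("yarn", "app.py"))
print(spark_submit_args("mesos", "app.py", "mesos.example.com", 5050))
print(spark_submit_args("standalone", "app.py", "master.example.com", 7077))
```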
+1 vote

Hi Priyaj,
I have one question:

What is a lineage graph?

answered Aug 22, 2018 by bug_seeker
• 15,520 points
Hello bug_seeker,
Each RDD in Spark depends on one or more other RDDs. The representation of these dependencies between RDDs is known as the lineage graph. Spark uses the lineage graph to compute each RDD on demand, so whenever part of a persisted RDD is lost, the lost data can be recomputed from the lineage information.
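
The recovery idea can be sketched in plain Python. This is a toy model under simplifying assumptions, not Spark's implementation: `LineageNode` is a hypothetical class, every dataset has a single parent, and each partition depends only on the same-numbered partition of its parent. Each node remembers its parent and the function that derives it, which is enough to rebuild a lost partition.

```python
class LineageNode:
    """Hypothetical dataset that records how it was derived from its parent."""

    def __init__(self, name, parent=None, fn=None, source=None):
        self.name = name
        self.parent = parent    # the dataset this one depends on (None for the root)
        self.fn = fn            # per-record function applied to the parent
        self.source = source    # raw partitions, only set on the root
        self.cache = {}         # partition index -> computed records

    def lineage(self):
        # Walk the dependency chain from this node back to the root.
        node, chain = self, []
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return list(reversed(chain))

    def compute(self, index):
        # Use the cached partition if present; otherwise rebuild it from the parent.
        if index in self.cache:
            return self.cache[index]
        if self.parent is None:
            result = list(self.source[index])
        else:
            result = [self.fn(x) for x in self.parent.compute(index)]
        self.cache[index] = result
        return result


base = LineageNode("base", source=[[1, 2], [3, 4]])
squared = LineageNode("squared", parent=base, fn=lambda x: x * x)
plus_one = LineageNode("plus_one", parent=squared, fn=lambda x: x + 1)

before = plus_one.compute(1)

# Simulate losing partition 1 of two cached datasets.
del plus_one.cache[1]
del squared.cache[1]

recovered = plus_one.compute(1)  # rebuilt by replaying the lineage
print(plus_one.lineage())
print(before == recovered)
```

In actual Spark, an RDD's lineage can be inspected with `toDebugString`.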
