Spark on Yarn

I am trying to understand how Spark runs on YARN in cluster and client mode. I have the following questions:

  1. Is it necessary for Spark to be installed on all the nodes in the YARN cluster? I think it should be, because the worker nodes in the cluster execute tasks and must be able to interpret the code (Spark APIs) in the Spark application that the driver sends to the cluster.

  2. The documentation says: "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster." Why does the client node need Hadoop installed when it is only submitting the job to the cluster?
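The Spark YARN documentation goes on to say that these configs are used to write to HDFS and connect to the YARN ResourceManager, which is why spark-submit on the client needs to find them. Below is a minimal sketch of a pre-flight check, assuming bash. `check_hadoop_conf` is a hypothetical helper (not part of Spark or Hadoop), and looking for `core-site.xml` and `yarn-site.xml` is an assumption about what a typical client configuration directory contains.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Spark or Hadoop): confirm that
# HADOOP_CONF_DIR (or, if unset, YARN_CONF_DIR) points at a directory that
# holds the client-side Hadoop configuration before running spark-submit.
check_hadoop_conf() {
  local dir="${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-}}"
  local f

  if [ -z "$dir" ]; then
    echo "Neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi

  # Assumption: a typical client config directory has at least these files.
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "Missing $dir/$f" >&2
      return 1
    fi
  done

  echo "Using Hadoop client configuration from $dir"
}

# Example usage (the config path and application jar are placeholders):
#   export HADOOP_CONF_DIR=/etc/hadoop/conf
#   check_hadoop_conf && spark-submit --master yarn --deploy-mode cluster my-app.jar
```

With `--deploy-mode client` the driver runs on the submitting machine instead of inside the cluster, but spark-submit still reads the same configuration to reach the ResourceManager.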

Jul 18, 2019 in Apache Spark by nitinrawat895

1 answer to this question.

If you just want to get HDFS back to a healthy state and are not too worried about the data, do the following.

This will list the corrupt HDFS blocks:

hdfs fsck / -list-corruptfileblocks

This will delete the files that contain corrupt HDFS blocks (the data in those files is lost):

hdfs fsck / -delete

Note that you might have to prefix these commands with

 sudo -u hdfs 

if you are not the HDFS superuser (assuming "hdfs" is the name of the HDFS superuser account).
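If you want to review the damage before deleting anything, the two commands above can be wrapped so that `-delete` only runs after an explicit confirmation. This is a minimal sketch, assuming bash and the `hdfs` CLI on your PATH; `repair_hdfs` and `HDFS_RUN_AS` are hypothetical names, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): list files with corrupt blocks,
# then run `hdfs fsck <path> -delete` only if the user confirms.

# Hypothetical setting: use HDFS_RUN_AS="sudo -u hdfs" when you are not the
# HDFS superuser; leave it empty to run hdfs as the current user.
HDFS_RUN_AS="${HDFS_RUN_AS:-}"

run_fsck() {
  # Unquoted on purpose so "sudo -u hdfs" splits into three words.
  # shellcheck disable=SC2086
  $HDFS_RUN_AS hdfs fsck "$@"
}

repair_hdfs() {
  local path="${1:-/}"
  local answer="${2:-}"   # optional "yes" or "no" to skip the prompt

  echo "Files with corrupt blocks under ${path}:"
  run_fsck "$path" -list-corruptfileblocks

  if [ -z "$answer" ]; then
    printf 'Delete these files? Their data will be lost [yes/no]: '
    read -r answer
  fi

  if [ "$answer" = "yes" ]; then
    run_fsck "$path" -delete
  else
    echo "Nothing deleted."
  fi
}

# Example usage:
#   HDFS_RUN_AS="sudo -u hdfs" repair_hdfs /
```

Keeping the listing and the deletion as separate steps mirrors the two commands in the answer; the prompt only exists so that `-delete` is never run without looking at the list first.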

answered Jul 18, 2019 by ravikiran

Related Questions In Apache Spark

When running Spark on Yarn, do I need to install Spark on all nodes of Yarn Cluster?

No, it is not necessary to install ...

answered Jun 14, 2018 in Apache Spark by nitinrawat895

Why doesn't my Spark YARN client run on all available worker machines?

I am running an application on Spark ...

Feb 22, 2019 in Apache Spark by Uzair Ahmad

edited Feb 22, 2019 by Omkar

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...

answered Apr 24, 2018 in Apache Spark by kurt_cobain

Hadoop MapReduce word count program

Firstly you need to understand the concept ...

answered Mar 16, 2018 in Data Analytics by nitinrawat895

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the old API; org.apache.hadoop.mapreduce is the ...

answered Mar 16, 2018 in Data Analytics by nitinrawat895

hadoop fs -put command?

Hi, you can create one directory in HDFS ...

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895

How do I turn off INFO Logging in Spark?

Hi, you need to edit one property in ...

answered Jul 12, 2019 in Apache Spark by ravikiran

edited Dec 20, 2020 by MD

Spark Null Pointer Exception.

I used Spark 1.5.2 with Hadoop 2.6 ...

answered Jul 19, 2019 in Apache Spark by ravikiran