Spark on YARN

I am trying to understand how Spark runs on YARN in cluster/client mode. I have the following queries:

  1. Is it necessary for Spark to be installed on all the nodes of the YARN cluster? I think it should be, because the worker nodes in the cluster execute tasks and need to be able to interpret the Spark code (Spark APIs) in the application that the driver sends to the cluster.

  2. The documentation says, "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client-side) configuration files for the Hadoop cluster". Why does the client node need to have Hadoop installed when it is only submitting the job to the cluster? I have sketched below what I mean by the client-side setup.
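
Roughly (the conf path, jar, and class name here are only placeholders, not an actual setup):

export HADOOP_CONF_DIR=/etc/hadoop/conf    # client-side copies of core-site.xml, hdfs-site.xml, yarn-site.xml
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp my-app.jar

With --deploy-mode client the driver would run on this same client machine instead of inside the cluster.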

Jul 18 in Apache Spark by nitinrawat895

1 answer to this question.

If you just want to get your HDFS back to a normal state and are not too worried about the data, then:

This will list the corrupt HDFS blocks (note that fsck needs a path argument, here the root directory /):

hdfs fsck / -list-corruptfileblocks

This will delete the files containing the corrupted blocks:

hdfs fsck / -delete

Note that you might have to prefix these commands with

sudo -u hdfs

if you are not running them as the HDFS superuser (assuming "hdfs" is the name of the superuser account).
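
Putting the two together, and assuming the superuser account is indeed named hdfs, the full commands would look something like this:

sudo -u hdfs hdfs fsck / -list-corruptfileblocks
sudo -u hdfs hdfs fsck / -delete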

answered Jul 18 by ravikiran
