Unable to submit a Spark job in cluster deploy mode on a multi-node Ubuntu cluster with YARN as master

Question

I get the exception below when submitting the application with spark-submit. Please suggest which configuration is missing.

java.lang.Exception: You must specify a valid link name
    at org.apache.spark.deploy.yarn.ClientDistributedCacheManager.addResource(ClientDistributedCacheManager.scala:75)
    at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$distribute$1(Client.scala:409)
    at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6$anonfun$apply$3.apply(Client.scala:471)
    at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6$anonfun$apply$3.apply(Client.scala:470)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6.apply(Client.scala:470)
    at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6.apply(Client.scala:468)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:468)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:727)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1021)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1081)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)

Hi @Ganendra,

Can you share your code or the command you ran? That will make the issue easier to diagnose. — MD, Jul 29, 2020
Thanks for your reply.

I am using the spark-submit command below:

spark-submit --deploy-mode cluster --master yarn --jars /dependency_jars_path --class class-name /classjar_path

When does this "valid link name" error usually occur, and which configuration needs to be set to fix it?

Thanks & Regards,

Gani — Ganendra, Jul 29, 2020
I think there is a problem with your path. The thread linked below should give you the idea.

https://community.cloudera.com/t5/Support-Questions/Spark-job-fails-in-cluster-mode/td-p/58772 — MD, Jul 29, 2020
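
One path-related thing worth ruling out (an assumption, not a confirmed diagnosis for this job): spark-submit's --jars option expects a comma-separated list of jar files, and the Spark documentation notes that directory expansion does not work with --jars. If /dependency_jars_path is a directory, a minimal sketch like the following builds the list explicitly. The helper name join_jars and the DEP_DIR variable are hypothetical; class-name and /classjar_path are the placeholders from the command above.

```shell
# join_jars DIR: print every *.jar in DIR as one comma-separated string,
# with no spaces and no trailing comma. (Hypothetical helper.)
join_jars() {
  dir="$1"
  list=""
  for jar in "$dir"/*.jar; do
    # Skip the literal pattern when the directory has no jars or does not exist.
    [ -e "$jar" ] || continue
    list="${list:+$list,}$jar"
  done
  printf '%s' "$list"
}

# DEP_DIR is an assumed variable; it defaults to the placeholder path from the question.
JARS="$(join_jars "${DEP_DIR:-/dependency_jars_path}")"

# Print the command instead of running it, so the --jars value can be checked first.
echo spark-submit --deploy-mode cluster --master yarn \
  --jars "$JARS" --class class-name /classjar_path
```

If the printed --jars value is empty or contains anything other than full paths to jar files, that is the part of the command to correct before submitting.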

MD · Answer 1 · Jul 29, 2020

Hi @Ganendra,

Since you are running a multi-node cluster, you have to use the spark-submit command. You cannot run yarn-cluster mode via spark-shell: in cluster mode the driver program runs inside the YARN application master container, whereas spark-shell is interactive and needs the driver on the machine where you launch it.

$ spark-submit --class com.df.SparkWordCount SparkWC.jar yarn-client
$ spark-submit --class com.df.SparkWordCount SparkWC.jar yarn-cluster

Thanks for your reply.

I am using spark-submit, not spark-shell. Here is a similar command:

spark-submit --deploy-mode cluster --master yarn --jars /dependency_jars_path --class class-name /classjar_path