Cannot load file to Spark: org.apache.spark.sql.AnalysisException: Path does not exist

Question

I am trying to load a file from HDFS into Spark, but the read fails with the error below. Please help.

scala> val dataRDD = spark.read.textFile("file:///user/edureka_565414/Module5/AppleStore.csv").rdd
org.apache.spark.sql.AnalysisException: Path does not exist: file:/user/edureka_565414/Module5/AppleStore.csv;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.text(DataFrameReader.scala:506)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:542)
at org.apache.spark.sql.DataFrameReader.textFile(DataFrameReader.scala:515)
... 48 elided

score 0 · Answer 1 · Jul 31, 2019

The file is in HDFS, but the file:// scheme tells Spark to look on the local filesystem, where this path does not exist. Give the path with the hdfs:// scheme instead and it should work:

scala> val dataRDD = spark.read.textFile("hdfs:///user/edureka_565414/Module5/AppleStore.csv").rdd
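If you are not sure which filesystem actually holds the file, you can check before reading. The following is only a sketch for spark-shell (where `spark` is already defined). The helper name `pathExists` is made up for illustration, and the snippet assumes the cluster's fs.defaultFS points at HDFS so that an hdfs:/// URI without a host can be resolved:

```scala
// Run inside spark-shell, where `spark` already exists.
import org.apache.hadoop.fs.Path

// Hypothetical helper: true if `uri` exists on the filesystem
// selected by its scheme (file, hdfs, ...).
def pathExists(uri: String): Boolean = {
  val path = new Path(uri)
  // getFileSystem picks the implementation from the URI scheme
  val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
  fs.exists(path)
}

val hdfsUri  = "hdfs:///user/edureka_565414/Module5/AppleStore.csv"
val localUri = "file:///user/edureka_565414/Module5/AppleStore.csv"

println(s"HDFS copy present:  ${pathExists(hdfsUri)}")
println(s"local copy present: ${pathExists(localUri)}")

// Read only when the HDFS path resolves, then print the first line.
if (pathExists(hdfsUri)) {
  val dataRDD = spark.read.textFile(hdfsUri).rdd
  dataRDD.take(1).foreach(println)
}
```

If the HDFS check also returns false, the file is probably at a different location than expected; listing the directory with hadoop fs -ls should show where it is.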

answered Jul 31, 2019 by Tina

