Spark java.io.FileNotFoundException

+1 vote

While executing the following query, I am getting the error below:

val ddd = spark.sql("select Year, sum(countTotal) as total_count from df3 group by Year order by total_count desc limit 10")
ddd.show()

df3: org.apache.spark.sql.DataFrame = [holiday: int, workingday: int ... 13 more fields]
+-------+----------+-----+------+--------+---------+------+----------+----------+---+-----+----+---------+------------+-------------+
|holiday|workingday| temp| atemp|humidity|windspeed|casual|registered|countTotal|Day|Month|Year|EventTime|     season_|     weather_|
+-------+----------+-----+------+--------+---------+------+----------+----------+---+-----+----+---------+------------+-------------+
|      0|         0| 9.84|14.395|      81|      0.0|     3|        13|        16|  1|    1|2011|    0:0:0|           1|            1|
|      0|         0| 9.02|13.635|      80|      0.0|     8|        32|        40|  1|    1|2011|    1:0:0|           1|            1|
|      0|         0| 9.02|13.635|      80|      0.0|     5|        27|        32|  1|    1|2011|    2:0:0|           1|            1|
|      0|         0| 9.84|14.395|      75|      0.0|     3|        10|        13|  1|    1|2011|    3:0:0|           1|            1|
|      0|         0| 9.84|14.395|      75|      0.0|     0|         1|         1|  1|    1|2011|    4:0:0|           1|            1|
|      0|         0| 9.84| 12.88|      75|   6.0032|     0|         1|         1|  1|    1|2011|    5:0:0|           1|            2|
|      0|         0| 9.02|13.635|      80|      0.0|     2|         0|         2|  1|    1|2011|    6:0:0|           1|            1|
|      0|         0|  8.2| 12.88|      86|      0.0|     1|         2|         3|  1|    1|2011|    7:0:0|           1|            1|
|      0|         0| 9.84|14.395|      75|      0.0|     1|         7|         8|  1|    1|2011|    8:0:0|           1|            1|
|      0|         0|13.12|17.425|      76|      0.0|     8|         6|        14|  1|    1|2011|    9:0:0|           1|            1|
|      0|         0|15.58|19.695|      76|  16.9979|    12|        24|        36|  1|    1|2011|   10:0:0|           1|            1|
|      0|         0|14.76|16.665|      81|  19.0012|    26|        30|        56|  1|    1|2011|   11:0:0|           1|            1|
|      0|         0|17.22| 21.21|      77|  19.0012|    29|        55|        84|  1|    1|2011|   12:0:0|           1|            1|
|      0|         0|18.86|22.725|      72|  19.9995|    47|        47|        94|  1|    1|2011|   13:0:0|           1|            2|
|      0|         0|18.86|22.725|      72|  19.0012|    35|        71|       106|  1|    1|2011|   14:0:0|           1|            2|
|      0|         0|18.04| 21.97|      77|  19.9995|    40|        70|       110|  1|    1|2011|   15:0:0|           1|            2|
|      0|         0|17.22| 21.21|      82|  19.9995|    41|        52|        93|  1|    1|2011|   16:0:0|           1|            2|
|      0|         0|18.04| 21.97|      82|  19.0012|    15|        52|        67|  1|    1|2011|   17:0:0|           1|            2|
|      0|         0|17.22| 21.21|      88|  16.9979|     9|        26|        35|  1|    1|2011|   18:0:0|           1|            3|
|      0|         0|17.22| 21.21|      88|  16.9979|     6|        31|        37|  1|    1|2011|   19:0:0|           1|            3|
+-------+----------+-----+------+--------+---------+------+----------+----------+---+-----+----+---------+------------+-------------+

ddd: org.apache.spark.sql.DataFrame = [Year: int, sum(countTotal): bigint]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 178.0 failed 1 times, most recent failure: Lost task 0.0 in stage 178.0 (TID 1220, localhost, executor driver): java.io.FileNotFoundException: /tmp/blockmgr-a6766964-7801-4d25-bb63-cdcd5bc5fd6d/03/temp_shuffle_d5df2b2a-c0b3-4414-bc7d-ff85578f5cb0 (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:102)
    at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:115)
    at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:229)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
Jul 16, 2019 in Apache Spark by Tilka
3,614 views
Same issue here.

PySpark 2.4.0, when I train a GMM.

1 answer to this question.

+2 votes
Hello,

From the error, the temporary shuffle file under /tmp (the blockmgr-* directory) is gone by the time the task tries to write to it. This typically means the disk backing spark.local.dir (which defaults to /tmp) filled up, the OS cleaned /tmp while the job was running, or an executor died and its shuffle files were removed with it.

Try increasing the executor memory, and check that the disk holding your shuffle directory has enough free space. Pointing spark.local.dir at a larger disk that is not periodically cleaned also helps.
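
As a minimal sketch (the "/data/spark-tmp" path and the "4g" size below are placeholder values, not recommendations), this is how both settings could be applied when building the session:

import org.apache.spark.sql.SparkSession

// Sketch only: adjust the path and memory size to your setup.
val spark = SparkSession.builder()
  .appName("shuffle-dir-example")
  // Keep shuffle/spill files off /tmp, which the OS may clean up.
  .config("spark.local.dir", "/data/spark-tmp")
  // More executor memory reduces the chance of tasks dying mid-shuffle.
  .config("spark.executor.memory", "4g")
  .getOrCreate()

Note that in cluster deployments spark.local.dir set this way can be overridden by the cluster manager (e.g. via SPARK_LOCAL_DIRS), so it is safer to put it in spark-defaults.conf or pass it on the command line: spark-submit --conf spark.local.dir=/data/spark-tmp --conf spark.executor.memory=4g ...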
answered Dec 13, 2019 by Alexandru
• 510 points
