Spark java.io.FileNotFoundException

Question

While executing the query below, I get the following error:

val ddd = spark.sql("select Year, sum(countTotal) as total_count from df3 group by Year order by total_count desc limit 10")
ddd.show()

df3: org.apache.spark.sql.DataFrame = [holiday: int, workingday: int ... 13 more fields]
+-------+----------+-----+------+--------+---------+------+----------+----------+---+-----+----+---------+------------+-------------+
|holiday|workingday| temp| atemp|humidity|windspeed|casual|registered|countTotal|Day|Month|Year|EventTime|     season_|     weather_|
+-------+----------+-----+------+--------+---------+------+----------+----------+---+-----+----+---------+------------+-------------+
|      0|         0| 9.84|14.395|      81|      0.0|     3|        13|        16|  1|    1|2011|    0:0:0|           1|            1|
|      0|         0| 9.02|13.635|      80|      0.0|     8|        32|        40|  1|    1|2011|    1:0:0|           1|            1|
|      0|         0| 9.02|13.635|      80|      0.0|     5|        27|        32|  1|    1|2011|    2:0:0|           1|            1|
|      0|         0| 9.84|14.395|      75|      0.0|     3|        10|        13|  1|    1|2011|    3:0:0|           1|            1|
|      0|         0| 9.84|14.395|      75|      0.0|     0|         1|         1|  1|    1|2011|    4:0:0|           1|            1|
|      0|         0| 9.84| 12.88|      75|   6.0032|     0|         1|         1|  1|    1|2011|    5:0:0|           1|            2|
|      0|         0| 9.02|13.635|      80|      0.0|     2|         0|         2|  1|    1|2011|    6:0:0|           1|            1|
|      0|         0|  8.2| 12.88|      86|      0.0|     1|         2|         3|  1|    1|2011|    7:0:0|           1|            1|
|      0|         0| 9.84|14.395|      75|      0.0|     1|         7|         8|  1|    1|2011|    8:0:0|           1|            1|
|      0|         0|13.12|17.425|      76|      0.0|     8|         6|        14|  1|    1|2011|    9:0:0|           1|            1|
|      0|         0|15.58|19.695|      76|  16.9979|    12|        24|        36|  1|    1|2011|   10:0:0|           1|            1|
|      0|         0|14.76|16.665|      81|  19.0012|    26|        30|        56|  1|    1|2011|   11:0:0|           1|            1|
|      0|         0|17.22| 21.21|      77|  19.0012|    29|        55|        84|  1|    1|2011|   12:0:0|           1|            1|
|      0|         0|18.86|22.725|      72|  19.9995|    47|        47|        94|  1|    1|2011|   13:0:0|           1|            2|
|      0|         0|18.86|22.725|      72|  19.0012|    35|        71|       106|  1|    1|2011|   14:0:0|           1|            2|
|      0|         0|18.04| 21.97|      77|  19.9995|    40|        70|       110|  1|    1|2011|   15:0:0|           1|            2|
|      0|         0|17.22| 21.21|      82|  19.9995|    41|        52|        93|  1|    1|2011|   16:0:0|           1|            2|
|      0|         0|18.04| 21.97|      82|  19.0012|    15|        52|        67|  1|    1|2011|   17:0:0|           1|            2|
|      0|         0|17.22| 21.21|      88|  16.9979|     9|        26|        35|  1|    1|2011|   18:0:0|           1|            3|
|      0|         0|17.22| 21.21|      88|  16.9979|     6|        31|        37|  1|    1|2011|   19:0:0|           1|            3|
+-------+----------+-----+------+--------+---------+------+----------+----------+---+-----+----+---------+------------+-------------+

ddd: org.apache.spark.sql.DataFrame = [Year: int, sum(countTotal): bigint]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 178.0 failed 1 times, most recent failure: Lost task 0.0 in stage 178.0 (TID 1220, localhost, executor driver): java.io.FileNotFoundException: /tmp/blockmgr-a6766964-7801-4d25-bb63-cdcd5bc5fd6d/03/temp_shuffle_d5df2b2a-c0b3-4414-bc7d-ff85578f5cb0 (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:102)
at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:115)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:229)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:152)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)

Alexandru · Answer 1 · Dec 13, 2019

Hello,

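From the error, it looks like the shuffle file (or the /tmp/blockmgr-... directory it belongs to) isn't there anymore: Spark fails while opening a temp_shuffle file for writing during the group by.

Try increasing the executor memory (the task ran on "executor driver", so in local mode that would mean the driver memory), and check that the disk holding /tmp still has enough free space.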
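
To check the storage side, something like the following might help. It is only a sketch under a few assumptions: `/data/spark-tmp` is a made-up path that you would replace with a directory on a volume with enough room, the app name is arbitrary, and you need to be able to start a fresh session, because `spark.local.dir` (the setting that decides where the `blockmgr-*` shuffle directories are created, `/tmp` by default) is only read when the SparkContext starts.

```scala
import java.io.File
import org.apache.spark.sql.SparkSession

// Hypothetical scratch directory (an assumption, not from the question);
// choose a path on a volume with plenty of free space.
val scratchDir = new File("/data/spark-tmp")
scratchDir.mkdirs()

// Report how much space is left where Spark would write its shuffle files.
val freeGb = scratchDir.getUsableSpace / (1024.0 * 1024 * 1024)
println(f"Usable space in ${scratchDir.getPath}: $freeGb%.1f GB")

// spark.local.dir has to be set before the SparkContext is created;
// changing it on an already running session has no effect.
val spark = SparkSession.builder()
  .appName("local-dir-check")
  .master("local[*]")
  .config("spark.local.dir", scratchDir.getPath)
  .getOrCreate()
```

In spark-shell or a notebook the session already exists, so `getOrCreate()` would just return it and the setting would be ignored. In that case pass it at launch instead, for example `spark-shell --conf spark.local.dir=/data/spark-tmp`.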