We have a streaming application implemented with Spark Structured Streaming that reads data from Kafka topics and writes it to an HDFS location.
Sometimes the application fails with errors such as:
_spark_metadata/0 doesn't exist while compacting batch 9
java.lang.IllegalStateException: history/1523305060336/_spark_metadata/9.compact doesn't exist when compacting batch 19 (compactInterval: 10)
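As far as I understand, the file sink keeps a metadata log under <output path>/_spark_metadata: it writes one file per batch and, with the default compactInterval of 10, compacts batches 0-9 into 9.compact, which the compaction at batch 19 then expects to find. Here is a minimal sketch (output path and SparkSession in scope assumed) to list what is actually present in that directory:

import org.apache.hadoop.fs.{FileSystem, Path}

// List the sink's metadata log so the missing entry (e.g. 9.compact) is visible.
// hdfsPath is assumed to be the sink output path used by the query below.
val metadataDir = new Path(hdfsPath, "_spark_metadata")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.listStatus(metadataDir).map(_.getPath.getName).sorted.foreach(println)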
I have not been able to resolve this issue. The only workaround I have found is to delete the checkpoint location files, but then the application reads the topic from the beginning on the next run, which is not a feasible solution for a production application. Can anyone suggest a solution for this error so that I do not have to delete the checkpoint and the job can resume from where the last run failed?
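For completeness, the workaround I want to avoid amounts to something like this (checkpoint path assumed):

import org.apache.hadoop.fs.{FileSystem, Path}

// Deleting the checkpoint discards the stored Kafka offsets, so the next run
// starts over and reprocesses all previous data from the topic.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
fs.delete(new Path("<checkpoint dir>"), true) // recursive delete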
Sample code of the application:
val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", <server list>)
  .option("subscribe", <topic>)
  .load()

df.writeStream
  .format("csv")
  .option("path", hdfsPath)                        // HDFS output location
  .option("checkpointLocation", <checkpoint dir>)  // persistent HDFS directory, unique to this query
  .outputMode("append")
  .start()
I need a solution that does not require deleting the checkpoint location.