Databricks Autoloader: Getting FileNotFoundException while using cloudFiles


Following is the code I am using to ingest the data incrementally (weekly).

import org.apache.spark.sql.streaming.Trigger

// Incremental read with Auto Loader (cloudFiles) from the source location
val ssdf = spark.readStream
  .schema(schema)
  .format("cloudFiles")
  .option("cloudFiles.format", "parquet")
  .load(sourceUrl)
  .filter(criteriaFilter)

val transformedDf = ssdf.transform(.....)

// Write each run to its own output folder, reusing the same checkpoint location
val processData = transformedDf
  .select(recordFields: _*)
  .writeStream
  .option("checkpointLocation", outputUrl + "checkpoint/")
  .format("parquet")
  .outputMode("append")
  .option("path", outputUrl + run_id + "/")
  .trigger(Trigger.Once())
  .start()

processData.processAllAvailable()
processData.stop()

Each week the data is written to a new output folder, while the checkpoint location stays the same. This worked fine for 3 to 5 incremental runs, but recently I got the following error:

ERROR: Query termination received for [id=2345245425], with exception: org.apache.spark.SparkException: Job aborted. Caused by: java.io.FileNotFoundException: Unable to find batch s3://outputPath/20230810063959/_spark_metadata/0

What is the reason for this issue? I don't understand why it is looking for batch '0' in the current run's folder. Any idea?
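For reference, here is a small diagnostic sketch (assuming the same spark session, outputUrl and run_id as in the job above) that lists what the parquet sink has actually committed into the _spark_metadata directory of a run's output folder:

import org.apache.hadoop.fs.Path

// Hypothetical diagnostic: inspect the sink's _spark_metadata directory for one run.
// outputUrl and run_id are assumed to be the same values used by the streaming query above.
val metadataDir = new Path(outputUrl + run_id + "/_spark_metadata")
val fs = metadataDir.getFileSystem(spark.sparkContext.hadoopConfiguration)

if (fs.exists(metadataDir)) {
  // Each committed micro-batch leaves a numbered file (0, 1, 2, ...) here
  fs.listStatus(metadataDir).foreach(s => println(s.getPath.getName))
} else {
  println(s"No _spark_metadata directory found at $metadataDir")
}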
