Spark ignoreMissingFiles config not working for JSON

I'm reading some JSON files from ADLS and running transformations on the resulting DataFrame. While these transformations are running, another job moves some of those JSON files to a different folder.

    df = spark.read.format("json").schema(jsonSchema).load(adlspath_input)
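
For context, a minimal sketch of the whole read; the schema and path below are placeholders standing in for my real ones:

    from pyspark.sql.types import StructType, StructField, StringType

    # placeholder schema; the real one mirrors the JSON layout
    jsonSchema = StructType([
        StructField("id", StringType(), True),
        StructField("payload", StringType(), True),
    ])

    # placeholder folder in the raw container
    adlspath_input = "abfss://raw@edapadls.dfs.core.windows.net/folder/"

    df = spark.read.format("json").schema(jsonSchema).load(adlspath_input)

    # the failure surfaces lazily, at action time, when a task opens a
    # file that the other job has already moved
    df.count()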

This results in the following error:

FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://edapadls.dfs.core.windows.net/raw/folder/filename.json?upn=false&action=getStatus&timeout=90. [DEFAULT_FILE_NOT_FOUND]

It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If disk cache is stale or the underlying files have been removed, you can invalidate disk cache manually by restarting the cluster

Even after setting this Spark config:

spark.sql("SET spark.sql.files.ignoreMissingFiles=true")

I still get the same error.
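
As far as I know from the Spark docs, the same setting can also be applied as a session conf or as a generic file source option on the read itself; a sketch of both, with the same placeholders as above:

    # session-level equivalent of the SQL SET above
    spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

    # per-read equivalent, via Spark's generic file source options
    df = (
        spark.read.format("json")
        .schema(jsonSchema)
        .option("ignoreMissingFiles", "true")
        .load(adlspath_input)
    )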

I'm on Databricks Runtime 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12).

The same Spark configuration worked fine on Databricks Runtime 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12); I did not encounter this error on that runtime.
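
To rule out the setting silently not being applied on 12.2, it can be read back from the session:

    # confirm the session actually holds the value set above; expect 'true'
    print(spark.conf.get("spark.sql.files.ignoreMissingFiles"))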

spark.sql("SET spark.sql.files.ignoreMissingFiles=true")

Not working.
