Spark ignoreMissingFiles config not working for JSON

I'm reading some JSON files from ADLS and running transformations on the resulting DataFrame. While these transformations are running, another job moves some of those JSON files to a different folder.

    df = spark.read.format("json").schema(jsonSchema).load(adlspath_input)
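
For context, a minimal sketch of the whole read; the schema and path below are placeholders standing in for my real ones:

    from pyspark.sql.types import StructType, StructField, StringType

    # placeholder schema; the real one mirrors the JSON layout
    jsonSchema = StructType([
        StructField("id", StringType(), True),
        StructField("payload", StringType(), True),
    ])

    # placeholder folder in the raw container
    adlspath_input = "abfss://raw@edapadls.dfs.core.windows.net/folder/"

    df = spark.read.format("json").schema(jsonSchema).load(adlspath_input)

    # the failure surfaces lazily, at action time, when a task opens a
    # file that the other job has already moved
    df.count()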

This results in the following error:

FileNotFoundException: Operation failed: "The specified path does not exist.", 404, HEAD, https://edapadls.dfs.core.windows.net/raw/folder/filename.json?upn=false&action=getStatus&timeout=90. [DEFAULT_FILE_NOT_FOUND]

It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If disk cache is stale or the underlying files have been removed, you can invalidate disk cache manually by restarting the cluster

Even after setting this Spark config:

spark.sql("SET spark.sql.files.ignoreMissingFiles=true")

I still get the same error.
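
As far as I know from the Spark docs, the same setting can also be applied as a session conf or as a generic file source option on the read itself; a sketch of both, with the same placeholders as above:

    # session-level equivalent of the SQL SET above
    spark.conf.set("spark.sql.files.ignoreMissingFiles", "true")

    # per-read equivalent, via Spark's generic file source options
    df = (
        spark.read.format("json")
        .schema(jsonSchema)
        .option("ignoreMissingFiles", "true")
        .load(adlspath_input)
    )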

I'm on Databricks Runtime 12.2 LTS (includes Apache Spark 3.3.2, Scala 2.12).

The same Spark configuration worked fine on Databricks Runtime 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12); I did not encounter this error on that runtime.
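
To rule out the setting silently not being applied on 12.2, it can be read back from the session:

    # confirm the session actually holds the value set above; expect 'true'
    print(spark.conf.get("spark.sql.files.ignoreMissingFiles"))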

spark.sql("SET spark.sql.files.ignoreMissingFiles=true")

Not working.
