Can SparkSession.catalog.clearCache() delete data from HDFS?

557 Views Asked by Pratyasha Sharma

I am experiencing a data-deletion issue since we migrated from CDH to HDP (Spark 2.2 to 2.3). The tables are read from an HDFS location, and after the Spark job that reads and processes those tables has been running for a while, it throws a "table not found" exception; when we check that location, all the records have vanished. In my Spark (Java) code I see that clearCache() is called before the table is read. Can it delete those files? If so, how do I fix it?

1 Answer
I think you should look at the source code: Spark has its own implementation for caching user data, managed through its CacheManager, and clearCache() only drops those cached entries. It never deletes the underlying source files. Have a look at Spark's CacheManager.