Can SparkSession.catalog.clearCache() delete data from HDFS?

557 Views Asked by Pratyasha Sharma

I am experiencing a data-deletion issue since we migrated from CDH to HDP (Spark 2.2 to 2.3). The tables are read from an HDFS location, and after the Spark job that reads and processes those tables has been running for a while, it throws a "table not found" exception; when we check that location, all the records have vanished. In my Spark (Java) code I see that clearCache() is called before the table is read. Can it delete those files? If so, how do I fix it?

1 Answer
I think you should look at the source code: Spark has its own implementation for caching user data, managed through its CacheManager, and clearCache() only drops those cached entries. It never deletes the underlying source files. Have a look at Spark's CacheManager.