We collect raw data from various data delivery streams in S3, in Delta format. We chose Delta mainly because we want an easy way to compact the many small objects into larger S3 objects that can later be processed more (cost-)efficiently. We want to keep this data for only 60 days and then expire it. Running DELETE on the tables does not physically remove the data; it only updates the transaction log so that it points to the current data files. Only a VACUUM physically removes objects from S3. However, touching the S3 objects costs additional money, which we want to avoid. Our idea was to use S3 lifecycle policies to expire the data after 60 days, which costs almost nothing. But it seems that this is not recommended, because it can corrupt the Delta transaction log and render the table unusable. Is there a way to safely remove objects from S3 after a given time period without corrupting the Delta metadata?
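For reference, the two-step DELETE and VACUUM flow described above could be sketched roughly as follows. This is only a minimal illustration, not a recommendation: the table name `raw_events`, the column `event_date`, and the helper `retention_statements` are hypothetical, and the generated statements are assumed to be run through something like `spark.sql(...)` on a Delta-enabled Spark session. The 168-hour value is Delta's default VACUUM retention threshold.

```python
from datetime import date, timedelta


def retention_statements(
    table: str,
    date_column: str,
    today: date,
    keep_days: int = 60,
    vacuum_retain_hours: int = 168,
) -> list[str]:
    """Build the two Delta SQL statements for a time-based retention pass.

    `table` and `date_column` are placeholders for illustration only.
    """
    # Rows older than this date fall outside the retention window.
    cutoff = today - timedelta(days=keep_days)

    # Step 1: logical delete. This only records the change in the
    # transaction log; the old data files remain in S3.
    delete_sql = (
        f"DELETE FROM {table} WHERE {date_column} < '{cutoff.isoformat()}'"
    )

    # Step 2: physical cleanup. VACUUM removes files that the log no longer
    # references and that are older than the retention threshold.
    vacuum_sql = f"VACUUM {table} RETAIN {vacuum_retain_hours} HOURS"

    return [delete_sql, vacuum_sql]


if __name__ == "__main__":
    for stmt in retention_statements("raw_events", "event_date", date(2024, 3, 1)):
        print(stmt)  # in practice: spark.sql(stmt)
```

The point of the sketch is that the DELETE alone never frees storage; only the second statement touches the S3 objects, which is the step whose cost the question is trying to avoid.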
Expire S3 objects after deletion from Delta Lake without breaking metadata
183 Views Asked by Alex At