I am writing an online text editor. I want to allow users to add inline images and video to the document. I am struggling to implement this in a reliable way.
Current infrastructure:
- Database (Postgres) of documents (text, title, author, list of media objects referencing S3)
- Object store (S3) where the images/video/files are stored
The current flow:
- User creates a new document
- User makes changes, but doesn't save them. These changes are stored in localStorage so they are not lost on refresh.
- The user attaches an image.
- The image displays a loading indicator while it is uploaded to S3 (or equivalent).
- The user saves the document, and the data is saved to a database. The objects themselves are not saved, only S3 URLs to them.
Problem
- If the user deletes the document before saving, or if saving fails, there will be orphan files in S3 that are not referenced by any document.
- A "delete document" action must now delete something from both Postgres and S3. Since you cannot run a transaction across two completely different services, one can imagine a situation where the Postgres delete succeeds but the S3 delete fails, creating more orphan objects.
Attempts at solutions
- I tried storing the media in localStorage and committing it all when the document is saved. This would solve the issue, but localStorage is limited to 5-10 MB, which is too small.
- A reaper daemon that queries the S3 references stored in the database, cross-references them with the objects actually stored in S3 to find orphans, and deletes those automatically.
The reaper daemon would work, but it feels like a hack. I really don't want to manage an entirely new service just to store some files. Is there a better way to do this? What is the industry standard?
If it matters, I'm using React + TypeScript, and the text editor is built on DraftJS.
Here's the solution to the core problem of keeping the database and the object store consistent.
First, a couple of general rules:
The database stores the following information about the objects:
The timestamps are optional but immutable once set.
The object can be used for as long as it has an upload timestamp and doesn't have a deletion timestamp.
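A minimal TypeScript sketch of the two rules above, assuming a row that holds an object key, the owning document's ID, and the two timestamps. All names here (`MediaRow`, `isUsable`, `setOnce`, and the `media` table in the SQL comment) are hypothetical, not taken from the original answer.

```typescript
// Hypothetical shape of one row in a media table (names are assumptions).
interface MediaRow {
  key: string;             // object key in the S3 bucket
  documentId: string;      // document that references the object
  uploadedAt: Date | null; // set once the upload has completed
  deletedAt: Date | null;  // set once deletion has been requested
}

// Usable only after the upload completed and before any deletion request.
function isUsable(row: MediaRow): boolean {
  return row.uploadedAt !== null && row.deletedAt === null;
}

// Timestamps are write-once: if the field is already set, the row is returned
// unchanged, so calling this twice is harmless (idempotent).
// A Postgres equivalent might be (hypothetical table/column names):
//   UPDATE media SET uploaded_at = now() WHERE key = $1 AND uploaded_at IS NULL;
function setOnce(row: MediaRow, field: "uploadedAt" | "deletedAt", at: Date): MediaRow {
  if (row[field] !== null) return row;
  const updated: MediaRow = { ...row };
  updated[field] = at;
  return updated;
}
```

Putting the `IS NULL` condition in the `WHERE` clause lets the database itself enforce the write-once rule, even if two clients race.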
It effectively goes through the following states:
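One reading of those states, derived purely from the two timestamps, is sketched below in TypeScript. The state labels (`pending`, `live`, `deleting`) and the helpers `stateOf` and `canTransition` are hypothetical; the point is that because both timestamps are write-once, a row can only ever move forward.

```typescript
// Hypothetical state labels derived from the two write-once timestamps.
type MediaState = "pending" | "live" | "deleting";

interface Timestamps {
  uploadedAt: Date | null;
  deletedAt: Date | null;
}

function stateOf(t: Timestamps): MediaState {
  if (t.deletedAt !== null) return "deleting"; // deletion requested; cleanup may still be owed
  if (t.uploadedAt !== null) return "live";    // uploaded and usable
  return "pending";                            // row exists, upload not confirmed yet
}

// Forward-only transitions: a timestamp can be set but never cleared.
const nextStates: Record<MediaState, readonly MediaState[]> = {
  pending: ["live", "deleting"], // upload finishes, or is abandoned/cancelled
  live: ["deleting"],
  deleting: [],                  // terminal until the row itself is removed
};

function canTransition(from: MediaState, to: MediaState): boolean {
  return nextStates[from].includes(to);
}
```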
The application needs to perform two operations here: creating an object and deleting an object. Both are idempotent.
Creation:
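A minimal sketch of an idempotent creation path, assuming the row is written before the upload starts, so that every object that can ever reach the bucket already has a database row naming it. The `Map`s stand in for the Postgres table and the S3 bucket, and every name (`reserve`, `upload`, `confirmUpload`, `createObject`, the `media` table in the SQL comments) is hypothetical; real calls would be asynchronous.

```typescript
interface MediaRow {
  key: string;
  documentId: string;
  createdAt: Date;
  uploadedAt: Date | null;
  deletedAt: Date | null;
}

// In-memory stand-ins for the Postgres table and the S3 bucket (assumptions).
const media = new Map<string, MediaRow>();
const bucket = new Map<string, Uint8Array>();

// Step 1: reserve the key in the database *before* uploading.
// SQL sketch: INSERT INTO media (key, document_id) VALUES ($1, $2) ON CONFLICT (key) DO NOTHING;
function reserve(key: string, documentId: string, now: Date): void {
  if (!media.has(key)) {
    media.set(key, { key, documentId, createdAt: now, uploadedAt: null, deletedAt: null });
  }
}

// Step 2: upload the bytes. A PUT to the same key overwrites, so a retry is safe.
function upload(key: string, body: Uint8Array): void {
  bucket.set(key, body);
}

// Step 3: record completion; write-once, so repeating it changes nothing.
// SQL sketch: UPDATE media SET uploaded_at = now() WHERE key = $1 AND uploaded_at IS NULL;
function confirmUpload(key: string, now: Date): void {
  const row = media.get(key);
  if (row !== undefined && row.uploadedAt === null) {
    row.uploadedAt = now;
  }
}

// The whole operation can be retried from the top after any failure.
function createObject(key: string, documentId: string, body: Uint8Array, now: Date): void {
  reserve(key, documentId, now);
  upload(key, body);
  confirmUpload(key, now);
}
```

If the client crashes between steps 1 and 3, the row simply stays pending and is never treated as usable, so it can later be found through the database alone.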
Deletion:
Two destructive updates happen here:
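A minimal sketch of deletion under one plausible reading: a non-destructive, write-once `deletedAt` mark comes first, then the two destructive updates, removing the object from the bucket and then removing the row. Doing the bucket delete before the row delete means a failure always leaves behind a row that still names the object. The `Map`s, the injectable `removeFromBucket` parameter (used here only to simulate an S3 failure), and all function names are hypothetical. For a whole document, marking its media could plausibly share one Postgres transaction with the document delete, which is an assumption about how the pieces fit together.

```typescript
interface MediaRow {
  key: string;
  uploadedAt: Date | null;
  deletedAt: Date | null;
}

// In-memory stand-ins for the Postgres table and the S3 bucket (assumptions).
const media = new Map<string, MediaRow>();
const bucket = new Map<string, Uint8Array>();

// Not destructive: mark the row so the object is unusable from now on.
// SQL sketch: UPDATE media SET deleted_at = now() WHERE key = $1 AND deleted_at IS NULL;
function markDeleted(key: string, now: Date): void {
  const row = media.get(key);
  if (row !== undefined && row.deletedAt === null) {
    row.deletedAt = now;
  }
}

// Default bucket removal; deleting a missing key is a no-op, so retries are safe.
function defaultRemove(key: string): void {
  bucket.delete(key);
}

function deleteObject(
  key: string,
  now: Date,
  removeFromBucket: (key: string) => void = defaultRemove,
): void {
  markDeleted(key, now);
  // Destructive update 1: remove the object. If this throws, the marked row
  // survives and records that cleanup is still owed.
  removeFromBucket(key);
  // Destructive update 2: drop the row, only after the object is gone.
  media.delete(key);
}
```

On a non-versioned S3 bucket, `DeleteObject` on a key that no longer exists also succeeds, which is what makes retrying the real call safe.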
Finally, do lightweight sweeping periodically to clean up failed operations:
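A sketch of such a sweep, assuming rows are always written before uploads and bucket deletes always happen before row deletes. Under those assumptions every object in the bucket has a row, so the sweep only has to query the database rather than list the whole bucket as the reaper in the question would. `PENDING_TTL_MS`, its one-day value, and the SQL are hypothetical; the grace period just needs to exceed any legitimate upload time.

```typescript
interface MediaRow {
  key: string;
  createdAt: Date;
  uploadedAt: Date | null;
  deletedAt: Date | null;
}

// In-memory stand-ins for the Postgres table and the S3 bucket (assumptions).
const media = new Map<string, MediaRow>();
const bucket = new Set<string>();

// Hypothetical grace period for uploads that were never confirmed.
const PENDING_TTL_MS = 24 * 60 * 60 * 1000;

// SQL sketch of the candidate query:
//   SELECT key FROM media
//   WHERE deleted_at IS NOT NULL
//      OR (uploaded_at IS NULL AND created_at < now() - interval '1 day');
function sweep(now: Date): string[] {
  const removed: string[] = [];
  for (const row of Array.from(media.values())) {
    const abandonedUpload =
      row.uploadedAt === null && now.getTime() - row.createdAt.getTime() > PENDING_TTL_MS;
    if (row.deletedAt === null && !abandonedUpload) continue; // live or still in flight
    if (row.deletedAt === null) row.deletedAt = now; // write-once mark
    bucket.delete(row.key); // no-op if the upload never landed
    media.delete(row.key);  // only after the object is gone
    removed.push(row.key);
  }
  return removed;
}
```

Running it again immediately returns an empty list, so repeated or overlapping sweeps are harmless. If uploads go through S3 presigned URLs, giving them an expiry shorter than the grace period narrows the window in which a late upload could land after its row was swept.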