We are storing sales data from a 3rd party API in our database. There's one row per product per day in our table. Data on the API might change retrospectively if there was a refund for example. However, we have no way of knowing which records were changed.
Let's say we query the sales for the 2023-01-01 today.
The API returned a list of ten products that each sold a single unit.
Tomorrow we will query the API again for the same date but this time it only returned nine products (since one was returned by the buyer). If we just used simple update queries, we'd still end up with ten products in our database for that day.
To keep our data in sync with the API we regularly request historical data, delete all records for the requested data and then insert the rows we got from the API. This seems very inefficient so I wanted to ask if there's a better way of doing it.
One other way of doing it that I can think of would be to get a list of product ids from our database and compare it to the ids returned by the API.
The API is returning compressed JSON files which we are processing as streams. We could keep track of the product ids contained in that file in a seperate variable and then after the stream ends, delete all entries from the database that are not included in that list.