Delta Updates (somehow)?
Hi y'all!
I have a dataset of CSV files (some too large to fit into memory). I also have a schema set up to insert that data into Cloudflare D1. A fresh copy of this dataset is pulled from an external API once a day, but I do not know whether anything has changed, and if it has, whether the change affects a whole file or just a single row in a large file.
Is there some way for me to only push updates for rows/fields that actually need to be changed, minimising the number of operations I need to do against the DB?
2 Replies
Well, it is possible to only push updates for rows/fields that need to be changed. But finding the rows/fields that have been updated is the hard part here. It's also outside the scope of Drizzle.
If the data has a unique key you can identify in the DB, you could do a fetch per row to compare and optionally update. But that will be a lot of queries to the DB. You would also have to set up logic for loading and processing the CSV files in chunks, so you don't run out of memory.
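To make the compare step a bit more concrete, here is a minimal sketch of one way to do it. It is a variation on the fetch-per-row idea, not something Drizzle provides: it assumes you store a hash next to each row at import time, so a chunk can be compared against one lookup of stored hashes instead of one query per row. The `Row` shape, the `id` key, and the helper names `rowHash`, `chunked`, and `changedRows` are all made up for illustration, and the CSV parsing and the actual DB reads/writes are left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: `id` is the unique key, the rest are CSV fields.
type Row = { id: string; [field: string]: string };

// Stable fingerprint of a row: hash the fields in sorted key order,
// so column order in the CSV does not affect the result.
function rowHash(row: Row): string {
  const canonical = JSON.stringify(
    Object.keys(row)
      .sort()
      .map((key) => [key, row[key]]),
  );
  return createHash("sha256").update(canonical).digest("hex");
}

// Group any (possibly streamed) row source into fixed-size chunks,
// so only one chunk has to be held in memory at a time.
function* chunked<T>(rows: Iterable<T>, size: number): Generator<T[]> {
  let batch: T[] = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch;
}

// Keep only rows that are new or whose hash differs from the stored one.
// `storedHashes` maps id -> hash saved during the previous import
// (assumption: you fetch these for the ids in the current chunk).
function changedRows(chunk: Row[], storedHashes: Map<string, string>): Row[] {
  return chunk.filter((row) => storedHashes.get(row.id) !== rowHash(row));
}
```

Only the rows returned by `changedRows` would then need an insert or update. Note that this does not detect rows that were deleted upstream; that would need a separate pass over the ids that were not seen in the new files.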
Ok, that makes sense... I've also contemplated having a separate table that I insert into, then replacing the old table with the new one, but I would assume that is also outside the scope of what Drizzle supports.
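For reference, the table swap idea could look roughly like the sketch below. It is plain SQLite DDL rather than a Drizzle feature, and it assumes the staging table has already been created with the same schema and filled. The `Db` interface only mirrors the `prepare` and `batch` methods of a D1-style binding, and `swapStatements` and `swapTables` are hypothetical helper names.

```typescript
// Minimal structural types for the parts of a D1-style binding used here
// (assumption: the real binding exposes compatible `prepare` and `batch`).
type Stmt = unknown;
interface Db {
  prepare(query: string): Stmt;
  batch(statements: Stmt[]): Promise<unknown>;
}

// Build the statements that promote `staging` to `live`.
// Table names are interpolated directly, so they must come from trusted code,
// never from user input.
function swapStatements(live: string, staging: string): string[] {
  const old = `${live}_old`;
  return [
    `DROP TABLE IF EXISTS "${old}"`,
    `ALTER TABLE "${live}" RENAME TO "${old}"`,
    `ALTER TABLE "${staging}" RENAME TO "${live}"`,
    `DROP TABLE "${old}"`,
  ];
}

// Send all statements in one batch so the swap happens in a single round trip.
async function swapTables(db: Db, live: string, staging: string): Promise<void> {
  const statements = swapStatements(live, staging).map((q) => db.prepare(q));
  await db.batch(statements);
}
```

Keep in mind that this still writes every row into the staging table each day, so it simplifies the logic rather than reducing the number of writes. Indexes, triggers, and foreign keys that point at the live table would also need separate handling.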