Backing up a database snapshot reliably

I'm using hnswlib-node to build an in-memory vector DB. My code writes the index to a file whenever embeddings are added or deleted. Basically, does Railway persist files to disk? (It might be saved as JSON.) In my experience, these providers throw the disk away when the server restarts and start with a fresh one, so either the file has to be sent somewhere else or the disk can't be ephemeral. I also want to hedge my risk by making sure the DB gets backed up on exit. So my question is: how do I make sure this file gets saved again when the server is shut down for code updates?
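A minimal sketch of the save-on-exit part, assuming the host sends SIGTERM (or SIGINT when run locally) before stopping the container and allows a short grace period before force-killing it. `registerShutdownBackup` and its options are hypothetical names, and `save` stands in for whatever routine persists the index.

```javascript
// Hypothetical helper: run `save` once when the process is asked to stop, then exit.
// Assumes the platform sends SIGTERM (SIGINT locally) and waits briefly before SIGKILL.
function registerShutdownBackup(save, options = {}) {
  const proc = options.proc || process; // injectable so it can be tested without real signals
  const signals = options.signals || ["SIGTERM", "SIGINT"];
  const exit = options.exit || ((code) => proc.exit(code));
  let pending = null; // shared promise so repeated signals don't trigger a second save

  function onSignal(signal) {
    if (!pending) {
      pending = Promise.resolve()
        .then(() => save(signal))
        .then(
          () => exit(0),
          (err) => {
            console.error("backup on shutdown failed:", err);
            exit(1);
          }
        );
    }
    return pending;
  }

  // `on` rather than `once`: a second signal hits the guard above instead of
  // falling back to Node's default handler, which would kill the process mid-save.
  for (const signal of signals) proc.on(signal, onSignal);
  return onSignal;
}
```

On an ephemeral container, `save` has to push the file somewhere off the box; writing it to local disk on exit would be lost along with the container.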
Percy (2y ago)
Project ID: 0314313d-b6e5-4895-98b0-f3402dfa9adc
Brody (2y ago)
You're right, containers on Railway are ephemeral, so in a sense, yes, the data does get thrown away. You'll want to save the file to bucket storage instead of to disk, ideally with file revisions.
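One way to get file revisions without depending on bucket-level versioning is to write every backup under a new timestamped key and keep only the last few. The sketch assumes that scheme; `revisionKey`, `planRevisions`, the `.bin` suffix and the prefix are hypothetical. The list of existing keys would come from the bucket's listing API (for S3-compatible stores, ListObjectsV2), which is left out here.

```javascript
// Hypothetical revision scheme: one object per backup, keyed by an ISO-8601 UTC
// timestamp so that sorting keys as strings also sorts them chronologically.
function revisionKey(prefix, date = new Date()) {
  // ":" and "." are allowed in object keys but awkward in URLs and shells.
  const stamp = date.toISOString().replace(/[:.]/g, "-");
  return `${prefix}/${stamp}.bin`;
}

// Given the existing keys under one prefix, pick the newest revision to restore
// and the older ones that fall outside a keep-last-N retention window.
function planRevisions(keys, keep) {
  const sorted = [...keys].sort(); // oldest first, thanks to fixed-width timestamps
  const latest = sorted.length > 0 ? sorted[sorted.length - 1] : null;
  const toDelete = sorted.slice(0, Math.max(0, sorted.length - keep));
  return { latest, toDelete };
}
```

On startup, download `latest` to restore the index; after each successful upload, delete the keys in `toDelete`.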
spasianspice (OP, 2y ago)
Okay, I'll look into that. Do you have any recommendations for somewhere that's not ephemeral and has low latency to your servers? @Brody
Brody (2y ago)
Cloudflare R2 is my recommendation.
spasianspice (OP, 2y ago)
R2 is just storage; you can't run a server there. Are there any options you'd recommend for running a server? @Brody We're trying to set up an API server that our product, deployed on Railway, needs to make requests to. So ideally we'd set up on a solution that (a) has low latency to Railway's servers and (b) is not ephemeral.
Brody (2y ago)
Yes, R2 is storage. Store the backup of your in-memory database there.
spasianspice (OP, 2y ago)
We have multiple Railway deploys.
Brody (2y ago)
So?
spasianspice (OP, 2y ago)
I'm just clarifying that our main product is deployed on Railway, and we have a second deploy that we want to be a really low-latency API server our main product can call. It doesn't have to be on Railway. I say this because ideally we wouldn't need to set up separate services for storage and the server; we'd prefer an all-in-one setup. But the main priority is latency to the Railway server our main product is hosted on.
Brody (2y ago)
Railway doesn't offer storage, so you'll have to use an external service like Cloudflare R2. I'm sure Cloudflare has a region very close to Railway's us-west1 region.
spasianspice (OP, 2y ago)
Isn't Railway hosted on GCP, though? GCP offers storage on its servers. @Brody
Brody (2y ago)
Railway is hosted on GCP, but Railway hasn't exposed persistent storage to users yet.
spasianspice (OP, 2y ago)
Okay. So for the absolute lowest latency to a persistent storage solution, your recommendation would still be Cloudflare R2? Latency could kill our product, which is why I ask 🙏🏼
Brody (2y ago)
If latency to storage is that big a concern, an external provider will always add noticeably more latency. Railway is great, but it doesn't have persistent storage volumes yet; other PaaS providers do offer storage volumes. That said, you've only been talking about database backups, and I don't see how the latency of a background task could affect your end users.
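The point about backups being a background task can be made concrete: request handlers only mark the index as changed, and a debounced saver persists it later, so storage latency never sits on the request path. This is a generic debounce-and-coalesce pattern, not something Railway or R2 provides; `createBackgroundSaver`, `markDirty` and `flush` are hypothetical names, and `save` is assumed to return a promise.

```javascript
// Hypothetical helper (not part of any library): coalesces many index changes
// into occasional background saves so request handlers never wait on storage.
function createBackgroundSaver(save, delayMs = 1000) {
  let dirty = false;  // true when there are changes not yet covered by a finished save
  let timer = null;   // debounce timer handle
  let running = null; // promise for the save loop currently in progress, if any

  function runLoop() {
    if (!running) {
      running = (async () => {
        // Keep saving until no new change arrived during the previous save.
        while (dirty) {
          dirty = false;
          try {
            await save();
          } catch (err) {
            dirty = true; // keep the changes marked so a later attempt retries
            throw err;
          }
        }
      })().finally(() => {
        running = null;
      });
    }
    return running;
  }

  // Call after every add/delete. Cheap and synchronous; never awaits storage.
  function markDirty() {
    dirty = true;
    if (timer) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      runLoop().catch((err) => console.error("background backup failed:", err));
    }, delayMs);
  }

  // Save any outstanding changes now and wait for it (e.g. from a shutdown handler).
  async function flush() {
    if (timer) {
      clearTimeout(timer);
      timer = null;
    }
    while (dirty || running) {
      await runLoop();
    }
  }

  return { markDirty, flush };
}
```

`flush()` is meant for shutdown, so the most recent changes are not left waiting on the debounce timer when the container stops.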
spasianspice (OP, 2y ago)
Good point @Brody, thanks for pushing back on this. I think we'll host on Railway and set up R2 storage. Do you have any resources you'd recommend for this? You never know what tricks the Railway team might have, so I've gotta ask lol
Brody (2y ago)
Cloudflare R2 is accessed through an AWS S3-compatible API, so all you have to do is use Amazon's S3 SDK. Cloudflare has these resources: https://developers.cloudflare.com/r2/examples/aws/ I don't know which language you'll be using, so I can't send specific links to the SDKs since they're different for every language, but the AWS S3 SDK docs are easy to find. If your language is Node, though, don't use the v3 SDK; I've heard it has terrible performance compared to v2.
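Following that advice, a minimal sketch of pointing the v2 `aws-sdk` S3 client at R2. The option names (`endpoint`, `accessKeyId`, `secretAccessKey`, `signatureVersion`) and the `https://<account id>.r2.cloudflarestorage.com` endpoint mirror Cloudflare's aws-sdk-js example at the link above, but verify them against that page; the environment variable names and both helper functions are hypothetical. The helpers only build plain objects, so the SDK calls themselves are shown in comments.

```javascript
// Hypothetical helper: constructor options for `new S3(...)` from aws-sdk v2,
// targeting Cloudflare R2. Env var names are assumptions; adapt to your setup.
function r2S3Options(env) {
  const required = ["R2_ACCOUNT_ID", "R2_ACCESS_KEY_ID", "R2_SECRET_ACCESS_KEY"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`missing environment variables: ${missing.join(", ")}`);
  }
  return {
    endpoint: `https://${env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
    accessKeyId: env.R2_ACCESS_KEY_ID,
    secretAccessKey: env.R2_SECRET_ACCESS_KEY,
    signatureVersion: "v4",
  };
}

// Hypothetical helper: request parameters for S3's PutObject operation.
function putBackupParams(bucket, key, body) {
  return { Bucket: bucket, Key: key, Body: body };
}

// Wiring it up (requires `npm install aws-sdk`, i.e. the v2 SDK):
//   const { S3 } = require("aws-sdk");
//   const s3 = new S3(r2S3Options(process.env));
//   await s3.putObject(putBackupParams("my-bucket", "hnsw/index.bin", buffer)).promise();
```

Failing fast on missing credentials at startup is cheaper than discovering them only when the first backup upload fails.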
spasianspice (OP, 2y ago)
Yeah, we're using Node. Thanks for the heads-up!
Brody (2y ago)
Also, it very much sounds like this will be a commercial product, so you'll want to upgrade to the Team plan at some point.
spasianspice (OP, 2y ago)
For Railway? Yeah, the account this project is on is just my personal one for testing. We're going to deploy it to our team account shortly 🙂
Brody (2y ago)
Yes, but prepare yourself for the inevitable upgrade to v3 at some point in the future, once it's considered stable. Amazon considers v3 stable, for some reason, but I've heard otherwise from developers.