Node graceful shutdowns
I have a Node task manager process deployed to Railway that I'm trying to get to shut down gracefully (finish current tasks before the service is killed). Is this possible on Railway? I've added logic to catch SIGTERM and begin shutdown, and this works in my local environment when I send it a SIGTERM, but on Railway it exits immediately with the following:
It doesn't even seem to hit my SIGTERM handler, defined below. I can't find any docs about this kind of thing; is this kind of delayed shutdown supported?
Project ID:
95c304a2-fe2b-4ad1-bc3f-5296fd26f36c
send your package.json please
start:scheduler is the process that I've tested this with
"start:scheduler": "SKIP_ENV_VALIDATION=true node --es-module-specifier-resolution=node .wss-dist/src/server/scheduler/scheduler-entrypoint.js | npm run log-agent -- --service=scheduler | pino-pretty",
hahaha, that bad?
that's a lot of text
but I'm struggling to understand why you'd want to outright stop your app, gracefully or not
Ah yeah I mean on deployment
I want previous deployments to shutdown gracefully
but why? there's a new deployment to handle requests, why do you care about the old deployment at that point
The service runs tasks that often take > 5 minutes and aren't resumable. The current configuration stops them mid-run, which breaks certain guarantees in our app.
So the ideal would be to catch the SIGTERM, stop handling new tasks on that specific instance, finish any ongoing tasks, and then call process.exit().
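A minimal sketch of what that flow could look like in plain Node; the names here (acceptingTasks, inFlight, runTask) are hypothetical stand-ins for however the real scheduler tracks its work, not the actual codebase:

```js
// Hypothetical drain-then-exit sketch: stop taking new tasks on SIGTERM,
// wait for in-flight tasks to settle, then exit cleanly.
let acceptingTasks = true;   // the scheduler loop would check this before picking up new work
const inFlight = new Set();  // promises for tasks currently running

function runTask(taskFn) {
  const p = Promise.resolve()
    .then(taskFn)
    .finally(() => inFlight.delete(p));
  inFlight.add(p);
  return p;
}

process.on('SIGTERM', async () => {
  console.log('SIGTERM received, draining in-flight tasks');
  acceptingTasks = false;                   // stop handling new tasks
  await Promise.allSettled([...inFlight]);  // finish any ongoing tasks
  console.log('drain complete, exiting');
  process.exit(0);
});
```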
Just curious how you guys handle exits on your end, and if this kind of workflow is even possible. Does your removal logic allow for my deployments to define when they exit?
(I don't work for Railway)
I'm guessing whatever runs to kill the Docker containers force kills them once the new deployment is live, not respecting your graceful shutdown handler. I'll do some testing and get back to you on this
Oh wow, didn't realize that. Respect! Let me know what you learn 🙂
I have returned
with information
Solution
you can catch the kill signal, but you only get ~3 seconds of grace time before the container is force killed, indicated by the 3 dots and an exit message that never got printed
oh interesting...
3 seconds is an unfortunate amount for my situation, but good to know.
what is the logic you used to catch the signal? I'm still not able to even catch it.
I used golang
and you're catching the SIGTERM?
yes
obviously this works locally, but on Railway the containers get force killed.
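For reference, a rough way to reproduce that measurement from Node rather than Go (this is an assumed equivalent of the test described above, not the original): log once per second after SIGTERM and see how far the counter gets before the container disappears.

```js
// Hypothetical grace-window probe: after SIGTERM, count seconds until the
// platform force kills the container.
process.on('SIGTERM', () => {
  let seconds = 0;
  setInterval(() => {
    seconds += 1;
    console.log(`still alive ${seconds}s after SIGTERM`);
  }, 1000);
  // deliberately never call process.exit(); the last line that makes it into
  // the logs approximates the grace period
});

// keep the process alive so there is something to deploy over and SIGTERM
setInterval(() => {}, 60_000);
```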
my recommendation would be to have a separate worker service; that way, when you deploy a change to your main app the worker service is unaffected.
I know that's not a perfect solution, since eventually you'd have to deploy some changes to the worker service too; you might want to then also employ a third-party work queuing framework
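A sketch of that worker-plus-queue shape, using BullMQ purely as one example of such a framework (the queue name, Redis connection details, and doLongRunningTask are placeholders, not anything settled on in this thread):

```js
// worker.js -- a separate service that only consumes long-running jobs,
// so deploys of the main app never interrupt work in progress
import { Worker } from 'bullmq';

// placeholder for the real > 5 minute, non-resumable work
async function doLongRunningTask(data) {
  console.log('processing', data);
}

const worker = new Worker(
  'long-tasks',                                    // queue name (placeholder)
  async (job) => {
    await doLongRunningTask(job.data);
  },
  { connection: { host: process.env.REDIS_HOST, port: 6379 } }
);

// close() waits for the active job to finish before resolving, which gives the
// graceful-drain behavior; the platform's grace window still applies whenever
// this worker service itself is redeployed
process.on('SIGTERM', async () => {
  await worker.close();
  process.exit(0);
});
```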
and with all that said, Railway is great but it can't cover every single user's specific use case perfectly, so there may be other PaaS platforms that will wait for your app to exit on its own, or maybe your workload would even work better on a VPS
Yeah, those are good suggestions. We're definitely starting to hit the edges of what Railway is capable of at this point. Not looking forward to the 10x increase in complexity of most other PaaS just for a couple of small additional capabilities, though :/
Appreciate the help, Brody
if you have any more questions I'd be happy to answer them (within reason) 🙂
Thanks! Have you found any good PaaS services that fit the niche of being incrementally more configurable than Railway without going all in on a barebones cloud? What do most people "graduate" to once they start hitting the limits of the platform?
hey now, I can't just go spouting out competitors lol, nice try though 🤣
fair enough lol. Would love to be able to configure that force kill timeout!
I suspect this is done to combat the creation of phantom containers, so I don't know how much luck you'll have with this on other platforms, because no provider wants containers running unchecked
but never say never, and screw it, give fly.io a go
Funny you should say that, I just spent the last couple of hours toying around with Fly.io. While it seems good, I may just defer to batching and scheduling deploys during off-peak hours for now, to avoid the headache of moving over for this specific issue.
sounds good
Out of curiosity, why are you so active on the forum? A Railway super fan?
it's true, I do like Railway a lot, but I also like helping people; everyone comes onto this platform with a different level of knowledge, so I try to help out where I can
that's dope, appreciate you taking the time
thank you 🙂