Draining During Deployments
How long does Railway wait to drain old instances before tearing them down during deployments? We have requests that can take up to 60s to serve and want to make sure we're not dropping users during deployments. Is there any way to configure this?
Project ID: 33c47f57-f4aa-4640-9b44-cd0a3f034b71
3 seconds
from remove to killing the container, 3 seconds
Oof, any way to increase that?
no, however...
oh great, so the RAILWAY_DEPLOYMENT_OVERLAP_SECONDS var set to e.g. 60 is sufficient to delay the teardown? i.e. the container stops getting requests but has 60s additional?
I think it would stop getting requests, but you'd have to do some testing around that since I've never toyed around with that setting myself
ok let me try it and i'll report back here for posterity
thanks
no problem
It would still get requests iirc. That env variable sets the time between when your new deployment goes live and when your old deployment goes down
All requests will go to your old deployment until that time is up, after which the deployment is killed.
There’s no way around your issue without some custom middleware
Your downtime will be minimal. After your new deployment is active there will be no downtime
and you shouldn’t be pushing to prod too often anyway. If you have a large number of users who need constant access you should release full version updates, not patches
yeah i think the issue here is we frequently have many requests still outstanding at the 3 second mark, and disconnected sockets are hard to recover from
i won't debate the merits of deploying frequently but regardless can't eat 2-3 self-inflicted events a day
ok testing complete, @Adam seems to be right. the env variable does keep the old deploy around, but requests are still being routed to it, meaning we still end up dropping those connections
is there no way to make RAILWAY_DEPLOYMENT_OVERLAP_SECONDS start routing incoming traffic to the newer deploy as soon as it succeeds?
No, that’s not a feature on Railway
but you can add it as a feature request in #🤗|feedback
ok final update here just in case anyone runs into this thread. DNS flips currently take between 5 and 15 seconds, so if your max request duration is, say, X, you need to set RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to 15+X. that actually does work, today, but if you're kicking off new requests within 15 seconds of a successful deployment you may still be directing them to the old deployment.
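For anyone who wants the 15+X arithmetic spelled out, here's a minimal sketch in Python. The 15 second figure is just the upper bound on DNS flips observed in this thread, not an official Railway number, and the function name `overlap_seconds` is made up for illustration.

```python
# Upper bound on DNS flip time as observed in this thread (assumption, not an official figure).
DNS_FLIP_MAX_SECONDS = 15


def overlap_seconds(max_request_seconds: int,
                    dns_flip_max_seconds: int = DNS_FLIP_MAX_SECONDS) -> int:
    """Return the overlap needed so the longest request can finish on the old deploy.

    Worst case: a request lands on the old deployment right before the DNS flip
    completes, then runs for the full maximum request duration.
    """
    if max_request_seconds < 0 or dns_flip_max_seconds < 0:
        raise ValueError("durations must be non-negative")
    return dns_flip_max_seconds + max_request_seconds


if __name__ == "__main__":
    # With the 60s max request duration from the original question:
    print(f"RAILWAY_DEPLOYMENT_OVERLAP_SECONDS={overlap_seconds(60)}")
```

With a 60s max request duration this prints `RAILWAY_DEPLOYMENT_OVERLAP_SECONDS=75`. If DNS flips turn out to be slower than 15 seconds for you, pass a larger `dns_flip_max_seconds`.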
thanks @Adam @Brody