Railway•10mo ago

Deployment removed before healthcheck

For some reason my old deploy is getting removed before the health check has actually completed for the new deployment :Hmmge: This is causing a brief period of downtime on every deploy.

15 Replies

Percy•10mo ago

Project ID: 8a562b1b-8488-472e-b420-02478d2a8df0

UnsmartOP•10mo ago

8a562b1b-8488-472e-b420-02478d2a8df0

Brody•10mo ago

increase RAILWAY_DEPLOYMENT_OVERLAP_SECONDS to 35

UnsmartOP•10mo ago

I assume I just put that in env? And is there a max value I can put? Might do a bit more than 35

Brody•10mo ago

as a service variable, start with 35 and increase from there, you likely can go up to something like 4 hours

UnsmartOP•10mo ago

Yeah this isn't doing anything... the new build gets published at 22:57:53 and the old deploy is gone at 22:58:00. Only 7 seconds :Bruh: And the health check didn't succeed until 22:58:12 :sad: I thought that if the health check didn't succeed it wouldn't promote the new build at all, e.g. if you accidentally push something that doesn't build properly :Hmmge:
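The three timestamps in the message above are enough to work out how long the service was unreachable. A minimal sketch of that arithmetic, assuming all three times fall on the same day; the helper name `seconds_between` and the variable names are made up for illustration and are not part of any Railway tooling:

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end, both given as HH:MM:SS on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps reported in the thread.
new_published = "22:57:53"   # new build published
old_removed = "22:58:00"     # old deploy gone
health_ok = "22:58:12"       # health check on the new deploy succeeds

overlap = seconds_between(new_published, old_removed)      # how long both deploys coexisted
time_to_healthy = seconds_between(new_published, health_ok)  # how long the new deploy needed
downtime = max(0, seconds_between(old_removed, health_ok))   # gap with no healthy deploy

print(f"overlap={overlap}s time_to_healthy={time_to_healthy}s downtime={downtime}s")
```

With these numbers the old deploy overlapped the new one for 7 seconds, the new deploy needed 19 seconds to become healthy, and the gap between the two was 12 seconds.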

Brody•10mo ago

that's how it should work, yeah

UnsmartOP•10mo ago

Hmm interesting I have managed to break the health check system somehow :LUL:

Brody•10mo ago

congratulations!

UnsmartOP•10mo ago

any chance you'd be able to get someone from railway to look at why the healthcheck isn't working for me :Prayge:

Duchess•10mo ago

Thread has been flagged to Railway team by @Brody.

UnsmartOP•10mo ago

tyty 😄 Also the service in question is railway-cloudflared, if needed. Feel free to restart it whenever, nothing important is hosted there 🙂

Melissa•10mo ago

innnteresting. this behavior is isolated to just this service? thanks for the screenshots, super helpful

Linear•10mo ago

Issue PRO-1854 created.

PRO-1854 - Active service is stopped before healthcheck succeeds

A healthcheck is configured for the service and is being executed; however, the logs (see screenshots in the thread) show that the active/healthy service is stopped before the healthcheck succeeds, resulting in a period of downtime.

Status: Triage

Product

UnsmartOP•10mo ago

Yeah, only this service. I have another service in that same project that the health check works fine on (railway-rust). Well, I have found the issue and I guess it's working as expected. I was going to look into auto scaling replicas for the service but noticed I can't make a replica, and it's because of volumes... it says a volume prevents multiple deployments, so the healthcheck is basically pointless lol. Sucks because this service only needs a single file that is read only :sad:
