Health check failing after minor code change
I made a very minor text change, and now my deployment isn't working, whereas it was working fine yesterday. The deploy logs look normal, but my build logs show that my health check is failing (it works locally).
Project ID: N/A
My health check is /health, which serves:
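For reference, a platform health check generally only needs the endpoint to answer the probe with HTTP 200. The sketch below is a hypothetical stand-in written as a bare WSGI app (the kind of callable gunicorn serves, like project.wsgi), not the poster's actual Django view.

```python
def application(environ, start_response):
    """Hypothetical WSGI app: 200 OK on /health, 404 on anything else."""
    if environ.get("PATH_INFO") == "/health":
        status, body = "200 OK", b"OK"
    else:
        status, body = "404 Not Found", b"Not Found"
    start_response(status, [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    from wsgiref.util import setup_testing_defaults

    # Simulate one request to /health without starting a server.
    environ = {"PATH_INFO": "/health"}
    setup_testing_defaults(environ)
    chunks = application(environ, lambda status, headers: print(status))
    print(b"".join(chunks))
```

Served with something like gunicorn -b [::]:$PORT health_app:application (health_app being a made-up module name), GET /health returns 200 and every other path returns 404.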
what is the health check failing with?
Attempt #6 failed with service unavailable. Continuing to retry for 6m31s
are you on the legacy or v2 runtime? check your service settings
legacy
Oh wait
It's V2?
on the v2 runtime your app needs to listen on ::
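To make "listen on ::" concrete outside gunicorn: the sketch below uses Python's standard socket.create_server to bind the IPv6 wildcard address and, where the OS supports it, turns on dual-stack so the same socket also accepts IPv4. The open_listener name and the IPv4-only fallback are illustrative choices, not anything Railway requires.

```python
import socket


def open_listener(port: int = 0) -> socket.socket:
    """Hypothetical helper: listen on "::" (every IPv6 address) and, when the
    OS allows it, on every IPv4 address too via dual-stack."""
    if socket.has_dualstack_ipv6():
        # "::" is the IPv6 wildcard; dualstack_ipv6=True clears IPV6_V6ONLY.
        return socket.create_server(
            ("::", port), family=socket.AF_INET6, dualstack_ipv6=True
        )
    # Fallback for hosts without IPv6 support: IPv4 wildcard only.
    return socket.create_server(("0.0.0.0", port), family=socket.AF_INET)


if __name__ == "__main__":
    with open_listener() as srv:
        print("listening on", srv.getsockname()[:2])
```

On a dual-stack Linux host, an IPv4 client connecting to 127.0.0.1 on the same port reaches this listener as well.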
Did it get auto-switched or something?
might have
I don't love that. Would've been nice to know about a breaking change like this.
Is it safe for me to switch back to legacy?
And where would I find docs for the difference between legacy and V2?
You can indeed switch to legacy.
it has been mentioned in two places; what else would work best for you?
https://railway.app/changelog/2024-06-07-cash-template-payouts#runtime-v2
https://help.railway.app/feedback/new-runtime-v2-magic-port-detection-2b530a34
We expect the legacy runtime to stay in place for as long as we get the expected behavior that our users need.
https://docs.railway.app/reference/runtime
if the healthcheck issue is the only issue you face, I cannot recommend switching back to the legacy runtime
fwiw the health check issue has been reported to the team
That's fair, but I'm not always looking at the changelog unless I'm interested in seeing what new features are available to me. I do get emails about the changelog with some basic bullet points, but it would've been nice to have this in that email, or to get a separate email with a message along the lines of:
"Starting 6/x/2024, all services will be switched from Legacy to V2, and here's what you need to do before then:"
Ohhh. So it wasn't anticipated.
That's fair.
Ok. Thanks for the help! Will fix up my /health response 😄
it's a fair assumption that the changelogs would only include new features, and they do, but they also mention migration timelines and such for new features, and new features always have the potential to cause issues
True, but imo known breaking changes should be communicated more directly. I don't always have the time to read changelogs for all of the services I use. My project is a hobby project, so no bigs, but for the enterprise customers, that could put a snag in their work. Luckily the support here is really on top of things!
i dont think this was known tbh, but i have no way to know for sure
waiting to hear back from char on this issue
Yeah, in that case, it's a hiccup. And good on the Railway team for testing with Hobby accounts first so they can find these issues before they reach enterprise customers.
I'm still having issues getting this to work. I've added [::1]:$PORT to my gunicorn command. I've confirmed this works locally, but I'm still having trouble with the health check.
So it was gunicorn project.wsgi
and now it's gunicorn -b 127.0.0.1:$PORT -b [::1]:$PORT project.wsgi
it needs to be :: not ::1
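The difference between :: and ::1 (and between 0.0.0.0 and 127.0.0.1) is wildcard versus loopback. A quick check with Python's standard ipaddress module; bind_scope is a hypothetical helper written only for this illustration.

```python
import ipaddress


def bind_scope(host: str) -> str:
    """Hypothetical helper: which interfaces a listener bound to `host` covers."""
    addr = ipaddress.ip_address(host.strip("[]"))  # accept "[::1]" as well as "::1"
    if addr.is_unspecified:
        # "::" or "0.0.0.0": wildcard, reachable on every interface
        return "all interfaces"
    if addr.is_loopback:
        # "::1" or "127.0.0.1": only reachable from inside the same container
        return "loopback only"
    return "single address"


for h in ("::", "[::1]", "0.0.0.0", "127.0.0.1"):
    print(h, "->", bind_scope(h))
```

This prints "all interfaces" for :: and 0.0.0.0 and "loopback only" for [::1] and 127.0.0.1, which is why a probe coming from outside the container never reaches an app bound to ::1.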
Ah, see, I tried that, but I get
[ERROR] Connection in use: ('::', 65090)
Dumb ask, and unsure if you did this in the past: have you confirmed that switching to Legacy fixes the issue? Wanna make sure our network engineer can do a proper repro.
I figured that there's already something running there.
yes, check #🦸|conductor-chat
I'll give it a try here and confirm.
I don't have access to that channel.
He is flagging me to another case 🙂
ahhh
We just wanna have more languages to test the runtime with, hence why I ask.
The more cases the better.
Sounds good! Yeah, I'll test and report back.
And sorry to use you as a test pig, I can comp you the month since you are doing QA work.
Solution
gunicorn -b [::]:$PORT project.wsgi
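The same bind can also live in a gunicorn config file instead of the start command. A minimal sketch, assuming a recent gunicorn (including the 21.2.0 used here) that reads ./gunicorn.conf.py from the working directory by default; the 8000 fallback for when PORT is unset is an assumption for local runs.

```python
# gunicorn.conf.py: picked up automatically from the working directory.
# Equivalent to `gunicorn -b [::]:$PORT project.wsgi`.
import os

# A single bind on the IPv6 wildcard. Adding a second IPv4 bind on the same
# port is what this thread traced the "Connection in use" error to.
bind = [f"[::]:{os.environ.get('PORT', '8000')}"]
```

With that file in place, the start command can stay as python manage.py migrate && python manage.py collectstatic --noinput && gunicorn project.wsgi.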
Yup! Tried that, and got the "Connection in use" error.
Much appreciated!
deploy logs please - https://bookmarklets.up.railway.app/log-downloader/
new role added
comped, test away, let us know when you have recovered the healthcheck
Ah, looks like that only gets the newest logs. Here's what it shows:
ill try to reproduce
what version of gunicorn?
I changed my gunicorn command back to what it was, and flipped the runtime to Legacy and it deployed successfully. And that's including the minor code change mentioned in the original post.
21.2.0
Gotcha, that seems to be enough. Going to add this case to the Runtime V2 blockers in the root thread.
Great. Thanks!
my start command is
gunicorn -b [::]:$PORT main:app
on the v2 runtime with the same gunicorn version you are using, so this new error doesn't look like a v2 vs legacy issue
But what would already be running on that port? 🤔
In my case.
does your container run gunicorn and only gunicorn?
It runs a couple of Django commands before gunicorn: migrate and collectstatic.
can you provide the full command
python manage.py migrate && python manage.py collectstatic --noinput && gunicorn project.wsgi
and what was the command when you got this error?
python manage.py migrate && python manage.py collectstatic --noinput && gunicorn -b [::]:$PORT project.wsgi
I may have had an extra -b 127.0.0.1:$PORT in there for IPv4.
Testing just [::] atm.
that would do it; gunicorn supports dual-stack binding anyway, so that wouldn't be needed, and 127.0.0.1 would also be the incorrect address
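The conflict hit here can be spotted mechanically: on a typical Linux host a socket bound to :: is dual-stack, so it already accepts IPv4 on that port, and an extra IPv4 bind (127.0.0.1 or 0.0.0.0) on the same port collides with it. The helper below is a hypothetical lint check for illustration only, assuming that dual-stack default (IPV6_V6ONLY off); it is not a gunicorn feature.

```python
import ipaddress


def conflicting_binds(binds: list[str]) -> list[str]:
    """Hypothetical lint: return IPv4 binds sharing a port with a "[::]" bind.

    Assumes "::" is bound dual-stack (the usual Linux default), so binding
    both on one port fails with EADDRINUSE ("Connection in use" in gunicorn).
    """
    def split(bind: str) -> tuple[str, str]:
        host, _, port = bind.rpartition(":")
        return host.strip("[]"), port

    parsed = [split(b) for b in binds]
    # Ports already covered by a dual-stack wildcard listener.
    dual_ports = {port for host, port in parsed if host == "::"}
    return [
        bind
        for bind, (host, port) in zip(binds, parsed)
        if port in dual_ports and ipaddress.ip_address(host).version == 4
    ]


print(conflicting_binds(["127.0.0.1:65090", "[::]:65090"]))
```

The earlier 127.0.0.1 plus [::1] combination is not flagged, because ::1 is a specific IPv6 address rather than the wildcard; it simply isn't reachable from outside the container.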
Their documentation seems to suggest that you need to state both: https://docs.gunicorn.org/en/stable/settings.html#bind
and I'm assuming the correct address is 0.0.0.0?
Yes, binding on 127.0.0.1 only listens on loopback, so it won't be reachable from outside the container.
But wondering why legacy did it.
I didn't have that for legacy. I was just adding it in because I assumed I needed it if I also needed to have IPv6. My bad.
Alright, well. It worked with
python manage.py migrate && python manage.py collectstatic --noinput && gunicorn -b [::]:$PORT project.wsgi
on V2.
by default gunicorn binds to 0.0.0.0:$PORT, so that would have worked for legacy
as i suggested 🙂
Yup! For some reason I thought I tested that. Sorry about that.
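This also explains why the original command with no -b flag worked on legacy: gunicorn's documented default for bind is 0.0.0.0:$PORT when a PORT environment variable is set, and 127.0.0.1:8000 otherwise, so it listened on IPv4 only. The function below merely restates that documented rule so it can be checked; it is not gunicorn's own code, and it assumes the platform injects PORT, as the commands in this thread imply.

```python
import os
from typing import Mapping


def default_gunicorn_bind(env: Mapping[str, str] = os.environ) -> list[str]:
    """Restate gunicorn's documented default for the `bind` setting."""
    if "PORT" in env:
        # IPv4 wildcard only: no IPv6 listener unless "-b [::]:$PORT" is given.
        return [f"0.0.0.0:{env['PORT']}"]
    return ["127.0.0.1:8000"]


print(default_gunicorn_bind({"PORT": "65090"}))
```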
no worries
Guess this isn't a new bug then, Angelo! I apologize. New to messing with IPv6.
it is a new bug
you should not need to listen on ipv6 just for the health check to work
Yea, if any behavior is different vs. old, its a bug.
You did us a favor.
technically solved
Update: health checks can now pass if your app only listens on 0.0.0.0, but if you have already changed it to ::, there's no point in changing anything back, as listening on :: has no known drawbacks.