Railway•6mo ago

Health check failing after minor code change

I made a very minor text change, and now my deployment isn't working, whereas it was working fine yesterday. The deploy logs look normal, but my build logs show that my health check is failing (it works locally).

Solution:

gunicorn -b [::]:$PORT project.wsgi


59 Replies

Percy•6mo ago

Project ID: N/A

hmhOP•6mo ago

N/A

My health check is /health, which serves:

def health_check(request):
    return HttpResponse(status=200)
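To compare the local result with what a deploy-time probe would see, a small standard-library probe can request the endpoint and report whether it answers with HTTP 200. This is only a sketch: `check_health` is a hypothetical helper, the URL and port in the usage note are assumptions, and it does not claim to reproduce Railway's own health check.

```python
import urllib.error
import urllib.request


def check_health(url, timeout=5.0):
    """Return True if a GET request to url answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and non-2xx responses
        # (urllib raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False
```

For example, `check_health("http://127.0.0.1:8000/health")` would exercise a locally running server, assuming it listens on port 8000. A local pass only shows the view works; it says nothing about which address the server is bound to inside the container.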


Brody•6mo ago

what is the health check failing with?

hmhOP•6mo ago

Attempt #6 failed with service unavailable. Continuing to retry for 6m31s

Brody•6mo ago

are you on the legacy or v2 runtime? check your service settings

hmhOP•6mo ago

legacy

Oh wait, it's V2?

Brody•6mo ago

on the v2 runtime your app needs to listen on ::

hmhOP•6mo ago

Did it get auto-switched or something?

Brody•6mo ago

might have

hmhOP•6mo ago

I don't love that. Would've been nice to know about a breaking change like this. Is that safe for me to switch back to legacy? And where would I find docs for the difference between legacy and V2?

angelo•6mo ago

You can indeed switch to legacy.

Brody•6mo ago

it has been mentioned in two places, what else would work best for you?

https://railway.app/changelog/2024-06-07-cash-template-payouts#runtime-v2

https://help.railway.app/feedback/new-runtime-v2-magic-port-detection-2b530a34

angelo•6mo ago

We expect the legacy runtime to stay in place for as long as we get the expected behavior that our users need.

Brody•6mo ago

https://docs.railway.app/reference/runtime

if the healthcheck issue is the only issue you face, I cannot recommend switching back to the legacy runtime

fwiw the health check issue has been reported to the team

hmhOP•6mo ago

That's fair, but I'm not always looking at the changelog unless I'm interested to see what new features are available to me. I do get emails about the changelog with some basic bullet points, but it would've been nice to have, in that email or a separate one, a message along the lines of: "Starting 6/x/2024, all services will be switched from Legacy to V2, and here's what you need to do before then:"

hmhOP•6mo ago

Ohhh. So it wasn't anticipated. That's fair. Ok. Thanks for the help! Will fix up my /health response 😄

Brody•6mo ago

it's a fair assumption that the changelogs would only include new features, and they do, but they also mention migration timelines and such for new features, and new features always have the possibility to cause issues

hmhOP•6mo ago

True, but imo known breaking changes should be communicated more directly. I don't always have the time to read changelogs for all of the services I use. My project is a hobby project, so no bigs, but for the enterprise customers, that could put a snag in their work. Luckily the support here is really on top of things!

Brody•6mo ago

i dont think this was known tbh, but i have no way to know for sure

waiting to hear back from char on this issue

hmhOP•6mo ago

Yeah, in that case, it's a hiccup. And good on the Railway team for testing with Hobby accounts first so they can find these issues before they reach enterprise customers.

I'm still having issues getting this to work. I've added [::1]:$PORT to my gunicorn command. I've confirmed this works locally, but I'm still having trouble with the health check.

So it was gunicorn project.wsgi and now it's gunicorn -b 127.0.0.1:$PORT -b [::1]:$PORT project.wsgi

Brody•6mo ago

it needs to be :: not ::1

hmhOP•6mo ago

Ah, see, I tried that, but I get [ERROR] Connection in use: ('::', 65090)

angelo•6mo ago

Dumb ask, and unsure if you've already tried this: can you confirm whether switching to Legacy fixes the issue? Wanna make sure our network engineer can do a proper repro.

hmhOP•6mo ago

I figured that there's already something running there.

Brody•6mo ago

yes, check #🦸｜conductor-chat

hmhOP•6mo ago

I'll give it a try here and confirm. I don't have access to that channel.

angelo•6mo ago

He is flagging me to another case 🙂

hmhOP•6mo ago

ahhh

angelo•6mo ago

We just wanna have more languages to test runtime with hence why I ask. The more cases the better.

hmhOP•6mo ago

Sounds good! Yeah, I'll test and report back.

angelo•6mo ago

And sorry to use you as a test pig, I can comp you the month since you are doing QA work.

Solution

Brody•6mo ago

gunicorn -b [::]:$PORT project.wsgi

hmhOP•6mo ago

Yup! Tried that, and got the "Connection in use" error. Much appreciated!

Brody•6mo ago

deploy logs please - https://bookmarklets.up.railway.app/log-downloader/

angelo•6mo ago

New role added, comped. Test away, and let us know when you have recovered the healthcheck.

hmhOP•6mo ago

Ah, looks like that only gets the newest logs. Here's what it shows:

[2024-06-11 19:11:19 +0000] [1] [INFO] Starting gunicorn 21.2.0

[2024-06-11 19:11:19 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:19 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:20 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:20 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:21 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:21 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:22 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:22 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:23 +0000] [1] [ERROR] Connection in use: ('::', 65090)

[2024-06-11 19:11:23 +0000] [1] [ERROR] Retrying in 1 second.

[2024-06-11 19:11:24 +0000] [1] [ERROR] Can't connect to ('::', 65090)

container event container died

Brody•6mo ago

ill try to reproduce

what version of gunicorn?

hmhOP•6mo ago

I changed my gunicorn command back to what it was, and flipped the runtime to Legacy and it deployed successfully. And that's including the minor code change mentioned in the original post.

21.2.0

angelo•6mo ago

Gotcha- that seems to be enough, going to add this case on the Runtime V2 blockers in the root thread.

hmhOP•6mo ago

Great. Thanks!

Brody•6mo ago

my start command is gunicorn -b [::]:$PORT main:app on the v2 runtime with the same gunicorn version you are using, so this new error doesnt look like a v2 vs legacy issue

hmhOP•6mo ago

But what would already be running on that port? 🤔 In my case.

Brody•6mo ago

does your container run gunicorn and only gunicorn?

hmhOP•6mo ago

It runs a couple django commands before gunicorn. migrate and collectstatic

Brody•6mo ago

can you provide the full command

hmhOP•6mo ago

python manage.py migrate && python manage.py collectstatic --noinput && gunicorn project.wsgi

Brody•6mo ago

and what was the command when you got this error?

hmhOP•6mo ago

python manage.py migrate && python manage.py collectstatic --noinput && gunicorn -b [::]:$PORT project.wsgi

I may have had an extra -b 127.0.0.1:$PORT in there for IPv4. Testing just [::] atm

Brody•6mo ago

that would do it, gunicorn supports dual stack binding anyway so that wouldnt be needed, 127.0.0.1 would also be the incorrect address
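The collision described here can be reproduced with plain sockets. The sketch below binds and listens on an IPv4 address first, then tries a dual-stack `::` bind on the same port, which is roughly what two `-b` flags ask for. `try_dual_bind` is a hypothetical helper, and the exact result depends on the OS and on whether IPv6 is available, so treat it as an illustration and not as a description of gunicorn's internals.

```python
import errno
import socket


def try_dual_bind(port=0):
    """Bind 127.0.0.1:port, then a dual-stack [::]:port on the same port.

    Returns None if the second bind succeeds, otherwise the errno name
    (for example "EADDRINUSE") of the failure.
    """
    v4 = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    v6 = None
    try:
        v4.bind(("127.0.0.1", port))
        v4.listen()
        port = v4.getsockname()[1]  # resolve port 0 to the real port
        v6 = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
        # IPV6_V6ONLY=0 makes the :: socket also cover IPv4 traffic
        v6.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        v6.bind(("::", port))
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        if v6 is not None:
            v6.close()
        v4.close()


if __name__ == "__main__":
    print(try_dual_bind())
```

On a typical Linux host with IPv6 enabled I would expect this to report `EADDRINUSE`, the error gunicorn logs as "Connection in use", because the dual-stack `::` socket claims the IPv4 side of the port as well. A single `-b [::]:$PORT` avoids the overlap.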

hmhOP•6mo ago

Their documentation seems to suggest that you need to state both: https://docs.gunicorn.org/en/stable/settings.html#bind and I'm assuming the correct address is 0.0.0.0?

angelo•6mo ago

Yes, binding on 127.0.0.1 won't bind properly. But wondering why legacy did it.

hmhOP•6mo ago

I didn't have that for legacy. I was just adding it in because I assumed I needed it if I also needed to have IPv6. My bad. Alright, well. It worked with python manage.py migrate && python manage.py collectstatic --noinput && gunicorn -b [::]:$PORT project.wsgi on V2
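The same binding can also live in a gunicorn config file instead of the start command. A minimal sketch, assuming the platform sets `PORT` and that the file is named `gunicorn.conf.py`; `bind_address` and the fallback port `8000` are hypothetical choices for illustration:

```python
# gunicorn.conf.py (sketch; assumes the PORT environment variable is set)
import os


def bind_address(default_port="8000"):
    """Build a dual-stack bind string such as "[::]:8000" from $PORT."""
    port = os.environ.get("PORT", default_port)
    return f"[::]:{port}"


# gunicorn reads the module-level name `bind` as its bind setting
bind = bind_address()
```

With this in place the last step of the start command would become `gunicorn -c gunicorn.conf.py project.wsgi`, with no `-b` flags at all, which makes it harder to end up with two overlapping binds.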

Brody•6mo ago

by default gunicorn binds to 0.0.0.0:$PORT so that would have worked for legacy as i suggested 🙂

hmhOP•6mo ago

Yup! For some reason I thought I tested that. Sorry about that.

Brody•6mo ago

no worries

hmhOP•6mo ago

Guess this isn't a new bug then, Angelo! I apologize. New to messing with IPv6.

Brody•6mo ago

it is a new bug, you should not need to listen on ipv6 just for the health check to work

angelo•6mo ago

Yea, if any behavior is different vs. old, its a bug. You did us a favor.

Brody•6mo ago

technically solved

Update: health checks can now pass if your app only listens on 0.0.0.0, but if you have already changed it to :: there's no point in changing anything back, as listening on :: has no known drawbacks.
