Random 502 Response – Nothing is crashing
Greetings Railway Team!
We're using Vercel as the frontend host and Railway as the backend host.
Vercel communicates with Railway via reverse proxy.
Once in a while we're catching "502 Bad Gateway" errors on our frontend that affect the user experience [image 1].
On the backend however, we see no logs corresponding to any crash whatsoever [image 2].
When I check the CPU Usage, I see some odd spikes [image 3], as well as spikes in Network Egress [image 4].
Memory Usage also spikes a little, but seems ok [image 5].
We've recently made many changes on the backend side, but also changed our DNS provider from GoDaddy to Cloudflare – with proxy disabled.
Do you think it's strictly an application issue? I find that odd, since I'd expect the app to crash if it were.
Once again, this happens quite rarely – but still happens.
What do you think? Thank you.
Project ID: 3e6a2b9c-4e34-41f8-979a-b83b27f3198d
Solution
Hello, yes, this is strictly an application-level issue. It looks like your application may not be able to handle every request, so it doesn't answer some of them, and those come back as a 502. Fortunately the solution is simple: add another replica or two!
Thank you, I'll look deeper into what could be causing this. Resources aren't even close to the limit, so I don't think replicas will solve the issue.
you're right, resources aren't the bottleneck here. the bottleneck would be your application's ability to use the available resources. so yes, replicas aren't going to fix the root issue, but they are going to mitigate the 502s, since the load on any individual replica will be lower
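To illustrate why replicas can reduce 502s without fixing the root cause, here is a minimal toy simulation. Everything in it is an assumption made for illustration: the round-robin routing, the timings, the occasional "heavy" request that blocks a worker, and the rule that a request counts as a 502 once it has waited longer than a proxy timeout. It is not a model of how Railway or Vercel actually route traffic.

```python
def simulate(replicas, n_requests=1000, interval=0.01, service=0.005,
             heavy=2.0, heavy_every=200, timeout=0.5):
    """Count requests that wait longer than `timeout` before being served.

    Hypothetical model: one request arrives every `interval` seconds and is
    sent round-robin to a replica that handles one request at a time.
    Every `heavy_every`-th request blocks its replica for `heavy` seconds.
    """
    free_at = [0.0] * replicas          # time at which each replica is idle again
    failures = 0
    for i in range(n_requests):
        arrival = i * interval
        r = i % replicas                # round-robin choice of replica
        start = max(arrival, free_at[r])
        if start - arrival > timeout:
            failures += 1               # the proxy gave up: counted as a 502
            continue                    # a dropped request is never processed
        is_heavy = i % heavy_every == heavy_every - 1
        free_at[r] = start + (heavy if is_heavy else service)
    return failures


for n in (1, 2, 3):
    print(f"{n} replica(s): {simulate(n)} of 1000 requests timed out")
```

In this toy model the blocking request still causes failures on whichever replica receives it, but the other replicas keep answering, so the total goes down rather than to zero. That matches the point above: mitigation, not a fix.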
thank you. consider this ticket to be closed!
sounds good! happy to help
Hi, wanted to clarify one detail: is it theoretically possible that during vCPU scaling a process becomes unresponsive, the same way it is unresponsive while the app resumes from sleep?
I looked through the 502 error support requests and found that this is a known issue that hasn't been fixed. Essentially, it happens when a container scales from 0 vCPU to the minimum of 0.1 vCPU.
Could the same issue occur while scaling from 0.1 vCPU to 1 vCPU, for example?
We only see 502 errors when scaling happens; it works fine afterwards.
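One way to check whether the 502s really line up with the CPU changes is to compare timestamps. The helper below is a rough sketch: the function name, the 30-second window, and the sample timestamps are all made up for illustration, and the real values would have to be copied by hand from the frontend error log and the metrics graph.

```python
from datetime import datetime, timedelta


def errors_near_spikes(error_times, spike_times, window_seconds=30):
    """Return (errors within the window of any spike, total errors)."""
    window = timedelta(seconds=window_seconds)
    near = [
        t for t in error_times
        if any(abs(t - s) <= window for s in spike_times)
    ]
    return len(near), len(error_times)


# Placeholder timestamps, not real data from this project.
errors = [datetime(2024, 1, 1, 12, 0, 5), datetime(2024, 1, 1, 13, 0, 0)]
spikes = [datetime(2024, 1, 1, 12, 0, 0)]
print(errors_near_spikes(errors, spikes))  # (1, 2)
```

If most errors fall outside the window, the CPU spikes are probably a coincidence rather than the trigger.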
unless you have app sleeping enabled, nope. as long as your app is running, there is no scaling happening
it doesn't have sleeping enabled, but I can see that vCPU usage goes to 0
yep, a running program can sit at 0% CPU utilization
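A quick way to see this for yourself: a process that is only waiting (sleeping here, as a stand-in for waiting on incoming requests) burns almost no CPU time while still being alive. This is a small sketch using only the Python standard library; the function name is made up for the example.

```python
import time


def cpu_share_while_idle(seconds=0.5):
    """Fraction of wall-clock time this process spent on the CPU while idle."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()   # CPU time used by this process so far
    time.sleep(seconds)               # stand-in for "waiting for a request"
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall


print(f"CPU share while idle: {cpu_share_while_idle():.1%}")
```

The process is running the whole time, but the measured share should be close to zero, which is why a graph dropping to 0 vCPU does not by itself mean the app was stopped or scaled down.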