Railway · 6mo ago
Dem

Server goes down randomly throughout the day

Recently I realized the production Railway server goes down randomly throughout the day and shows a 503 error. What's going on? Can someone take a look? My project ID is 99e122f7-a96e-42ba-95aa-325cd3e66c82
<!DOCTYPE html>
...
<h1 class="error-404">Nothing here... yet</h1>
<h1 class="error-503">Application failed to respond</h1>

<a href="https://railway.app" target="_blank"> Go to Railway </a>
</main>
</body>

</html>
56 Replies
Percy · 6mo ago
Project ID: 99e122f7-a96e-42ba-95aa-325cd3e66c82
Brody · 6mo ago
do you have any logs for when you get the error page? what kind of app?
Dem (OP) · 6mo ago
no logs, I just can't reach the server, so there's nothing logged. it's a FastAPI backend
Brody · 6mo ago
right, but if your application was erroring and not responding, ideally there would be logs
Dem (OP) · 6mo ago
all of the 0s are downtimes
[screenshot]
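For reference, an uptime check like the one in that screenshot can be reproduced with a small script, which makes it easier to line failures up with the deployment logs by timestamp. This is a minimal sketch using only the Python standard library; `probe` and `monitor` are hypothetical names, and recording `0` when no HTTP response arrives is an assumption chosen to match the "0s are downtimes" convention above. Note that Railway's "Application failed to respond" page is itself an HTTP response, so it is recorded as 503, which is distinct from a 0 (nothing came back at all).

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of one GET request to url, or 0 if no HTTP
    response arrived at all (DNS failure, refused connection, timeout)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, such as the 503 "Application failed to respond"
        # page, still carry a status code.
        return err.code
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and dropped connections: no usable response.
        return 0


def monitor(url: str, interval: float = 60.0, rounds: int = 10) -> list[tuple[str, int]]:
    """Probe url `rounds` times, `interval` seconds apart, printing and
    returning (UTC timestamp, status) pairs."""
    results = []
    for i in range(rounds):
        stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        status = probe(url)
        print(stamp, status, flush=True)
        results.append((stamp, status))
        if i < rounds - 1:
            time.sleep(interval)
    return results
```

Usage, with a placeholder URL: `monitor("https://api.example.com/", interval=60, rounds=60)` checks once a minute for an hour.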
Dem (OP) · 6mo ago
It looks like it's Railway that's not responding:
<h1 class="error-404">Nothing here... yet</h1>
<h1 class="error-503">Application failed to respond</h1>

<a href="https://railway.app" target="_blank"> Go to Railway </a>
</main>
No error from my app.
Brody · 6mo ago
that page is shown when your application doesn't respond. are these https requests? what am I looking at here?
Dem (OP) · 6mo ago
The server sometimes works and sometimes doesn't, without any changes on my side. Yup, this is in Postman; I'm calling the backend hosted on Railway.
Brody · 6mo ago
do you have a custom domain? I understand how it sounds, but that does not rule out an issue with your app. it also does not rule out an issue with Railway, but from experience it's more often an issue with the application
Dem (OP) · 6mo ago
Yes, I have a custom domain. How do I debug whether it's Railway or my app? It works perfectly locally. Some of my friends also have uptime problems with Railway and have migrated to Render.
Brody · 6mo ago
unfortunately, working locally does not rule out an issue with the application either. do you have the edge proxy enabled?
Dem (OP) · 6mo ago
What's the edge proxy? Is this on the domain side (e.g. Namecheap)?
Brody · 6mo ago
it would be in the service settings
Dem (OP) · 6mo ago
Should I enable this?
[screenshot]
Brody · 6mo ago
yes, but first: you said your domain provider was Namecheap?
Dem (OP) · 6mo ago
yes
Brody · 6mo ago
are you sure you are using the correct generated CNAME it gave you when you set it up?
Dem (OP) · 6mo ago
yes, it's been assigned to this domain name for months
Brody · 6mo ago
I'm sorry, but that answer does not instill confidence; I would like to ask for confirmation
Dem (OP) · 6mo ago
yes I'm sure
Brody · 6mo ago
you are using the generated CNAME, not the auto-generated domain, correct?
Dem (OP) · 6mo ago
Yes
Brody · 6mo ago
go ahead and enable the edge proxy
Dem (OP) · 6mo ago
Done, what should I do next?
Brody · 6mo ago
wait and see if you continue to have issues
Dem (OP) · 6mo ago
When should I check back in? Just tried Postman and still have the same issue
Brody · 6mo ago
what's the state of your deployment?
Dem (OP) · 6mo ago
deployed
Brody · 6mo ago
I'm sorry, but that's not a valid state
Dem (OP) · 6mo ago
[screenshot]
Brody · 6mo ago
yes, please tell me its state
Dem (OP) · 6mo ago
What does that mean?
Brody · 6mo ago
its deployment state
Dem (OP) · 6mo ago
[screenshot]
Dem (OP) · 6mo ago
[screenshot]
Dem (OP) · 6mo ago
Completed?
Brody · 6mo ago
your app has exited; this would not be a platform issue
Dem (OP) · 6mo ago
Where do you see that the app has exited?
Brody · 6mo ago
completed
Dem (OP) · 6mo ago
How do I fix it?
Brody · 6mo ago
first, let me correct myself: the edge proxy is not going to help here; I had asked you to make that change without enough information from you. second, since this is an issue with your application, I would recommend implementing error handling everywhere and verbose logging to help you narrow down the issue. remember, Railway only ever runs your code as-is, so if it's exiting, that's something your app is doing, not the platform
Dem (OP) · 6mo ago
What does it mean for the app to have exited? The app bugged out and shut down?
Brody · 6mo ago
the app exited with a non-error exit code for (at this time) an unknown reason
Dem (OP) · 6mo ago
Hmm, I see. I'll look into it, thanks
Brody · 6mo ago
I wish you the best of luck in your debugging endeavour
Dem (OP) · 6mo ago
Is it possible it exceeded resource constraints? Is there some way to check for that?
Brody · 6mo ago
you think your app could have exceeded 32 GB of RAM?
Dem (OP) · 6mo ago
We run a 100M-parameter LLM, so 32 GB should be enough
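As a back-of-the-envelope check on that estimate: the weights of a 100M-parameter model take about 100M × 4 bytes = 4 × 10^8 bytes (roughly 0.37 GiB) in float32, and half that in float16/bfloat16. The hypothetical helper below just does that arithmetic. Real usage is higher than the weights alone because of activations, tokenizer and framework overhead, and any extra copies of the model (for example, one per worker process if the server runs several workers), so treat this as a lower bound.

```python
def weights_gib(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the model weights alone, in GiB.
    float32 uses 4 bytes per parameter; float16/bfloat16 use 2.
    Ignores activations, tokenizer/framework overhead and extra copies."""
    return n_params * bytes_per_param / 2**30


print(f"float32: {weights_gib(100_000_000):.2f} GiB")     # float32: 0.37 GiB
print(f"float16: {weights_gib(100_000_000, 2):.2f} GiB")  # float16: 0.19 GiB
```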
Brody · 6mo ago
what do your memory metrics look like?
Dem (OP) · 6mo ago
Hmm, it goes up quite high
[screenshot]
Brody · 6mo ago
have you received any emails from railway that state you ran out of memory?
Dem (OP) · 6mo ago
Nope, these are the latest emails:
[screenshot]
Brody · 6mo ago
then it doesn't seem like that's the issue
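To cross-check memory from inside the app, in addition to the metrics graphs, the process can log its own peak memory at interesting points. This is a minimal sketch using the standard-library `resource` module, which is available on Unix-like systems only; `peak_rss_mib` and `log_memory` are hypothetical names. `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS, which the helper accounts for.

```python
import logging
import resource
import sys

logging.basicConfig(level=logging.INFO, stream=sys.stdout)
log = logging.getLogger("app.memory")
log.setLevel(logging.INFO)


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)  # macOS reports bytes
    return peak / 1024  # Linux reports kilobytes


def log_memory(tag: str) -> None:
    # Call after loading the model and, for example, at the end of each
    # request, to see whether the peak keeps climbing.
    log.info("%s: peak RSS %.1f MiB", tag, peak_rss_mib())
```

Usage: `log_memory("after model load")`. Because this is a peak value, it only ever goes up; a steady climb across requests would point at a leak.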
Dem (OP) · 6mo ago
What does Completed mean? This deployment is "Completed" instead of "Active", but it's up and running and has healthy logs.
[screenshot]
angelo · 6mo ago
Completed is when the process exits with a 0 exit code. However, I'm going to raise this to the team for investigation.
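To illustrate the distinction angelo describes: a Python process that reaches the end normally or calls `sys.exit(0)` exits with code 0 (which, per the reply above, Railway shows as "Completed"), while an uncaught exception makes the interpreter exit with code 1. The sketch below runs tiny child interpreters to show this; `exit_code_of` is a hypothetical helper. One practical consequence, offered as a hypothesis rather than a diagnosis: repeated exits with code 0 suggest something is ending the process cleanly (for example, a code path that catches an error and calls `sys.exit(0)`, or the server's main call returning) rather than the app crashing.

```python
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit code
    (the child's own output is captured and discarded)."""
    return subprocess.run([sys.executable, "-c", snippet], capture_output=True).returncode


print(exit_code_of("print('done')"))               # 0: reached the end normally
print(exit_code_of("import sys; sys.exit(0)"))     # 0: explicit clean exit
print(exit_code_of("raise RuntimeError('boom')"))  # 1: uncaught exception
print(exit_code_of("import sys; sys.exit(3)"))     # 3: explicit error code
```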
Duchess · 6mo ago
New reply sent from Help Station thread:
This thread has been escalated to the Railway team.
You're seeing this because this thread has been automatically linked to the Help Station thread.
It does look to me like the app is restarting, if I'm reading the logs correctly, but it doesn't seem to be due to OOM or CPU based on the metrics graphs. Can you confirm whether this log line prints when the app starts: "DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): huggingface.co:443"? That could also explain why the status is being updated to "Completed". As Brody suggested, I would also encourage you to add some more verbose logging and error handling to help track down the issue, starting with a clear debug line for when the app starts so it's easier to see when, or if, it restarts.