We need serious help to continue using Railway. The server is halting often and we are not sure why.

Every so often we see "Application Failed to Respond". There is nothing about a crash or failure in the Railway logs, so we are not sure whether the app actually failed. We do not see any logs explaining why this is happening, nor are there logs of the server restarting. We log "Server running on PORT ${PORT}" on server start, but this line is also missing when the app comes back up. RAM and CPU usage seem normal just before the app goes down.

PLEASE help us. We have tried everything: sprinkled logs everywhere, added try/catch everywhere. At this point the only thing left for us to try is a different host. Just in the last few hours the app went down 2 times, and it takes a while to come back up.

Project id: 6d0f799e-be59-4388-899d-f00456f30667

I have a post for this already: https://discord.com/channels/713503345364697088/1246339314829492294/1246339314829492294
93 Replies
Percy
Percy4w ago
Project ID: 6d0f799e-be59-4388-899d-f00456f30667
Brody
Brody4w ago
are you using the legacy or the v2 runtime?
KiBender
KiBender4w ago
How do I check? Maybe. This is an old app
Brody
Brody4w ago
in the service settings
KiBender
KiBender4w ago
Yes
Brody
Brody4w ago
switch to v2
KiBender
KiBender4w ago
Will this fix the issue?
Brody
Brody4w ago
I make absolutely no promises
KiBender
KiBender4w ago
If the server is restarting I should see "Server running on PORT ${PORT}" again, right? That is not happening. The logs just disappear, then after a while the app comes back up.
Brody
Brody4w ago
yes that is correct, that's the behaviour you should see if the app is restarting, but with the information I have this seems like your app is locking up. try the v2 runtime and report back
KiBender
KiBender4w ago
Can you explain what is "locking up"?
Brody
Brody4w ago
soft locking, Google could explain the term better than I could
KiBender
KiBender4w ago
Okay got it. is there something we should do in the app level to avoid this?
Brody
Brody4w ago
I wouldn't be able to tell you that, there's a million different things that can cause code to lock up, but definitely try the v2 runtime!
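One way to check from inside the app whether it is "locking up" is to measure event-loop lag: schedule a timer and log how late it fires. The sketch below assumes the app runs on Node.js (the thread suggests this but never states it); `startLagMonitor`, `computeLagMs`, and the threshold values are illustrative names and numbers, not anything from Railway or the app.

```typescript
// Minimal event-loop lag monitor (a sketch, not a full diagnostics tool).
// If synchronous work blocks the loop, the timer fires late and the
// lateness is reported through onLag.

interface LagMonitor {
  stop(): void;
}

// Lateness of a timer in milliseconds, never negative.
function computeLagMs(expectedAt: number, firedAt: number): number {
  return Math.max(0, firedAt - expectedAt);
}

function startLagMonitor(
  intervalMs: number,
  thresholdMs: number,
  onLag: (lagMs: number) => void,
  now: () => number = Date.now,
): LagMonitor {
  let expectedAt = now() + intervalMs;
  const timer = setInterval(() => {
    const lag = computeLagMs(expectedAt, now());
    if (lag >= thresholdMs) {
      onLag(lag);
    }
    // Schedule the next expectation relative to the current time.
    expectedAt = now() + intervalMs;
  }, intervalMs);
  return { stop: () => clearInterval(timer) };
}

// Example wiring (values are assumptions):
// startLagMonitor(1000, 200, (lag) => console.warn(`event loop lag: ${lag} ms`));
```

If warnings like this show up shortly before the app stops responding, the stall is likely in the app's own code; if the log simply goes quiet with no lag warnings, the cause may be outside the process.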
angelo
angelo4w ago
Hey there @KiBender - I know you’ve raised support questions in the past and I am sorry to continually make you rehash information. Are you perchance using Prisma or managing a large amount of DB connections? There are a few things in flight to fix stability on the platform, and I don’t wanna rule out Railway, but I wanna make sure I am able to gather the properties of your app. (Such as V2 Runtime and V2 Proxy)
KiBender
KiBender4w ago
Yes, we are using Prisma. But we switched to a new database using kysely, following your suggestion about Prisma's heavy memory usage. Prisma for app settings, kysely for data, both on separate DBs.
angelo
angelo4w ago
Fair and noted, do you see the restarts on load or just randomly?
KiBender
KiBender4w ago
Randomly. We've been trying to look at the last logs and work back from there, but honestly every time it's different, and they are already-handled errors.
angelo
angelo4w ago
Also for the DB connections, are you using the Internal network? (I think you are) If random then I have a strong suspicion that the new runtime would help then.
KiBender
KiBender4w ago
"monorail.proxy.rlwy.net" I think sometime in the past to debug this we moved to the non internal network and viaduct.proxy.rlwy.net Both are external url right?
angelo
angelo4w ago
I would suggest you move to the Internal network so you don’t get hit with egress charges but also, we control the network not GCP and we have been continually mitigating public connection issues. (Knock on wood none more yet but you never know.)
KiBender
KiBender4w ago
I will do that, will try to switch to internal urls
Brody
Brody4w ago
(and the v2 runtime)
angelo
angelo4w ago
If your application depends on postgres connections and you get random disconnects, your connection pool will very likely be exhausted.
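When a pool is exhausted, queries tend to wait silently instead of failing, which would match "no logs". A minimal sketch of one mitigation is to put a deadline on each database call so a stuck call surfaces as a logged error. `withTimeout` and `TimeoutError` are hypothetical helpers, not part of Prisma, kysely, or Railway, and the timeout does not cancel the underlying query or free its connection.

```typescript
// Rejects if the wrapped promise does not settle within `ms` milliseconds.
// Note: this only stops the caller from waiting; the underlying work
// (for example a database query) keeps running.

class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Example (syncOrder is a hypothetical function returning a promise):
// await withTimeout(syncOrder(orderId), 5000, "order sync");
```

A timeout error in the logs right before an outage would point toward the database connection rather than the host.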
KiBender
KiBender4w ago
Already redeployed with v2 as well
angelo
angelo4w ago
It’s 2 AM my time but if you can, keep us updated- we have the whole logistics team on standby addressing customer issues this whole week. We know about the flakiness and we are addressing every issue as quickly as we can dispatch them. !t
Duchess
Duchess4w ago
New reply sent from Help Station thread:
This thread has been escalated to the Railway team.
You're seeing this because this thread has been automatically linked to the Help Station thread.
KiBender
KiBender4w ago
Thank you both of you
angelo
angelo4w ago
Melissa is our on-call this week, it’s very likely the person from the Railway team who will respond will be her. Thank you for your patience even among your frustration.
KiBender
KiBender4w ago
- We know the app is only failing during the daytime for us. Our app users are all from my country, so it's when people are using it.
- We have a fairly large db which is updated often. We run a Shopify app and we have to sync every order update and delete with webhooks. We use the kysely query builder for this.
- We have 2 dbs, for data and settings.
- We have zero logs about why it's failing.
angelo
angelo4w ago
Doesn’t seem that random then if there is load, however, my gut is telling me those connection drops are exhausting the mem pool.
KiBender
KiBender4w ago
But ram usage doesn't look like there is much load
angelo
angelo4w ago
However, the connection drop might be. Nothing concrete; I think we would also try the new proxy as well.
KiBender
KiBender4w ago
Random in the sense that I cannot identify anything that is causing this based on the logs.
angelo
angelo4w ago
Understood, that's frustrating.
KiBender
KiBender4w ago
how do I try the new proxy?
angelo
angelo4w ago
Edge Proxy Beta in service settings
KiBender
KiBender4w ago
Should I turn it on or see if it fails with the new runtime first?
angelo
angelo4w ago
(I for one am not a big fan of telling people to enable beta features to fix prod problems) Let’s wait for the new runtime to do its thing. Then if so, we can add proxy.
KiBender
KiBender4w ago
Cool. thank you I will report back
Brody
Brody4w ago
fwiw I have seen the v2 runtime fix flakiness
KiBender
KiBender4w ago
that would be ideal
Brody
Brody4w ago
same with the edge proxy, and same with the v2 builder, railway is cooking
angelo
angelo4w ago
To set expectations, I likely won’t be doing the follow up as my shift is ending (well, it did, but I take it personally when anyone has a bad time on here) but jumped in to make sure to let you know that the team is as on top of this as we can be. Not fast enough, but it’s all hands on deck here.

Checking in @KiBender - no issues over your day?
Ray
Ray4w ago
cc @Mig
Mig
Mig4w ago
hey @KiBender, I've built the new edge proxy we're asking you to use. We've had some customers mention intermittent 503 responses, and switching to the new proxy has resolved the issue. On the topic of beta software, this proxy has been in production for several months handling our own internet properties (nixpacks.com, help.railway.app, blog.railway.app), and other users have been opting in to the new proxy over the past month with great success. Every customer we've suggested the switch to has reported no more 503 responses (application failed to respond). It also offers faster deploy times and a request id mechanism so we can determine what went wrong with your request. We plan on surfacing this information in the dashboard directly soon. If you run into any issues you can revert to the old proxy, and you will be back on the proxy you're used to after 1 minute (DNS cache). You can also @ me directly on discord and I'll help right away (I'm ET timezone)
KiBender
KiBender4w ago
Hello @Angelo No issues yesterday. Thanks @Mig Will try to switch to the new proxy today and report back
angelo
angelo4w ago
Beautiful news, we will keep our eyes out.
Brody
Brody4w ago
love to hear this, v2 everything is a massive improvement
KiBender
KiBender3w ago
Hello @Mig @angelo This morning the app crashed. Railway was saying it crashed, but it did not restart, even though the Restart policy in the service settings is Always. I clicked restart and it said it restarted, but the app is still down.
Mig
Mig3w ago
it's in this state right now right ?
KiBender
KiBender3w ago
Yes. No logs in Railway and no logs in GlitchTip (we use it for errors).
angelo
angelo3w ago
Railway
404 - Page not found
Railway is an infrastructure platform where you can provision infrastructure, develop with that infrastructure locally, and then deploy to the cloud.
KiBender
KiBender3w ago
https://gst-next.storetools.io/ The app is just loading. It won't work as a direct URL, but it should show unauthorized. Yes, the prod environment.
angelo
angelo3w ago
Gotcha- mind if I trigger a rebuild?
KiBender
KiBender3w ago
No worries. But I wanted to show you this before I click
angelo
angelo3w ago
Wanna see where it's related. Do so, your uptime matters more than our debugging.
KiBender
KiBender3w ago
100% Can I redeploy now?
angelo
angelo3w ago
Yes
KiBender
KiBender3w ago
Weirdly, this time we did not see the black screen. It was loading, then errored out. Our app runs inside Shopify. I feel this is a little bit less scary than the black Railway error.
Mig
Mig3w ago
app is loading now
KiBender
KiBender3w ago
App is working now after the redeploy
Mig
Mig3w ago
I believe I know what the issue was. Will dig into it.
KiBender
KiBender3w ago
I would appreciate it a lot if you can help us solve this. We are actually seeing an increase in uninstalls now.
KiBender
KiBender3w ago
Railway shows both as active
angelo
angelo3w ago
In this case, I think it would be wise to spin up a few replicas for failover just in case. I think we know the root cause in this case and it isn't a V2 bug, just a possibly extremely unlucky circumstance.
KiBender
KiBender3w ago
Damn. Was this the same issue with v1?
angelo
angelo3w ago
No, the switch to V2 fixed the Railway error issue that you saw. The (edit: likely) reason you ran into this error was that the machine your app landed on was CPU-bound and slow to respond, a case of bad luck.
Mig
Mig3w ago
If you increase the replica count, your app will be placed on multiple boxes, so if 1 has an issue we'll route to another instance. This wasn't an issue with the v2 proxy.
KiBender
KiBender3w ago
I can create 3 replicas but we have cron jobs happening every 24 hours. Would we need to move that out to a separate service?
angelo
angelo3w ago
Preferably yes. I have also compensated your account for the outages that you've faced, so that you do not have to bear the increased financial cost of the replicas on your account. Can you confirm that you received credits on your account?
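With 3 replicas, each copy of the server would otherwise start the same 24-hour job. If the cron service is deployed from the same codebase, one simple sketch is to gate the scheduler behind an environment flag that is set only on the dedicated cron service. `RUN_CRON` and `startDailyJobs` are hypothetical names chosen for illustration; they are not Railway settings.

```typescript
// Decide at startup whether this instance should run scheduled jobs.
// RUN_CRON is a hypothetical variable: set it to "true" only on the
// separate cron service, and leave it unset on the web replicas.

function shouldRunCron(env: Record<string, string | undefined>): boolean {
  return env["RUN_CRON"] === "true";
}

// Example wiring in the entry point (startDailyJobs is hypothetical):
// if (shouldRunCron(process.env)) {
//   startDailyJobs();
// }
```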
KiBender
KiBender3w ago
Yes I see "$ 200.00" Credits Available
angelo
angelo3w ago
Gotcha- hopefully this holds you over in the meantime. Can you go ahead and deploy a few replicas?
KiBender
KiBender3w ago
How many do you suggest? I have never done this before
angelo
angelo3w ago
3 should be good. This is horizontal scaling and how you get redundancy, welcome to Scale™️. Jokes aside, this will make it so that there should always be a healthy instance to take a request. Essentially works like this:

[] [] []
 |  |  |
  \ | /
 [Proxy]
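As a toy model of the diagram above (not Railway's actual proxy logic, which the thread does not describe), a proxy can rotate through the replicas and skip any that are marked unhealthy. The `Replica` type and `pickReplica` function are made up for this illustration.

```typescript
// Toy round-robin selection that skips unhealthy replicas.

interface Replica {
  id: string;
  healthy: boolean;
}

// Returns the index of the next healthy replica after lastIndex,
// or -1 if no replica is healthy.
function pickReplica(replicas: Replica[], lastIndex: number): number {
  for (let step = 1; step <= replicas.length; step++) {
    const i = (lastIndex + step) % replicas.length;
    if (replicas[i].healthy) {
      return i;
    }
  }
  return -1;
}
```

With a single instance there is nothing to fall back to, which is why one slow box was enough to take the app down.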
KiBender
KiBender3w ago
Thank you. I think I understand this. But we would still love to understand what is going on in the app
angelo
angelo3w ago
As in? The error you faced before or what replicas do to your app.
Mig
Mig3w ago
I want to ship access logs to the dashboard. So in this case you would see that a request was made to your app but the app failed to respond (timeout). It wouldn't tell you that the machine the app was on was having issues though.
KiBender
KiBender3w ago
I'm still not fully sure this is just a Railway glitch. I have seen "Application not responding" many times before.
- Railway was not auto-restarting before. It didn't now as well.
- No logs from our app before or today.
Either we are extremely unlucky or I see a pattern :blob_help:
angelo
angelo3w ago
I will go ahead and answer both for you:
1. You got very unlucky with your workload placement. It didn't die, hence why you didn't see a crash; the box it was on was pinned, which made responses extremely slow. So it wouldn't have restarted, since your application never exited with a non-zero exit code.
1b. Runtime V2 has a known issue with logs; we are in the process of fixing it.
2. Replicas just run copies of your server, but we essentially give you only one interface to all the instances, you don't need to change anything with the
And after yesterday, short of this issue, you weren't reporting any issue, was this still the case after the move to V2? (w.r.t. restarts)
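Since a restart is only triggered when the process exits with a non-zero code, an app that hits an unrecoverable error should log it and exit rather than limp along. The sketch below shows that pattern for a Node.js process. It would not have helped with a pinned host, where the process stayed alive, and `makeFatalHandler` is an illustrative helper, not part of any library.

```typescript
// Builds a handler that logs a fatal error and exits with code 1,
// so that a restart policy can take over.

type ExitFn = (code: number) => void;
type LogFn = (line: string) => void;

function makeFatalHandler(kind: string, log: LogFn, exit: ExitFn): (reason: unknown) => void {
  return (reason) => {
    const detail =
      reason instanceof Error ? (reason.stack ?? reason.message) : String(reason);
    log(`[fatal] ${kind}: ${detail}`);
    exit(1);
  };
}

// Wiring in a Node.js entry point:
// process.on("uncaughtException",
//   makeFatalHandler("uncaughtException", console.error, process.exit));
// process.on("unhandledRejection",
//   makeFatalHandler("unhandledRejection", console.error, process.exit));
```

Passing `log` and `exit` in as parameters keeps the handler easy to test without terminating the test process.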
KiBender
KiBender3w ago
yesterday I received this
KiBender
KiBender3w ago
We didn't trigger a deploy. It was the change to v2 which triggered the last deployment. Follow-up for 1: we are not seeing this for the first time after the v2 change. This has been happening for some weeks now. We have been thinking this is a code issue and trying to add try/catch everywhere.
KiBender
KiBender3w ago
At least since May.
angelo
angelo3w ago
Yea, so V2 runtime fixes the above. But you are saying you are getting random deploy crashed emails?
KiBender
KiBender3w ago
Are you sure this is because of the v1 runtime? (No logs) If that is the case and yesterday we just got unlucky, I'm happy 🙂
angelo
angelo3w ago
Yep, your customers won't see that error page anymore.
KiBender
KiBender3w ago
Hi, is there any update on this?
KiBender
KiBender3w ago
Hello, with 3 replicas we can no longer use the db via Postico. Anything we should do about this?
Brody
Brody3w ago
are you using a database on railway?
KiBender
KiBender3w ago
Yes @Angelo
angelo
angelo3w ago
Back? Or is this related to the DB
KiBender
KiBender3w ago
We cannot connect to db. What should we do? How many connections does railway allow? @T4P4N
Brody
Brody3w ago
not a railway limitation, postgres itself has a default limitation of 100 connections
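The arithmetic behind this can be sketched as follows: every replica opens its own pool, so the per-replica pool size has to be budgeted against the server-wide limit, leaving some connections free for tools such as Postico. The function name and the reserve of 10 are assumptions for illustration. The result would typically go into the pool size setting of the database client, for example the `max` option of a `pg` pool or the `connection_limit` parameter of a Prisma connection URL.

```typescript
// Splits a database's connection limit evenly across replicas,
// after setting aside `reserved` connections for admin tools,
// migrations, and similar out-of-band clients.

function poolSizePerReplica(
  maxConnections: number,
  reserved: number,
  replicas: number,
): number {
  if (replicas < 1) {
    throw new RangeError("replicas must be at least 1");
  }
  const usable = maxConnections - reserved;
  if (usable < replicas) {
    throw new RangeError("not enough connections for one per replica");
  }
  return Math.floor(usable / replicas);
}

// Postgres default limit of 100, 10 reserved (assumed), 3 replicas:
const perReplica = poolSizePerReplica(100, 10, 3); // 30
```

If each replica was previously allowed a pool sized for a single instance, tripling the replica count can use up the whole limit and lock out external clients.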