My App is Down, no logs for past 15 minutes. Restart didn't work
Hey Railway team,
My production app is down. No logs on the server. It's definitely on the Railway side,
104 Replies
Project ID:
f5925531-2de0-4da9-8a6d-15b5ba712ebe
I'm getting this too
P.S i'm in the Pro plan
Same
Also after redeploy, app is suddenly not reachable anymore 😦
not even an error page, just no server response at all
Yeah there's definitely an incident happening. Hope the railway team sees this soon and starts taking action.
Yep I hope so too ^^
But it's very weird as it only affects my prod and not the testing environment
team has been made aware
If it helps the investigation, my uptime monitor fired at 9am gmt
Ack
Looking into it
good luck!
New reply sent from Help Station thread:
Only 1 out of 8 applications seems to be down for me
You're seeing this because this thread has been automatically linked to the Help Station thread.
New reply sent from Help Station thread:
Mine too! There is something going on!
New reply sent from Help Station thread:
Same issue with my node server. As with my mysql databases. All projects are affected. Can't connect to any node-app or mysql database on any project hosted on railway.
An incident has been reported - https://status.railway.app/clxa4z5c81345703e2oe5y8bmsy9
thank you Brody
New reply sent from Help Station thread:
Can't redeploy either, but that's of less importance.
Adding to the data, our app is down too, seems like it’s Postgres given that landing page is fine, just main requests from DB are 500ing
New reply sent from Help Station thread:
How long is the expected recovery time?
the team have not posted an ETA yet
New reply sent from Help Station thread:
I think this is a catastrophic accident, and I hope there will be an official announcement afterwards. Our company's business has been severely affected.
New reply sent from Help Station thread:
I have to agree. We have thousands of users that can't access our services for a longer period...
They will be publishing a post mortem after the incident has been resolved, either me or someone else will link it here when it is posted
hey @Brody , would upgrading to pro right now help ? Or will it be the same scenario
Majority of instances should be restored at this point
Experiencing the same issues mentioned above
Some of the projects run fine, others stopped working
Pro users have also been affected
I'm still down
New reply sent from Help Station thread:
In our case, the projects with static data are fine. All the websites with dynamic data from DB are down.
I guess the Databases are down.
this incident can affect all services not just databases
New reply sent from Help Station thread:
Nothing restored yet in our case.
you may have to redeploy affected services
I get this error when I do that
I'm getting the screenshot above too, does that mean no deployments?
It means priority deployments first
in the post mortem it would be good to look into why this was first raised on the forum and not via automated monitoring
Yes, Pro users would have builder priority at this time
I'm on the Pro plan, tried redeploying but I'm stuck at this:
please note that Jake said a majority, this issue has not been fully resolved yet
It was raised via automated monitoring
But yes, sure
? there was no incident when we raised it here, it started about 9am
New reply sent from Help Station thread:
Deployments not working either.
Still unable to deploy services normally...
My app instance has been down all this time.
But I was able to access my MySQL instance. Now I can't.
I assume this is part of the restart.
same here. my production site has been down and it costs me and customers too 😦
It is what it is.
I owe someone who is not technically knowledgeable an explanation.
hope it gets back up and running again
Limited Access - Disabling for hobby while we restore systems :HAHAHA:
I also experience the same issue. Hope things get back to normal very soon
Saying poor in diplomatic
Same here...
New reply sent from Help Station thread:
Node-server gets some HTTP requests, but not all. Cannot connect to database hosted outside on planetscale either.
New reply sent from Help Station thread:
Node-server gets some HTTP requests, but not all. Cannot connect to database hosted outside on planetscale either. Just resolved itself without redeploy
New reply sent from Help Station thread:
This is the second time our app has been down for hours in a few months. I understand these things may happen, but it would be good to take responsibility and provide compensation to those financially affected by the issue (which is exactly what I do when something goes wrong in the business). In our case, this situation might cost us about €100 in losses at the moment, not to mention the time wasted and the impact on our SEO, which might cost a lot more in the long run.
I'm back online
New reply sent from Help Station thread:
Howwwwwwww mannn
New reply sent from Help Station thread:
MyProject still unable to deploy services normally…
My website is still down. But now I get a different message. It has to do with the database. I can't access the database. I get an error
I get application failed to respond error
On the client
Systems partially work again. Live system running, Test system still no database.
@André did you have to redeploy or restart?
@rickitan
For Live to work again I redeployed every ~5 minutes to check. Now it works again.
For stage I did nothing yet but the mysql server for laravel crashed. Redeploys / Restarts not working for stage yet
Now all systems are back to normal (My systems)
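For anyone who wants to script the redeploy-and-check routine described above instead of refreshing by hand, here is a minimal sketch. It is not something Railway provides: `waitUntilHealthy`, `intervalMs` and `maxAttempts` are made-up names, and the 5 minute default only mirrors the interval mentioned above.

```javascript
// Hypothetical helper: calls `check` until it reports healthy or attempts run out.
// Resolves to the attempt number that succeeded, or null if none did.
async function waitUntilHealthy(check, { intervalMs = 5 * 60 * 1000, maxAttempts = 12 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // A thrown error or rejected promise (e.g. connection refused) counts as "still down".
    const healthy = await Promise.resolve().then(check).catch(() => false);
    if (healthy) return attempt;
    if (attempt < maxAttempts) {
      // Wait before the next check so the platform is not hammered.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return null; // never became healthy within maxAttempts
}
```

On Node 18 or newer, `check` could be something like `() => fetch(url).then((res) => res.ok)`, where `url` is your own app's address. Passing the check in as a function keeps the loop independent of how "healthy" is defined for a given service.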
Awesome!!
They are not.
Yes I was asked to restart but deployment still crashes
Same thing here
I think my API are back to normal now
I've done a re-deployment, and back online, thanks, team.
2+ hours is not a good thing though, it's going to hurt.
@King Jahad
Yes but I expect a report to be published hopefully today / tomorrow.
It's one of the most awful things that can happen. Let's give them some time to investigate and for every operation / downtime costs on client side there is time to discuss after.
The most important thing is to get everything up and running again 😉
:blobyes:
+1
I am not mad, just don't know who to blame for this.
and how to make a client understand
In case anyone is still experiencing issues, make sure you try to restart / redeploy, hobby or pro
I've tried dozens of times. Still down.
This is the first time it worked for me, shouldn't be long now if I had to guess
I couldn't connect to Postgres for the longest time and it finally came online
I had issues with both a hobby and pro service
I can't even access the database from the ui
Try force refresh (shift-command-r)
Same thing.
i'm not able to see the list of deployments, so i can't redeploy what i can't see
New reply sent from Help Station thread:
personally - tried re-deploying multiple times, last 3 min ago, didn't help. Deployments are successful, but no logs are displayed, and server is offline
I'm back online
New reply sent from Help Station thread:
In our case, deployments are successful, and crash after a few seconds/minutes
I tried to redeploy again and it works now
Thank you very much for your suggestions
i'm seeing the deployments.. so i'm redeploying now
My app still can not redeploy because the connection to Postgres DB still failed 😦
Give it time, same was happening to me.
I believe they are doing a massive restart. So some servers are restarted and come online before others.
for (const server of servers) {
  // restart one server at a time, waiting for each to come back
  await server.restart()
}
Something like that
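The one-at-a-time restart guessed at above can be written out in runnable form. This is only a sketch of the idea, not a description of what Railway actually does: `makeServer` and `rollingRestart` are hypothetical names, and the "restart" is just a timer.

```javascript
// Hypothetical stand-in for a real host; restart() only waits restartMs.
function makeServer(name, restartMs) {
  return {
    name,
    online: true,
    async restart() {
      this.online = false;
      await new Promise((resolve) => setTimeout(resolve, restartMs));
      this.online = true;
    },
  };
}

// Restart servers sequentially and return the order in which they came back.
async function rollingRestart(servers) {
  const backOnline = [];
  for (const server of servers) {
    await server.restart(); // the next server is not touched until this one is up
    backOnline.push(server.name);
  }
  return backOnline;
}
```

Because each restart is awaited before the next begins, servers earlier in the list come back first, which would match some people being online again while others are still down.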
redeploying my app failed, but it just gives me "no build logs found for deployment"
after several attempts, my server was successfully redeployed. it wasn't able to connect to an internal Redis
The redeployment was successful, I almost lost my job today. :HAHAHA:
I set up a demo for my team today and Railway failed just 10 mins before the meeting. Nothing is more embarrassing for me than this.
ironically my dev environment is working fine
New reply sent from Help Station thread:
thanks, I redeployed and now it works for me
We're back online too
latest deployment seems to have stopped in the middle of it.. never got to the health check + not showing any deployment log, only build log
It was the same for me ^^
At least the "least" important project didn't work. Suuuureeee 😂
probably time to move to aws
redeploying 3rd time's the charm
i am on beta V2 of the builder, and V2 runtime.. hopefully that is not biting me in the ass
not able to reach the database right now during the deployment
i'm seeing the postgres service, but when i click on data it can't establish a database connection (update: it popped in after 2-3 minutes)
not seeing my app's deployments again (nm, seeing them)
same for me, the database is working, but the app itself doesn't work
If I moved to AWS I would probably cause worse downtimes than trusting the railway team lol. It's just not my expertise.
But this 2h long one was definitely a bad one.
the same happened to me on the previous massive outage 😫
New reply sent from Help Station thread:
This incident has been resolved. Once again, we apologize for the downtime. We'll be publishing a post-mortem of this incident soon.
Definitely looking forward to the post-mortem on how this can be prevented in the future. We have enterprise clients, and not sure how they can trust us when there’s a complete app outage today and last December. It is frustrating for us.
Hey there Kevin,
I don't want to reveal too much about your customer data in a semi-public forum, but I am pretty sure that I speak on behalf of the Railway team on how sorry we are that you had end user impact to your workloads.
We have a number of mitigations planned for the Infra side, but I can speak personally to how we are changing things to make it easier for those affected to immediately get in touch with our Infra team when issues arise.
I just sent your company Slack invites so we can continue the conversation there.
This is also a standing offer for anyone else impacted this way as well, we are working with all affected companies to deliver the post mortem and work on next steps.
@angelo thanks! Slack connect would be useful, especially since that's our main workspace.
I don't see the slack invites, can you resend? DMing you my email
Here's the retro/post mortem
It's up on the forums. Happy to discuss anything here or there
https://help.railway.app/questions/incident-response-june-11th-2024-733fbd5d
Railway Help Station
Incident Response - June 11th 2024
This thread serves to aggregate discussion for the incident on June 11th
The full response can be found at https://blog.railway.app/p/2024-06-11-incident-report
Railway takes these incidents very, very seriously. Internally we've been working on infrastructure improvements which will make the platform faster and prevent outages like this from happ...
I am seeing the same issue again, maybe related to the outage on 8/27? It keeps occurring for us after the outage; we last experienced it on 8/29
I got the same problem on October 12, 2024. Are we experiencing the same problem?
I don't see how that is related, looks like you are opening too many connections
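If the app really is opening too many connections, one common remedy is to cap how many are in use at once. The sketch below is a generic illustration under that assumption: `ConnectionLimiter` is a hypothetical helper, not part of any database driver, and in practice a driver's built-in connection pool with a maximum size is usually the better choice.

```javascript
// Hypothetical helper: lets at most `max` tasks hold a connection at the same time.
class ConnectionLimiter {
  constructor(max) {
    this.max = max;
    this.active = 0;   // slots currently held
    this.waiting = []; // resolvers for tasks queued behind the cap
  }

  async run(task) {
    if (this.active >= this.max) {
      // Queue up; a finishing task hands its slot over directly.
      await new Promise((resolve) => this.waiting.push(resolve));
    } else {
      this.active += 1;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next();      // pass this slot to the next waiter
      else this.active -= 1; // nobody waiting, free the slot
    }
  }
}
```

Each query would then be wrapped as `limiter.run(() => doQuery())`, where `doQuery` stands in for whatever opens and uses a connection. Handing the slot straight to the next waiter, instead of freeing it first, keeps a newly arriving task from slipping past the cap.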