Railway•4w ago

My App is Down, no logs for past 15 minutes. Restart didn't work

Hey Railway team, My production app is down. No logs on the server. It's definitely on the Railway side,
101 Replies
Percy•4w ago
Project ID: f5925531-2de0-4da9-8a6d-15b5ba712ebe
rickitan•4w ago
Joe Lanman•4w ago
I'm getting this too
rickitan•4w ago
P.S. I'm on the Pro plan
André•4w ago
Same. Also, after a redeploy, the app is suddenly not reachable anymore 😦
Joe Lanman•4w ago
not even an error page, just no server response at all
rickitan•4w ago
Yeah, there's definitely an incident happening. Hope the Railway team sees this soon and starts taking action.
André•4w ago
Yep I hope so too ^^
[image attachment]
André•4w ago
But it's very weird as it only affects my prod and not the testing environment
Brody•4w ago
team has been made aware
Joe Lanman•4w ago
If it helps the investigation, my uptime monitor fired at 9am gmt
JustJake•4w ago
Ack. Looking into it.
Joe Lanman•4w ago
good luck!
Duchess•4w ago
New reply sent from Help Station thread:
Only 1 out of 8 applications seems to be down for me
You're seeing this because this thread has been automatically linked to the Help Station thread.
New reply sent from Help Station thread:
Mine too! There is something going on!
New reply sent from Help Station thread:
Same issue with my node server. As with my mysql databases. All projects are affected. Can't connect to any node-app or mysql database on any project hosted on railway.
Brody•4w ago
rickitan•4w ago
thank you Brody
Duchess•4w ago
New reply sent from Help Station thread:
Can't redeploy either, but that's of less importance.
kevin•4w ago
Adding to the data, our app is down too, seems like it’s Postgres given that landing page is fine, just main requests from DB are 500ing
Duchess•4w ago
New reply sent from Help Station thread:
How long is the expected recovery time?
Brody•4w ago
the team have not posted an ETA yet
Duchess•4w ago
New reply sent from Help Station thread:
I think this is a catastrophic accident, and I hope there will be an official announcement afterwards. Our company's business has been severely affected.
New reply sent from Help Station thread:
I have to agree. We have thousands of users that can't access our services for a longer period...
Brody•4w ago
They will be publishing a post mortem after the incident has been resolved, either me or someone else will link it here when it is posted
MithushanJ•4w ago
hey @Brody, would upgrading to pro right now help? Or will it be the same scenario?
JustJake•4w ago
Majority of instances should be restored at this point
Giannis G.•4w ago
Experiencing the same issues mentioned above. Some of the projects run fine, others stopped working
Brody•4w ago
Pro users have also been affected
Joe Lanman•4w ago
I'm still down
Duchess•4w ago
New reply sent from Help Station thread:
In our case, the projects with static data are fine. All the websites with dynamic data from DB are down.
MithushanJ•4w ago
I guess the Databases are down.
Giannis G.•4w ago
[image attachment]
Brody•4w ago
this incident can affect all services not just databases
Duchess•4w ago
New reply sent from Help Station thread:
Nothing restored yet in our case.
Brody•4w ago
you may have to redeploy affected services
Giannis G.•4w ago
I get this error when I do that
[image attachment]
Joe Lanman•4w ago
I'm getting the screenshot above too, does that mean no deployments?
King Jahad•4w ago
It means priority deployments first
Joe Lanman•4w ago
in the post mortem it would be good to look into why this was first raised on the forum and not via automated monitoring
Brody•4w ago
Yes, Pro users would have builder priority at this time
rickitan•4w ago
I'm on the Pro plan, tried redeploying but I'm stuck at this:
[image attachment]
Brody•4w ago
please note that Jake said a majority, this issue has not been fully resolved yet
JustJake•4w ago
It was raised via automated monitoring. But yes, sure
Joe Lanman•4w ago
? there was no incident when we raised it here, it started about 9am
Duchess•4w ago
New reply sent from Help Station thread:
Deployments not working either.
RenderCoder•4w ago
Still unable to deploy services normally...
rickitan•4w ago
My app instance has been down all this time. But I was able to access my MySQL instance. Now I can't. I assume this is part of the restart.
Celengan Babi•4w ago
same here. my production site has been down and it's costing me and my customers too 😦
King Jahad•4w ago
It is what it is. I owe someone who is not technically knowledgeable an explanation.
Celengan Babi•4w ago
hope it gets back up and running again
waltcow•4w ago
Limited Access - Disabling for hobby while we restore systems :HAHAHA:
CodeLover•4w ago
I also experience the same issue. Hope things get back to normal very soon
King Jahad•4w ago
Saying poor in diplomatic
Willem Sandoval•4w ago
Same here...
Duchess•4w ago
New reply sent from Help Station thread:
Node-server gets some HTTP requests, but not all. Cannot connect to database hosted outside on planetscale either.
New reply sent from Help Station thread:
Node-server gets some HTTP requests, but not all. Cannot connect to database hosted outside on planetscale either. Just resolved itself without redeploy
New reply sent from Help Station thread:
This is the second time our app has been down for hours in a few months. I understand these things may happen, but it would be good to take responsibility and provide compensation to those financially affected by the issue (which is exactly what I do when something goes wrong in the business). In our case, this situation might cost us about €100 in losses at the moment, not to mention the time wasted and the impact on our SEO, which might cost a lot more in the long run.
Joe Lanman•4w ago
I'm back online
Duchess•4w ago
New reply sent from Help Station thread:
Howwwwwwww mannn
New reply sent from Help Station thread:
MyProject still unable to deploy services normally…
CodeLover•4w ago
My website is still down. But now I get a different message. It has to do with the database. I can't access the database. I get an "application failed to respond" error on the client
André•4w ago
Systems partially work again. Live system running, Test system still no database.
rickitan•4w ago
@André did you have to redeploy or restart?
André•4w ago
@rickitan For Live to work again I redeployed every ~5 minutes to check. Now it works again. For stage I did nothing yet but the mysql server for laravel crashed. Redeploys / restarts not working for stage yet.
Now all systems are back to normal (my systems)
rickitan•4w ago
Keziah•4w ago
They are not.
CodeLover•4w ago
Yes I was asked to restart but deployment still crashes
Arthur Macêdo•4w ago
Same thing here
King Jahad•4w ago
I think my APIs are back to normal now
mattey•4w ago
I've done a re-deployment, and back online, thanks, team.
King Jahad•4w ago
2+ hours is not a good thing though, it's going to hurt.
André•4w ago
@King Jahad Yes, but I expect a report to be published, hopefully today / tomorrow. It's one of the most awful things that can happen. Let's give them some time to investigate, and for every operation / downtime cost on the client side there is time to discuss after. The most important thing is to get everything up and running again 😉
mattey•4w ago
King Jahad•4w ago
+1. I am not mad, just don't know who to blame for this, and how to make a client understand
zgmg92•4w ago
In case anyone is still experiencing issues, make sure you try to restart / redeploy, hobby or pro
Keziah•4w ago
I've tried dozens of times. Still down.
zgmg92•4w ago
This is the first time it worked for me, shouldn't be long now if I had to guess. I couldn't connect to Postgres for the longest time and it finally came online. I had issues with both a hobby and a pro service
CodeLover•4w ago
I can't even access the database from the ui
zgmg92•4w ago
Try force refresh (shift-command-r)
Keziah•4w ago
Same thing.
Arnór•4w ago
i'm not able to see the list of deployments, so i can't redeploy what i can't see
Duchess•4w ago
New reply sent from Help Station thread:
personally - tried re-deploying multiple times, last 3 min ago, didn't help. Deployments are successful, but no logs are displayed, and server is offline
rickitan•4w ago
I'm back online
Duchess•4w ago
New reply sent from Help Station thread:
In our case, deployments are successful, but crash after a few seconds/minutes
CodeLover•4w ago
I tried to redeploy again and it works now. Thank you very much for your suggestions
Arnór•4w ago
i'm seeing the deployments.. so i'm redeploying now
Sang Dang•4w ago
My app still can not redeploy because the connection to Postgres DB still failed 😦
rickitan•4w ago
Give it time, same was happening to me. I believe they are doing a massive restart. So some servers are restarted and come online before others.
for (const server of servers) { await server.restart() }
Something like that
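To make that guess concrete, here is a minimal JavaScript sketch of a strictly sequential restart, which would explain why some services come back well before others. It is only an illustration of the idea in the message above: the `makeServer` helper, the server names, and the delays are hypothetical, and nothing here reflects how Railway actually restarts its hosts.

```javascript
// Hypothetical stand-in for a server: restart() simply waits delayMs
// and then marks the server as online.
function makeServer(name, delayMs) {
  return {
    name,
    online: false,
    async restart() {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      this.online = true;
    },
  };
}

// Restart servers strictly one after another and record the order in
// which they come back. `for...of` iterates the server objects;
// `for...in` would iterate array indices instead.
async function restartAll(servers) {
  const backOnline = [];
  for (const server of servers) {
    await server.restart(); // the next restart waits for this one to finish
    backOnline.push(server.name);
  }
  return backOnline;
}

// Usage: servers later in the list stay down until earlier ones finish,
// so the names are logged in list order regardless of the delays.
restartAll([makeServer("a", 30), makeServer("b", 10), makeServer("c", 20)])
  .then((order) => console.log(order));
```

With a loop like this, total recovery time is the sum of the individual restart times, which is one possible reason why a service late in the queue could stay unreachable long after others recovered.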
Arnór•4w ago
redeploying my app failed, but it just gives me "no build logs found for deployment"
_mati•4w ago
after several attempts, my server was successfully redeployed. it wasn't able to connect to an internal Redis
RenderCoder•4w ago
The redeployment was successful, I almost lost my job today. :HAHAHA:
Sang Dang•4w ago
I set up a demo for my team today and Railway failed just 10 minutes before the meeting. Nothing is more embarrassing for me than this.
Arnór•4w ago
ironically my dev environment is working fine
Duchess•4w ago
New reply sent from Help Station thread:
thanks, I redeployed and now it works for me
Keziah•4w ago
We're back online too
Arnór•4w ago
latest deployment seems to have stopped in the middle of it.. never got to the health check + not showing any deployment log, only build log
André•4w ago
It was the same for me ^^ At least the "least" important project didn't work. Suuuureeee 😂
Arnór•4w ago
probably time to move to aws 😭
redeploying, 3rd time's the charm
i am on beta V2 of the builder, and V2 runtime.. hopefully that is not biting me in the ass
not able to reach the database right now during the deployment
i'm seeing the postgres service, but when i click on data it can't establish a database connection (update: it popped in after 2-3 minutes)
not seeing my app's deployments again (nm, seeing them)
Kimitri•4w ago
same for me, the database is working, but the app itself doesn't work
rickitan•4w ago
If I moved to AWS I would probably cause worse downtimes than trusting the railway team lol. It's just not my expertise. But this 2h long one was definitely a bad one.
dwaynemac•4w ago
the same happened to me on the previous massive outage 😫
Duchess•4w ago
New reply sent from Help Station thread:
This incident has been resolved. Once again, we apologize for the downtime. We'll be publishing a post-mortem of this incident soon.
kevin•4w ago
Definitely looking forward to the post-mortem on how this can be prevented in the future. We have enterprise clients, and not sure how they can trust us when there’s a complete app outage today and last December. It is frustrating for us.
angelo•4w ago
Hey there Kevin, I don't want to reveal too much about your customer data in a semi-public forum, but I am pretty sure that I speak on behalf of the Railway team when I say how sorry we are that you had end user impact to your workloads. We have a number of mitigations planned for the Infra side, but I can speak personally to how we are changing things to make it easier for those affected to immediately get in touch with our Infra team when issues arise. I just sent your company Slack invites so we can continue the conversation there. This is also a standing offer for anyone else impacted this way as well, we are working with all affected companies to deliver the post mortem and work on next steps.
pikachu•4w ago
@angelo thanks! Slack connect would be useful, especially since that's our main workspace. I don't see the slack invites, can you resend? DMing you my email
JustJake•4w ago
Here's the retro/post mortem. It's up on the forums. Happy to discuss anything here or there: https://help.railway.app/questions/incident-response-june-11th-2024-733fbd5d
Railway Help Station
Incident Response - June 11th 2024
This thread serves to aggregate discussion for the incident on June 11th. The full response can be found at https://blog.railway.app/p/2024-06-11-incident-report. Railway takes these incidents very, very seriously. Internally we've been working on infrastructure improvements which will make the platform faster and prevent outages like this from happ...