Railway•4w ago
rickitan

My app is down, no logs for the past 15 minutes, and a restart didn't work

Hey Railway team, my production app is down. There are no logs on the server. It's definitely on Railway's side.
101 Replies
Percy•4w ago
Project ID: f5925531-2de0-4da9-8a6d-15b5ba712ebe
rickitan•4w ago
f5925531-2de0-4da9-8a6d-15b5ba712ebe
Joe Lanman•4w ago
I'm getting this too
rickitan•4w ago
P.S. I'm on the Pro plan
André•4w ago
Same. Also, after a redeploy, the app is suddenly not reachable anymore 😦
Joe Lanman•4w ago
not even an error page, just no server response at all
rickitan•4w ago
Yeah, there's definitely an incident happening. Hope the Railway team sees this soon and starts taking action.
André•4w ago
Yep I hope so too ^^
[screenshot]
André•4w ago
But it's very weird as it only affects my prod and not the testing environment
Brody•4w ago
team has been made aware
Joe Lanman•4w ago
If it helps the investigation, my uptime monitor fired at 9am GMT
JustJake•4w ago
Ack. Looking into it.
Joe Lanman•4w ago
good luck!
Duchess•4w ago
New reply sent from Help Station thread:
Only 1 out of 8 applications seems to be down for me
You're seeing this because this thread has been automatically linked to the Help Station thread.
New reply sent from Help Station thread:
Mine too! There is something going on!
New reply sent from Help Station thread:
Same issue with my Node server, as with my MySQL databases. All projects are affected. I can't connect to any Node app or MySQL database on any project hosted on Railway.
Brody•4w ago
rickitan•4w ago
thank you Brody
Duchess•4w ago
New reply sent from Help Station thread:
Can't redeploy either, but that's of less importance.
kevin•4w ago
Adding to the data, our app is down too. It seems like it's Postgres, given that the landing page is fine and only the main requests to the DB are returning 500s.
Duchess•4w ago
New reply sent from Help Station thread:
How long is the expected recovery time?
Brody•4w ago
the team have not posted an ETA yet
Duchess•4w ago
New reply sent from Help Station thread:
I think this is a catastrophic accident, and I hope there will be an official announcement afterwards. Our company's business has been severely affected.
New reply sent from Help Station thread:
I have to agree. We have thousands of users who can't access our services for an extended period...
Brody•4w ago
They will publish a post-mortem after the incident has been resolved; either I or someone else will link it here when it is posted.
MithushanJ•4w ago
hey @Brody, would upgrading to Pro right now help? Or will it be the same scenario?
JustJake•4w ago
The majority of instances should be restored at this point
Giannis G.•4w ago
Experiencing the same issues mentioned above. Some of the projects run fine, others stopped working.
Brody•4w ago
Pro users have also been affected
Joe Lanman•4w ago
I'm still down
Duchess•4w ago
New reply sent from Help Station thread:
In our case, the projects with static data are fine. All the websites with dynamic data from DB are down.
MithushanJ•4w ago
I guess the Databases are down.
Giannis G.•4w ago
[screenshot]
Brody•4w ago
this incident can affect all services, not just databases
Duchess•4w ago
New reply sent from Help Station thread:
Nothing restored yet in our case.
Brody•4w ago
you may have to redeploy affected services
Giannis G.•4w ago
I get this error when I do that:
[screenshot]
Joe Lanman•4w ago
I'm getting the error in the screenshot above too; does that mean no deployments?
King Jahad•4w ago
It means priority deployments first
Joe Lanman•4w ago
in the post-mortem it would be good to look into why this was first raised on the forum and not via automated monitoring
Brody•4w ago
Yes, Pro users would have builder priority at this time
rickitan•4w ago
I'm on the Pro plan. I tried redeploying, but I'm stuck at this:
[screenshot]
Brody•4w ago
please note that Jake said a majority, this issue has not been fully resolved yet
JustJake•4w ago
It was raised via automated monitoring. But yes, sure.
Joe Lanman•4w ago
? There was no incident when we raised it here; it started at about 9am
Duchess•4w ago
New reply sent from Help Station thread:
Deployments not working either.
RenderCoder•4w ago
Still unable to deploy services normally...
rickitan•4w ago
My app instance has been down all this time. But I was able to access my MySQL instance. Now I can't. I assume this is part of the restart.
Celengan Babi•4w ago
Same here. My production site has been down, and it's costing me and my customers too 😦
King Jahad•4w ago
It is what it is. I owe someone who is not technically knowledgeable an explanation.
Celengan Babi•4w ago
hope it gets back up and running again
waltcow•4w ago
Limited Access - Disabling for hobby while we restore systems :HAHAHA:
CodeLover•4w ago
I also experience the same issue. Hope things get back to normal very soon
King Jahad•4w ago
That's a diplomatic way of saying poor
Willem Sandoval•4w ago
Same here...
Duchess•4w ago
New reply sent from Help Station thread:
Node server gets some HTTP requests, but not all. Cannot connect to a database hosted outside on PlanetScale either.
New reply sent from Help Station thread:
Just resolved itself without a redeploy.
New reply sent from Help Station thread:
This is the second time our app has been down for hours in a few months. I understand these things may happen, but it would be good to take responsibility and provide compensation to those financially affected by the issue (which is exactly what I do when something goes wrong in the business). In our case, this situation might cost us about €100 in losses so far, not to mention the time wasted and the impact on our SEO, which might cost a lot more in the long run.
Joe Lanman•4w ago
I'm back online
Duchess•4w ago
New reply sent from Help Station thread:
Howwwwwwww mannn
New reply sent from Help Station thread:
MyProject still unable to deploy services normally…
CodeLover•4w ago
My website is still down, but now I get a different message. It has to do with the database: I can't access the database, and on the client I get an "application failed to respond" error.
André•4w ago
Systems partially work again. Live system running, Test system still no database.
rickitan•4w ago
@André did you have to redeploy or restart?
André•4w ago
@rickitan For Live to work again, I redeployed every ~5 minutes to check. Now it works again. For stage I haven't done anything yet, but the MySQL server for Laravel crashed. Redeploys/restarts aren't working for stage yet. Now all systems are back to normal (my systems, at least).
rickitan•4w ago
Awesome!!
Keziah•4w ago
They are not.
CodeLover•4w ago
Yes, I was asked to restart, but the deployment still crashes
Arthur Macêdo•4w ago
Same thing here
King Jahad•4w ago
I think my APIs are back to normal now
mattey•4w ago
I've done a redeployment and I'm back online. Thanks, team.
King Jahad•4w ago
2+ hours is not a good thing though, it's going to hurt.
André•4w ago
@King Jahad Yes, but I expect a report to be published, hopefully today or tomorrow. It's one of the most awful things that can happen. Let's give them some time to investigate; there will be time afterwards to discuss the operational and downtime costs on the client side. The most important thing is to get everything up and running again 😉
mattey•4w ago
:blobyes:
King Jahad•4w ago
+1. I am not mad, I just don't know who to blame for this, or how to make a client understand.
zgmg92•4w ago
In case anyone is still experiencing issues, make sure you try to restart/redeploy, whether you're on Hobby or Pro
Keziah•4w ago
I've tried dozens of times. Still down.
zgmg92•4w ago
This is the first time it worked for me, so it shouldn't be long now, if I had to guess. I couldn't connect to Postgres for the longest time, and it finally came online. I had issues with both a Hobby and a Pro service.
CodeLover•4w ago
I can't even access the database from the UI
zgmg92•4w ago
Try a force refresh (Shift-Command-R)
Keziah•4w ago
Same thing.
Arnór•4w ago
i'm not able to see the list of deployments, so i can't redeploy what i can't see
Duchess•4w ago
New reply sent from Help Station thread:
Personally, I tried redeploying multiple times, the last time 3 minutes ago, and it didn't help. Deployments are successful, but no logs are displayed and the server is offline.
rickitan•4w ago
I'm back online
Duchess•4w ago
New reply sent from Help Station thread:
In our case, deployments are successful, then crash after a few seconds/minutes
CodeLover•4w ago
I tried to redeploy again and it works now. Thank you very much for your suggestions.
Arnór•4w ago
i'm seeing the deployments.. so i'm redeploying now
Sang Dang•4w ago
My app still cannot redeploy because the connection to the Postgres DB still fails 😦
rickitan•4w ago
Give it time, the same was happening to me. I believe they are doing a massive restart, so some servers are restarted and come online before others. for (const server of servers) { await server.restart() } Something like that.
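The one-liner above can be expanded into a small runnable TypeScript sketch of the guess. It models only what this message speculates. It does not show how Railway actually restarts its hosts. The `RestartableServer` interface, `restartSequentially`, and `fakeServer` are hypothetical names made up for illustration.

```typescript
// Hypothetical model of a restartable server; not a Railway API.
interface RestartableServer {
  name: string;
  restart(): Promise<void>;
}

// Restart servers one at a time, in list order. Each `await` waits for the
// current restart to finish before the next begins, so servers earlier in
// the list come back online before later ones.
async function restartSequentially(servers: RestartableServer[]): Promise<string[]> {
  const onlineOrder: string[] = [];
  for (const server of servers) {
    await server.restart();
    onlineOrder.push(server.name); // record when this server is back
  }
  return onlineOrder;
}

// Hypothetical stand-in server whose "restart" just waits `ms` milliseconds.
function fakeServer(name: string, ms: number): RestartableServer {
  return {
    name,
    restart: () => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  };
}

// "db-2" restarts fastest, but it comes back last because it is last in the queue.
const fleet = [fakeServer("app-1", 30), fakeServer("app-2", 20), fakeServer("db-2", 5)];
restartSequentially(fleet).then((order) => console.log(order.join(" -> ")));
// prints "app-1 -> app-2 -> db-2"
```

A sequential loop trades total recovery time for lower load at any one moment. Restarting everything at once with `Promise.all(servers.map((s) => s.restart()))` would finish sooner, but every host would spike at the same time.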
Arnór•4w ago
redeploying my app failed, but it just gives me "no build logs found for deployment"
_mati•4w ago
after several attempts, my server was successfully redeployed. it wasn't able to connect to an internal Redis
RenderCoder•4w ago
The redeployment was successful, I almost lost my job today. :HAHAHA:
Sang Dang•4w ago
I set up a demo for my team today and Railway failed just 10 minutes before the meeting. Nothing could be more embarrassing for me than this.
Arnór•4w ago
ironically my dev environment is working fine
Duchess•4w ago
New reply sent from Help Station thread:
Thanks, I redeployed and now it works for me
Keziah•4w ago
We're back online too
Arnór•4w ago
latest deployment seems to have stopped in the middle of it... it never got to the health check, and it's not showing any deployment log, only the build log
André•4w ago
It was the same for me ^^ At least the "least" important project didn't work. Suuuureeee 😂
Arnór•4w ago
probably time to move to aws 😭
redeploying
3rd time's the charm
i am on beta V2 of the builder, and V2 runtime.. hopefully that is not biting me in the ass
not able to reach the database right now
during the deployment i'm seeing the postgres service, but when i click on data it can't establish a database connection (update: it popped in after 2-3 minutes)
not seeing my app's deployments again (nm, seeing them)
Kimitri•4w ago
Same for me: the database is working, but the app itself doesn't work
rickitan•4w ago
If I moved to AWS I would probably cause worse downtimes than I get by trusting the Railway team lol. It's just not my expertise. But this 2-hour one was definitely a bad one.
dwaynemac•4w ago
the same happened to me on the previous massive outage 😫
Duchess•4w ago
New reply sent from Help Station thread:
This incident has been resolved. Once again, we apologize for the downtime. We'll be publishing a post-mortem of this incident soon.
kevin•4w ago
Definitely looking forward to the post-mortem on how this can be prevented in the future. We have enterprise clients, and not sure how they can trust us when there’s a complete app outage today and last December. It is frustrating for us.
angelo•4w ago
Hey there Kevin, I don't want to reveal too much about your customer data in a semi-public forum, but I'm pretty sure I speak on behalf of the Railway team in saying how sorry we are that your workloads had end-user impact. We have a number of mitigations planned on the Infra side, and I can speak personally to how we will make it easier for those affected to get in touch with our Infra team immediately when issues arise. I just sent your company Slack invites so we can continue the conversation there. This is also a standing offer for anyone else impacted this way; we are working with all affected companies to deliver the post-mortem and work on next steps.
pikachu•4w ago
@angelo thanks! Slack Connect would be useful, especially since that's our main workspace. I don't see the Slack invites; can you resend? DMing you my email
JustJake•4w ago
Here's the retro/post-mortem. It's up on the forums. Happy to discuss anything here or there: https://help.railway.app/questions/incident-response-june-11th-2024-733fbd5d
Railway Help Station
Incident Response - June 11th 2024
This thread serves to aggregate discussion for the incident on June 11th. The full response can be found at https://blog.railway.app/p/2024-06-11-incident-report. Railway takes these incidents very, very seriously. Internally we've been working on infrastructure improvements which will make the platform faster and prevent outages like this from happ...