Railway•6mo ago
rickitan

My app is down, no logs for the past 15 minutes. Restart didn't work

Hey Railway team, My production app is down. No logs on the server. It's definitely on the Railway side,
104 Replies
Percy
Percy•6mo ago
Project ID: f5925531-2de0-4da9-8a6d-15b5ba712ebe
rickitan
rickitanOP•6mo ago
f5925531-2de0-4da9-8a6d-15b5ba712ebe
Joe Lanman
Joe Lanman•6mo ago
I'm getting this too
rickitan
rickitanOP•6mo ago
P.S. I'm on the Pro plan
André
André•6mo ago
Same. Also, after a redeploy, the app is suddenly not reachable anymore 😦
Joe Lanman
Joe Lanman•6mo ago
not even an error page, just no server response at all
rickitan
rickitanOP•6mo ago
Yeah, there's definitely an incident happening. Hope the Railway team sees this soon and starts taking action.
André
André•6mo ago
Yep I hope so too ^^
[image attachment, no description]
André
André•6mo ago
But it's very weird as it only affects my prod and not the testing environment
Brody
Brody•6mo ago
team has been made aware
Joe Lanman
Joe Lanman•6mo ago
If it helps the investigation, my uptime monitor fired at 9am gmt
JustJake
JustJake•6mo ago
Ack. Looking into it
Joe Lanman
Joe Lanman•6mo ago
good luck!
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Only 1 out of 8 applications seems to be down for me
You're seeing this because this thread has been automatically linked to the Help Station thread.
New reply sent from Help Station thread:
Mine too! There is something going on!
New reply sent from Help Station thread:
Same issue with my Node server, as well as with my MySQL databases. All projects are affected. Can't connect to any Node app or MySQL database on any project hosted on Railway.
Brody
Brody•6mo ago
rickitan
rickitanOP•6mo ago
thank you Brody
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Can't redeploy either, but that's of less importance.
kevin
kevin•6mo ago
Adding to the data, our app is down too, seems like it’s Postgres given that landing page is fine, just main requests from DB are 500ing
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
How long is the expected recovery time?
Brody
Brody•6mo ago
the team have not posted an ETA yet
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
I think this is a catastrophic accident, and I hope there will be an official announcement afterwards. Our company's business has been severely affected.
New reply sent from Help Station thread:
I have to agree. We have thousands of users that can't access our services for a longer period...
Brody
Brody•6mo ago
They will be publishing a post mortem after the incident has been resolved, either me or someone else will link it here when it is posted
MithushanJ
MithushanJ•6mo ago
hey @Brody, would upgrading to Pro right now help? Or will it be the same scenario?
JustJake
JustJake•6mo ago
Majority of instances should be restored at this point
GG
GG•6mo ago
Experiencing the same issues mentioned above. Some of the projects run fine, others stopped working
Brody
Brody•6mo ago
Pro users have also been affected
Joe Lanman
Joe Lanman•6mo ago
I'm still down
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
In our case, the projects with static data are fine. All the websites with dynamic data from DB are down.
MithushanJ
MithushanJ•6mo ago
I guess the Databases are down.
GG
GG•6mo ago
[image attachment, no description]
Brody
Brody•6mo ago
this incident can affect all services not just databases
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Nothing restored yet in our case.
Brody
Brody•6mo ago
you may have to redeploy affected services
GG
GG•6mo ago
I get this error when I do that
[screenshot attachment, no description]
Joe Lanman
Joe Lanman•6mo ago
I'm getting the screenshot above too, does that mean no deployments?
King Jahad
King Jahad•6mo ago
It means priority deployments first
Joe Lanman
Joe Lanman•6mo ago
in the post mortem it would be good to look into why this was first raised on the forum and not via automated monitoring
Brody
Brody•6mo ago
Yes, Pro users would have builder priority at this time
rickitan
rickitanOP•6mo ago
I'm on the Pro plan, tried redeploying but I'm stuck at this:
[screenshot attachment, no description]
Brody
Brody•6mo ago
please note that Jake said a majority, this issue has not been fully resolved yet
JustJake
JustJake•6mo ago
It was raised via automated monitoring. But yes, sure
Joe Lanman
Joe Lanman•6mo ago
? there was no incident when we raised it here, it started about 9am
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Deployments not working either.
RenderCoder
RenderCoder•6mo ago
Still unable to deploy services normally...
rickitan
rickitanOP•6mo ago
My app instance has been down all this time. But I was able to access my MySQL instance. Now I can't. I assume this is part of the restart.
Celengan Babi
Celengan Babi•6mo ago
same here. my production site has been down and it costs me and my customers too 😦
King Jahad
King Jahad•6mo ago
It is what it is. I owe someone who is not technically knowledgeable an explanation.
Celengan Babi
Celengan Babi•6mo ago
hope it gets back up and running again
waltcow
waltcow•6mo ago
Limited Access - Disabling for hobby while we restore systems :HAHAHA:
CodeLover
CodeLover•6mo ago
I also experience the same issue. Hope things get back to normal very soon
King Jahad
King Jahad•6mo ago
A diplomatic way of saying "poor"
Willem Sandoval
Willem Sandoval•6mo ago
Same here...
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Node-server gets some HTTP requests, but not all. Cannot connect to database hosted outside on planetscale either.
New reply sent from Help Station thread:
Node-server gets some HTTP requests, but not all. Cannot connect to database hosted outside on planetscale either. Just resolved itself without redeploy
New reply sent from Help Station thread:
This is the second time our app has been down for hours in a few months. I understand these things may happen, but it would be good to take responsibility and provide compensation to those financially affected by the issue (which is exactly what I do when something goes wrong in the business). In our case, this situation might cost us about €100 in losses at the moment, not to mention the time wasted and the impact on our SEO, which might cost a lot more in the long run.
Joe Lanman
Joe Lanman•6mo ago
I'm back online
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Howwwwwwww mannn
New reply sent from Help Station thread:
MyProject still unable to deploy services normally…
CodeLover
CodeLover•6mo ago
My website is still down. But now I get a different message. It has to do with the database: I can't access the database. I get an "application failed to respond" error on the client.
André
André•6mo ago
Systems partially work again. Live system running, Test system still no database.
rickitan
rickitanOP•6mo ago
@André did you have to redeploy or restart?
André
André•6mo ago
@rickitan For Live to work again I redeployed every ~5 minutes to check. Now it works again. For stage I did nothing yet, but the MySQL server for Laravel crashed. Redeploys / restarts are not working for stage yet.
Now all systems are back to normal (my systems)
rickitan
rickitanOP•6mo ago
Awesome!!
Keziah
Keziah•6mo ago
They are not.
CodeLover
CodeLover•6mo ago
Yes I was asked to restart but deployment still crashes
Arthur Macêdo
Arthur Macêdo•6mo ago
Same thing here
King Jahad
King Jahad•6mo ago
I think my APIs are back to normal now
mattey
mattey•6mo ago
I've done a re-deployment, and back online, thanks, team.
King Jahad
King Jahad•6mo ago
2+ hours is not a good thing though, it's going to hurt.
André
André•6mo ago
@King Jahad Yes, but I expect a report to be published, hopefully today / tomorrow. It's one of the most awful things that can happen. Let's give them some time to investigate; for any operational / downtime costs on the client side, there is time to discuss afterwards. The most important thing is to get everything up and running again 😉
mattey
mattey•6mo ago
:blobyes:
King Jahad
King Jahad•6mo ago
+1. I am not mad, I just don't know who to blame for this, or how to make a client understand
cybershizo
cybershizo•6mo ago
In case anyone is still experiencing issues, make sure you try to restart / redeploy, hobby or pro
Keziah
Keziah•6mo ago
I've tried dozens of times. Still down.
cybershizo
cybershizo•6mo ago
This is the first time it worked for me, shouldn't be long now if I had to guess. I couldn't connect to Postgres for the longest time and it finally came online. I had issues with both a hobby and a pro service
CodeLover
CodeLover•6mo ago
I can't even access the database from the ui
cybershizo
cybershizo•6mo ago
Try force refresh (shift-command-r)
Keziah
Keziah•6mo ago
Same thing.
Arnór
Arnór•6mo ago
i'm not able to see the list of deployments, so i can't redeploy what i can't see
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Personally, I tried re-deploying multiple times, the last time 3 minutes ago; it didn't help. Deployments are successful, but no logs are displayed, and the server is offline
rickitan
rickitanOP•6mo ago
I'm back online
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
In our case, deployments are successful, but crash after a few seconds/minutes
CodeLover
CodeLover•6mo ago
I tried to redeploy again and it works now. Thank you very much for your suggestions
Arnór
Arnór•6mo ago
i'm seeing the deployments.. so i'm redeploying now
Sang Dang
Sang Dang•6mo ago
My app still cannot redeploy because the connection to the Postgres DB still fails 😦
rickitan
rickitanOP•6mo ago
Give it time, the same was happening to me. I believe they are doing a massive restart, so some servers are restarted and come online before others. for (const server of servers) { await server.restart() } Something like that
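For apps that crash on startup because the database is not reachable yet, as several replies above describe, one option is to retry the connection with exponential backoff instead of exiting on the first failure. The following is a minimal TypeScript sketch, not anything Railway-specific: `backoffDelayMs` and `connectWithRetry` are hypothetical helper names, and `connect` stands for whatever connection function your own database client provides.

```typescript
// Hypothetical helpers for illustration; not part of Railway or any database driver.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Calls `connect` until it succeeds or `maxAttempts` attempts have failed.
 * `connect` is an assumption: pass in your own client's connection call.
 */
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 8,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Wait before the next try, but not after the final failure.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

With the defaults, the waits grow as 500 ms, 1 s, 2 s, and so on, so a dependency that comes back a minute or two later is picked up without a manual redeploy.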
Arnór
Arnór•6mo ago
redeploying my app failed, but it just gives me "no build logs found for deployment"
_mati
_mati•6mo ago
after several attempts, my server was successfully redeployed. it wasn't able to connect to an internal Redis
RenderCoder
RenderCoder•6mo ago
The redeployment was successful, I almost lost my job today. :HAHAHA:
Sang Dang
Sang Dang•6mo ago
I set up a demo for my team today and Railway failed just 10 minutes before the meeting. Nothing is more embarrassing for me than this.
Arnór
Arnór•6mo ago
ironically my dev environment is working fine
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
Thanks, I redeployed and now it works for me
Keziah
Keziah•6mo ago
We're back online too
Arnór
Arnór•6mo ago
latest deployment seems to have stopped in the middle of it.. never got to the health check + not showing any deployment log, only build log
André
André•6mo ago
It was the same for me ^^ At least the "least" important project didn't work. Suuuureeee 😂
Arnór
Arnór•6mo ago
probably time to move to aws 😭
redeploying, 3rd time's the charm
i am on beta V2 of the builder, and V2 runtime.. hopefully that is not biting me in the ass
not able to reach the database right now
during the deployment i'm seeing the postgres service, but when i click on data it can't establish a database connection (update: it popped in after 2-3 minutes)
not seeing my app's deployments again (nm, seeing them)
Kimitri
Kimitri•6mo ago
same for me, the database is working, but the app itself doesn't work
rickitan
rickitanOP•6mo ago
If I moved to AWS I would probably cause worse downtimes than trusting the railway team lol. It's just not my expertise. But this 2h long one was definitely a bad one.
dwaynemac
dwaynemac•6mo ago
the same happened to me on the previous massive outage 😫
Duchess
Duchess•6mo ago
New reply sent from Help Station thread:
This incident has been resolved. Once again, we apologize for the downtime. We'll be publishing a post-mortem of this incident soon.
kevin
kevin•6mo ago
Definitely looking forward to the post-mortem on how this can be prevented in the future. We have enterprise clients, and not sure how they can trust us when there’s a complete app outage today and last December. It is frustrating for us.
angelo
angelo•6mo ago
Hey there Kevin, I don't want to reveal too much about your customer data in a semi-public forum, but I am pretty sure that I speak on behalf of the Railway team when I say how sorry we are that you had end-user impact to your workloads. We have a number of mitigations planned for the Infra side, and I can speak personally to how we are changing things to make it easier for those affected to immediately get in touch with our Infra team when issues arise. I just sent your company Slack invites so we can continue the conversation there. This is also a standing offer for anyone else impacted this way; we are working with all affected companies to deliver the post mortem and work on next steps.
pikachu
pikachu•6mo ago
@angelo thanks! Slack connect would be useful, especially since that's our main workspace. I don't see the slack invites, can you resend? DMing you my email
JustJake
JustJake•6mo ago
Here's the retro/post mortem. It's up on the forums. Happy to discuss anything here or there https://help.railway.app/questions/incident-response-june-11th-2024-733fbd5d
Railway Help Station
Incident Response - June 11th 2024
This thread serves to aggregate discussion for the incident on June 11th.
The full response can be found at https://blog.railway.app/p/2024-06-11-incident-report
Railway takes these incidents very, very seriously. Internally we've been working on infrastructure improvements which will make the platform faster and prevent outages like this from happ...
Sneep3r4476
Sneep3r4476•3mo ago
I am seeing the same issue again, maybe related to the outage on 8/27? It keeps occurring for us after the outage; we last experienced it on 8/29
Matheus Faustino
Matheus Faustino•2mo ago
I got the same problem on October 12, 2024. Are we experiencing the same problem?
[image attachment, no description]
Brody
Brody•2mo ago
I don't see how that is related, looks like you are opening too many connections
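If the error does come from opening too many connections, one common remedy is to cap how many database operations run at once, usually through the connection pool built into the database client. The thread does not say which stack is involved, so the following is only a generic TypeScript sketch of the idea: `Limiter` is a hypothetical class that stands in for a real pool, and a production app should prefer its driver's own pooling options.

```typescript
// Hypothetical concurrency limiter for illustration; a stand-in for a driver's pool.
class Limiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxConcurrent: number;

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  /** Number of tasks currently holding a slot. */
  get activeCount(): number {
    return this.active;
  }

  /** Number of tasks waiting for a slot. */
  get queuedCount(): number {
    return this.waiting.length;
  }

  /** Runs `task` once a slot is free, never exceeding `maxConcurrent` at a time. */
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // Slot is transferred directly, so the active count is unchanged.
      } else {
        this.active--;
      }
    }
  }
}
```

Wrapping each query as `limiter.run(() => runQuery(...))`, where `runQuery` is your own function, keeps the number of simultaneous connections at or below the chosen limit while extra work waits in the queue.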