Outage?
Are you experiencing any issues? We're in Singapore. Just checking the public channel since private support isn't responding. Builds aren't working and 5 different environments are down.
3c08e827-8d73-4a37-bbe9-9af9757bd354
Solution:
New reply sent from Help Station thread:
Fix implemented. Resolved. You're seeing this because this thread has been automatically linked to the Help Station thread.
25 Replies
Project ID: 3c08e827-8d73-4a37-bbe9-9af9757bd354
We have a service down as well in Singapore.
Sad state of affairs on our production infrastructure
Same here - nothing is responding at the moment
Ping
please check #🚨|incidents for updates
Thanks Brody! I will now that there's one there
no available stackers found within resource limits
on an attempted redeploy
Hi Nick, please stand by, we are investigating. An incident has been called.
Thanks david, adding some context where I have it in case it helps debugging
We came back online ~30 mins ago. Now we're back offline as of ~4 mins ago
Our apps and services are still down as well. Tried migrating to the US region, no luck.
Pls help, I can't connect to postgres db any more
Still down for us too.
Do you have a backup? I'm thinking of migrating the database to another provider
Don't be too hasty – this should be resolved soon (given how long it took last time) though I'm not aware of your requirements. At a certain point that'd have to be an option but for us we won't as yet.
Starting to see our services up now...
Thanks partbot, trying to redeploy but no luck as yet. I'll also check in when we're up
Update: Partial recovery, 50% of capacity restored. Actively working on the rest. Thanks for your patience, on-call team working as swiftly as possible to restore service.
thanks david
ETA on full capacity restoration?
Thanks David and team
Time for another update? Just a reminder that people have production infrastructure that is affected.
I just deployed services in the Singapore region and encountered a similar issue. Unable to deploy service successfully
Still down, I'm trying regularly to redeploy to no avail
The level of communication from Railway on this incident is totally unacceptable. I hope processes can be improved as a result of the post-mortem. Even just a “we are continuing to work on it” would give some confidence an on-call team is actually working on this…
My production systems have been down 4 hours in this downtime, and in total 6 hours 15 mins today. So far
Update: The core issue has been identified and a resolution is in progress to restore service. The on-call team is working to roll it out.
I'm online now. Redeploying worked
4 out of my 5 services redeployed properly. One more still hasn't recovered. Might take a while more for the fix to be rolled out
Almost 11pm here, going to be a nervous night's sleep given the day of issues.
Thanks for getting it resolved team. Echoing jtechbit – not enough comms given the severity
Thanks for the feedback, acknowledged. That's on me personally for not communicating more. We've had the full on-call team on this (with several additional engineers joining) for as many hours as service has been down.
Full service restoration in sight.
Solution
Fix implemented. Resolved.
Thank you for the update David! My services are now responding normally.
We've published a full incident retro here: https://blog.railway.app/p/2024-05-04-incident-report
Railway Blog
Incident Report: May 4th, 2024
We recently experienced an outage on our platform that partially affected our Asia-Southeast compute infrastructure and caused workloads to be unreachable. When production outages occur, it is Railway’s policy to share the public details of what occurred.