Railway•7mo ago

Outage?

Are you experiencing any issues? We're in Singapore. Just checking the public channel since private support isn't responding. Builds aren't working and 5 different environments are down. Project ID: 3c08e827-8d73-4a37-bbe9-9af9757bd354

Solution:

New reply sent from Help Station thread:

Fix implemented. Resolved.

You're seeing this because this thread has been automatically linked to the Help Station thread....

25 Replies

Percy•7mo ago

Project ID: 3c08e827-8d73-4a37-bbe9-9af9757bd354

raleng•7mo ago

We have a service down as well in Singapore.

nickmacavoyOP•7mo ago

Maybe this? https://status.cloud.google.com/incidents/8xe5wtseE3Wc5PoMb7Re#gtQEJkfLUBhaJqDUrYio

nickmacavoyOP•7mo ago

Sad state of affairs on our production infrastructure

Duchess•7mo ago

New reply sent from Help Station thread:

Same here - nothing is responding at the moment

You're seeing this because this thread has been automatically linked to the Help Station thread.

nickmacavoyOP•7mo ago

Ping

Brody•7mo ago

please check #🚨｜incidents for updates

nickmacavoyOP•7mo ago

Thanks Brody! I will, now that there's one there.

Getting "no available stackers found within resource limits" on an attempted redeploy.

nickmacavoyOP•7mo ago

https://railway.app/project/3c08e827-8d73-4a37-bbe9-9af9757bd354/service/959337ac-60f8-4004-9151-1b122a5e1460?id=bf4e176d-ec04-4def-92b5-7d3fe090d760

Duchess•7mo ago

New reply sent from Help Station thread:

Hi Nick, please stand by. We are investigating; an incident has been called.

New reply sent from Help Station thread:

Thanks david, adding some context where I have it in case it helps debugging

New reply sent from Help Station thread:

We came back online ~30 mins ago. Now we're back offline as of ~4 mins ago

New reply sent from Help Station thread:

Our apps and services are still down as well, tried migrating to US region, no luck.

New reply sent from Help Station thread:

Please help, I can't connect to the Postgres DB anymore.

New reply sent from Help Station thread:

Still down for us too.

New reply sent from Help Station thread:

Do you have a backup? I'm thinking of migrating the database to another provider.

New reply sent from Help Station thread:

Don't be too hasty – this should be resolved soon (given how long it took last time), though I'm not aware of your requirements. At a certain point that would have to be an option, but we won't do that just yet.

New reply sent from Help Station thread:

Starting to see our services up now...

New reply sent from Help Station thread:

Thanks partbot, trying to redeploy but no luck as yet. I'll also check in when we're up

New reply sent from Help Station thread:

Update: Partial recovery, 50% of capacity restored. Actively working on the rest. Thanks for your patience, on-call team working as swiftly as possible to restore service.

New reply sent from Help Station thread:

thanks david

jtechbit•7mo ago

ETA on full capacity restoration?

Duchess•7mo ago

New reply sent from Help Station thread:

Thanks David and team

jtechbit•7mo ago

Time for another update? Just a reminder that people have production infrastructure that is affected.

RenderCoder•7mo ago

I just deployed services in the Singapore region and encountered a similar issue. Unable to deploy service successfully

Duchess•7mo ago

New reply sent from Help Station thread:

Still down. I'm regularly trying to redeploy, to no avail.

jtechbit•7mo ago

The level of communication from Railway on this incident is totally unacceptable. I hope processes can be improved as a result of the post-mortem. Even just a “we are continuing to work on it” would give some confidence an on-call team is actually working on this…

nickmacavoyOP•7mo ago

My production systems have been down for 4 hours in this outage, and for 6 hours 15 minutes in total today. So far.

Duchess•7mo ago

New reply sent from Help Station thread:

Update: The core issue has been identified and a resolution is in progress to restore service. The on-call team is working to roll it out.

nickmacavoyOP•7mo ago

I'm online now. Redeploying worked

Duchess•7mo ago

New reply sent from Help Station thread:

4 out of my 5 services redeployed properly. One still hasn't recovered. It might take a while longer for the fix to be rolled out.

nickmacavoyOP•7mo ago

Almost 11pm here, going to be a nervous night's sleep given the day of issues. Thanks for getting it resolved, team. Echoing jtechbit – not enough comms given the severity.

Duchess•7mo ago

New reply sent from Help Station thread:

Thanks for the feedback, acknowledged. That's on me personally for not communicating more. We've had the full on-call team on this (with several additional engineers joining) for as many hours as service has been down.

New reply sent from Help Station thread:

Full service restoration in sight.

Solution

Duchess•7mo ago

New reply sent from Help Station thread:

Fix implemented. Resolved.

jtechbit•7mo ago

Thank you for the update David! My services are now responding normally.

Duchess•7mo ago

New reply sent from Help Station thread:

We've published a full incident retro here: https://blog.railway.app/p/2024-05-04-incident-report

Railway Blog

Incident Report: May 4th, 2024

We recently experienced an outage on our platform that partially affected our Asia-Southeast compute infrastructure and caused workloads to be unreachable. When production outages occur, it is Railway’s policy to share the public details of what occurred.
