Restart doesn't actually restart
Seems like a service failed after it couldn't connect to a DB... i tried to restart but it never restarted. This has been an ongoing issue for a few weeks
Project ID:
97046871-517d-4af1-adfa-6b493cccebc3
i usually just get around this issue by redeploying, but my project takes 10-15 min to build so it's sometimes an annoyance
I'm seeing a deployment above your crashed deployment. Looks to me like your restart was successful
the new deployment was successful yes, but i dont believe that failed container was ever restarted. i can try again sometime later and show if necessary
but from the screenshots you can see it says "restart successful" but on the ui it still shows a red box. no new blue box saying its restarting ever popped up -- had to manually deploy since restart didnt work
running into the same issue again
Hm very odd. Is your app active? Another user reported a similar issue where their app was in the crashed status visually but was still logging
yes -- i guess this now comes to semantics on what does restart/redeploy mean... i feel like i should be able to restart a running container and not have to redeploy(build and push that image) just to restart that service
hey guys this is a pretty serious issue, our build times are unfortunately very long (20 min tops) and it takes up to 20 minutes just to get back "online"
Why are you restarting your service that often? On code updates you should have a deployment running with previous code that’s shut down when your new code’s healthcheck is complete
this seems like user error
I have 100k+ users a day, so the load crashes our database almost every 12-18 hours. that in turn crashes this particular instance, so it shows up as "crashed"
it could be user error but i would like to just simply restart the container. meaning: delete it, run the same exact image w/ same config, and have it back up
this definitely sounds like user error. There’s got to be better ways to get around that. Also, with 100k+ users you should be on the teams plan
this is not a hobby project, which is what the dev plan is meant for
not to mention I have other services on railway that simply hang and show up as "application not responding". would be nice to have healthchecks running hourly if thats possible?
alright sounds good. i use "we" too often, sorry its just me self funding.
Unfortunately that all sounds like user/code error. Afaik there’s no way to set up scheduled healthchecks, but if you join the teams plan you can discuss that with the team
Hey @sdan - this is a bug on our end.
With that said- is your app crashing or the DB crashing?
db running on google cloud, i found railway cant handle some stuff so moved most of my infra elsewhere
Like vector or?
Just a scale issue
yea
yea to what
😛
yea vector db and yea scale issue 🙂
L
also have google cloud credits
ok- so on your app, how many connections to the DB are you keeping open?
8 at a time probably
What happens when you bump that up?
no clue honestly i just restart stuff whenever it goes down
;-;
there are more issues because the vector db i am using is in beta and runs into race issues all the time
so, you may wanna increase the number of connections
actually wait
can you decrease it?
it will slow your app but might help with race
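For anyone reading later, a rough sketch of what "decrease the number of connections" could look like on the app side. This is only an illustration: `ConnectionLimiter` is a hypothetical helper, not part of Flask, Railway, or the vector DB client, and it assumes the app talks to the DB from multiple threads.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Hypothetical helper: caps how many DB calls run at the same time."""

    def __init__(self, max_connections):
        # Each in-flight DB call holds one slot; extra callers wait.
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0  # highest concurrency observed, handy for tuning

    @contextmanager
    def slot(self):
        self._slots.acquire()
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        try:
            yield
        finally:
            with self._lock:
                self.in_use -= 1
            self._slots.release()
```

Wrapping each query in `with limiter.slot():` trades throughput for fewer simultaneous requests hitting the DB, which is the slowdown mentioned above.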
also do you have a link to that vector DB?
yeah i have tried multiple things but ultimately i dont run most of my heavy workloads on railway. i just purely do reading on railway
the AI-native open-source embedding database
I know a guy there, we can chat
and i have probably already chatted with that guy haha. theyre rolling out a refactor next week so hoping that will solve it
curious, why are you still on Railway then (aside from you being an ex-employee)
what are we doing so right even when we seem to get things wrong
no easy way to run flask servers honestly
i do vercel for 99% of stuff but now need to interact with python and vercel is pretty bad at it
you mean that Google Cloud Run's 99 steps isn't easy 😉
anyway, gotcha- can you dump crash logs when the DB connections reset?
I would have a service that uses the Railway API, monitors for when the DB crashes, and just performs a restart ngl
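A minimal sketch of that kind of watchdog, under stated assumptions: `check_db` and `restart_service` are hypothetical callables you would supply yourself (for example, a cheap query against the DB and a call to whatever restart endpoint the platform's API exposes). No real Railway API call is shown here.

```python
import time


def watch_and_restart(check_db, restart_service,
                      failures_before_restart=3,
                      interval_seconds=30.0,
                      max_checks=None,
                      sleep=time.sleep):
    """Poll check_db; call restart_service after N consecutive failures.

    check_db and restart_service are placeholders supplied by the caller.
    Returns the number of restarts triggered (useful when max_checks is set).
    """
    consecutive_failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        try:
            healthy = bool(check_db())
        except Exception:
            # A check that raises counts as a failed check.
            healthy = False
        if healthy:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failures_before_restart:
                restart_service()
                restarts += 1
                consecutive_failures = 0
        sleep(interval_seconds)
    return restarts
```

Requiring several consecutive failures before restarting avoids bouncing the service on a single slow response.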
in the long term, I am going to flag the UI bug to the team
google cloud is a mess for sure but its containable mess :). just docker up, docker down, docker remove, docker ps -a. and tailscale for networking and cloudflare for proxying.
i have reliable logs, stuff never hangs, and if it does i know exactly whats up. i can check htop, etc.
railway hangs and logs stop and stuff gets silently shut off. more often than not i wake up to a text from someone saying my stuff is down and railway still shows a green box which is frustrating.
railway api monitoring a db that is not running on railway is def. not railway's fault. it's just about reliable logging and making sure that if something crashes, it fully crashes. i think i turned off notifs for crashes which i will turn back on
also as prev. mentioned, having continuous health checks would be nice
some logs
again this is entirely my error -- the db crashing should be handled on my end.
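One way handling it on the app side might look, sketched with assumptions: `connect` is a hypothetical zero-argument function that opens the DB connection and raises `ConnectionError` on failure. The real client may raise a different exception type, so adjust the `except` clause accordingly.

```python
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Retry a failing connect() with exponential backoff.

    connect is a caller-supplied placeholder; the last failure is re-raised
    so the process still crashes visibly if the DB never comes back.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay)
            # Double the wait each time, up to max_delay.
            delay = min(delay * 2, max_delay)
```

This keeps a short DB outage from taking the instance down, while a long outage still ends in a real crash instead of a hung process.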