API Crashes 20+ Times Daily
Trying to figure out what is going on. Our uptime monitor sends out an email when the page goes down for longer than 5 min. For the past few weeks, the API has been showing as down 20-40 times a day for about a 3-day stretch. Then it's fine for a few days, and then it's back to another 3-day stretch of chaos.
What?
How?
Why?
26 Replies
please provide more information, for starters, are there any error logs?
There are plenty of info logs, but they don't have any details on them that indicate an issue. There are no error, debug or warn logs for the past 2 months.
I would recommend adding some very verbose debug logging so you can determine at what point your code crashes
It's built on Nest.js and the error/warning logs are typically pretty solid. I can look at adding something else, but that may take a bit.
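For anyone following along, here is a minimal sketch of the kind of extra logging suggested above, assuming a plain Node.js process underneath Nest. The helper names `formatCrashRecord` and `installCrashLogging` are made up for illustration; only the `process` events are real Node.js APIs. It logs one line whenever the process dies from an uncaught error, an unhandled promise rejection, or a stop signal from the platform.

```typescript
// Hypothetical helper names; only the `process` events are real Node.js APIs.
type LogFn = (line: string) => void;

// Build one JSON log line describing why the process is about to stop.
function formatCrashRecord(kind: string, detail: unknown): string {
  const err = detail instanceof Error ? detail : new Error(String(detail));
  return JSON.stringify({
    level: "fatal",
    kind,
    message: err.message,
    stack: err.stack,
    time: new Date().toISOString(),
  });
}

// Register last-resort handlers so a dying process always leaves a trace.
function installCrashLogging(log: LogFn = console.error): void {
  process.on("uncaughtException", (err) => {
    log(formatCrashRecord("uncaughtException", err));
    process.exit(1);
  });
  process.on("unhandledRejection", (reason) => {
    log(formatCrashRecord("unhandledRejection", reason));
    process.exit(1);
  });
  process.on("SIGTERM", () => {
    log(formatCrashRecord("SIGTERM", "platform asked the process to stop"));
    process.exit(0);
  });
}
```

One way to use it would be to call `installCrashLogging()` at the top of `main.ts`, before `NestFactory.create(...)`, so the handlers exist even if bootstrapping itself fails.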
railway isn't going to have the observability into your app if your app doesn't have the observability you need to determine the issue
In English, you're saying that the logs in Railway are only as good as the ones built into the app.
that's correct
if you don't know why your app is crashing, railway isn't going to know either
besides things like OOM but that's easy enough to determine from your side
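A rough sketch of how the OOM question could be checked from inside the app, assuming Node.js: log memory usage on a timer and see whether it climbs before each outage. The function names and the 60-second interval are assumptions for illustration; `process.memoryUsage()` is the real Node.js call.

```typescript
// Hypothetical helper names; process.memoryUsage() is the real Node.js API.
interface MemorySnapshot {
  rssMb: number; // total memory held by the process
  heapUsedMb: number; // memory used by live JavaScript objects
}

// Convert bytes to megabytes, rounded to one decimal place.
function toMb(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

function memorySnapshot(): MemorySnapshot {
  const usage = process.memoryUsage();
  return { rssMb: toMb(usage.rss), heapUsedMb: toMb(usage.heapUsed) };
}

// Log memory on a timer; unref() keeps the timer from holding the process open.
function startMemoryLogging(
  intervalMs = 60_000,
  log: (line: string) => void = console.log,
) {
  const timer = setInterval(() => {
    log(JSON.stringify({ level: "info", kind: "memory", ...memorySnapshot() }));
  }, intervalMs);
  timer.unref();
  return timer;
}
```

If `rssMb` keeps rising toward the service's memory limit right before the monitor reports downtime, that points at OOM rather than a code-level crash.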
That's the problem I believe. We're using Nest instead of Express in part because of the built-in logs. If the app actually crashes, Nest will let you know. But I'm not seeing any logs that say that the app actually crashed.
If the app never actually crashed, then the app has a problem with 1 page going haywire, or the response time is too long.
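To test the slow-response theory, something like the following could log how long each request takes. This is only a sketch: `requestTimer` and the 2000 ms threshold are assumptions, and the `Req`/`Res` types are minimal stand-ins so the example does not depend on Express or Nest typings.

```typescript
// Minimal structural types so the sketch does not depend on Express or Nest typings.
interface Req {
  method: string;
  url: string;
}
interface Res {
  statusCode: number;
  on(event: "finish", cb: () => void): unknown;
}

const SLOW_MS = 2000; // assumed threshold; tune it to the uptime monitor's timeout

// Returns an Express-style middleware that logs one line per completed request.
function requestTimer(
  log: (line: string) => void = console.log,
  now: () => number = Date.now,
) {
  return (req: Req, res: Res, next: () => void): void => {
    const started = now();
    res.on("finish", () => {
      const ms = now() - started;
      log(
        JSON.stringify({
          level: ms >= SLOW_MS ? "warn" : "info",
          kind: "request",
          method: req.method,
          url: req.url,
          status: res.statusCode,
          ms,
        }),
      );
    });
    next();
  };
}
```

In a Nest app on the default Express adapter this could be registered with `app.use(requestTimer())` in `main.ts`. Note that a request which hangs forever never emits `finish`, so this only catches slow responses that eventually complete.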
The uptime monitor is showing that the API went down 7 times on 2024-06-21 for an estimated total 7 minutes.
It doesn't actually tell me why, but this is a "keyword found" type monitor.
This monitor is an HTTP ping. It shows nothing happening on that date.
I'm sure there's a hundred or more ways your app could crash or soft lock without nest knowing.
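One of those ways is a blocked event loop: the process is alive and Nest logs nothing, but no request gets served. A minimal sketch of a detector, assuming Node.js, is below. The names `lagMs` and `startLagMonitor` and the thresholds are illustrative assumptions. It schedules a timer and warns when the timer fires much later than planned.

```typescript
// How late did a timer fire compared with when it was scheduled to fire?
function lagMs(expectedAt: number, firedAt: number): number {
  return Math.max(0, firedAt - expectedAt);
}

// Warn whenever the event loop was blocked for longer than warnAboveMs.
function startLagMonitor(
  intervalMs = 1000,
  warnAboveMs = 200,
  log: (line: string) => void = console.warn,
) {
  let expectedAt = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const lag = lagMs(expectedAt, Date.now());
    if (lag > warnAboveMs) {
      log(JSON.stringify({ level: "warn", kind: "event-loop-lag", lagMs: lag }));
    }
    // Schedule the next expectation relative to now, not to the missed deadline.
    expectedAt = Date.now() + intervalMs;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return timer;
}
```

Node's built-in `perf_hooks.monitorEventLoopDelay()` gives more precise numbers; the hand-rolled version is just easier to read in a log.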
are you on the v2 runtime? and on the new edge proxy?
I suspect that is probably true, though how that can happen is lost on me. I'm guessing no on v2 and edge, as I don't know what those are.
check your service settings
My Railway settings say Legacy runtime.
and the edge proxy?
Not enabled.
does your service have a volume?
I don't think so. I can't find anything that says Volume in the settings.
it's not in the settings, look at the project canvas
I'm guessing this is the canvas. If so, then it's just a Github repo and PostgreSQL DB.
there's no volume on the API service
you can see the postgres service has a volume
enable the v2 runtime and edge proxy on your API service
I see what you're talking about. The bottom box.
Deploying the updates now.
That's done.
okay continue monitoring the service and report back
Copy.
i've also been having an issue with my nestjs API restarting sporadically throughout the day. Haven't had time to look into it though. i believe i used one of the existing railway templates. Did you also use the template, @jared.leddy?
No, we didn't do anything fancy. Just connect the repo and quick deploy it with ENVs and a DB.