API Crashes 20+ Times Daily

Trying to figure out what is going on. Our uptime monitor sends an email when the page goes down for longer than 5 minutes. For the past few weeks, it has shown the API going down 20-40 times a day for a stretch of about 3 days. Then it's fine for a few days, and then it's back to another 3-day stretch of chaos. What? How? Why?
26 Replies
Brody (2w ago)
please provide more information, for starters, are there any error logs?
jared.leddy (2w ago)
There are plenty of info logs, but they don't have any details on them that indicate an issue. There are no error, debug or warn logs for the past 2 months.
Brody (2w ago)
I would recommend adding some very verbose debug logging so you can determine at what point your code crashes
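A minimal sketch of what that kind of logging could look like, assuming a plain Node.js process underneath Nest. The helper names `formatLog` and `installCrashLogging` are hypothetical and not part of Nest or Railway; only the `process` events themselves are standard Node. It logs the process-level failures that a framework logger may never see.

```typescript
// Hypothetical helper: builds one JSON log line so entries are easy to search.
function formatLog(
  level: string,
  event: string,
  detail: Record<string, unknown> = {},
): string {
  return JSON.stringify({ ts: new Date().toISOString(), level, event, ...detail });
}

// Hypothetical helper: registers process-level handlers.
// Assumes the app has no other graceful-shutdown logic that must run first.
function installCrashLogging(log: (line: string) => void = console.error): void {
  // A thrown error that nothing caught. Registering this handler replaces
  // Node's default "print and exit", so exit explicitly after logging.
  process.on("uncaughtException", (err: Error) => {
    log(formatLog("error", "uncaughtException", { message: err.message, stack: err.stack }));
    process.exit(1);
  });

  // A rejected promise with no handler attached. With this listener in place
  // the process keeps running, so the rejection is only logged.
  process.on("unhandledRejection", (reason: unknown) => {
    log(formatLog("error", "unhandledRejection", { reason: String(reason) }));
  });

  // The platform asked the process to stop, for example during a redeploy.
  process.on("SIGTERM", () => {
    log(formatLog("warn", "SIGTERM"));
    process.exit(0);
  });

  // Runs when the process exits normally or through process.exit.
  process.on("exit", (code: number) => {
    log(formatLog("info", "exit", { code }));
  });
}
```

Calling `installCrashLogging()` once, before the app starts listening, would be enough. If none of these lines ever show up around an outage, that points away from a hard crash and toward a hang or a slow response.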
jared.leddy (2w ago)
It's built on Nest.js and the error/warning logs are typically pretty solid. I can look at adding something else, but that may take a bit.
Brody (2w ago)
railway isn't going to have the observability into your app if your app doesn't have the observability you need to determine the issue
jared.leddy (2w ago)
In English, you're saying that the logs in Railway are only as good as the ones built into the app.
Brody (2w ago)
that's correct. if you don't know why your app is crashing, railway isn't going to know either, besides things like OOM (out of memory), but that's easy enough to determine from your side
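One way to check for OOM from the app side, as a rough sketch: log the process memory on a timer and see whether it climbs before each outage. The function names and the 60-second interval are assumptions for illustration; `process.memoryUsage()` is standard Node.

```typescript
// Converts bytes to MiB, rounded to one decimal place.
function toMiB(bytes: number): number {
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// Hypothetical helper: takes a small snapshot of the current memory use.
function memorySnapshot(): { rssMiB: number; heapUsedMiB: number; heapTotalMiB: number } {
  const usage = process.memoryUsage();
  return {
    rssMiB: toMiB(usage.rss), // total memory held by the process
    heapUsedMiB: toMiB(usage.heapUsed), // JS objects currently alive
    heapTotalMiB: toMiB(usage.heapTotal), // heap reserved by V8
  };
}

// Hypothetical helper: logs a snapshot every intervalMs (60 s is an arbitrary choice).
function startMemoryLogging(intervalMs = 60_000): ReturnType<typeof setInterval> {
  const timer = setInterval(() => {
    console.log(JSON.stringify({ event: "memory", ...memorySnapshot() }));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for this timer
  return timer;
}
```

A steadily rising `rssMiB` that resets after each outage would suggest the container is being killed for memory. A flat line would suggest looking elsewhere.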
jared.leddy (2w ago)
That's the problem I believe. We're using Nest instead of Express in part because of the built-in logs. If the app actually crashes, Nest will let you know. But I'm not seeing any logs that say that the app actually crashed. If the app never actually crashed, then the app has a problem with 1 page going haywire, or the response time is too long. The uptime monitor is showing that the API went down 7 times on 2024-06-21 for an estimated total of 7 minutes.
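To test the "response time is too long" theory, a small request timer could help. This is only a sketch: `slowRequestLogger` is a hypothetical Express-style middleware, the request and response shapes are simplified stand-ins for the real objects, and the 1000 ms threshold is an arbitrary assumption.

```typescript
// Minimal shapes for the parts of the request/response this sketch touches.
type Req = { method?: string; url?: string };
type Res = { statusCode: number; on(event: string, listener: () => void): unknown };

// Hypothetical middleware factory: logs any request slower than thresholdMs.
function slowRequestLogger(
  thresholdMs = 1000,
  log: (line: string) => void = console.warn,
) {
  return (req: Req, res: Res, next: () => void): void => {
    const start = Date.now();
    // "finish" fires once the response has been handed off to the OS.
    res.on("finish", () => {
      const durationMs = Date.now() - start;
      if (durationMs >= thresholdMs) {
        log(JSON.stringify({
          event: "slowRequest",
          method: req.method,
          url: req.url,
          status: res.statusCode,
          durationMs,
        }));
      }
    });
    next();
  };
}
```

In a Nest app on the default Express adapter this would presumably be wired up with `app.use(slowRequestLogger())` in the bootstrap file. Slow entries that line up with the monitor's timestamps would support the slow-response explanation over a crash.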
jared.leddy (2w ago)
It doesn't actually tell me why, but this is a "keyword found" type monitor.
[Attached image, no description]
jared.leddy (2w ago)
This monitor is an HTTP ping. It shows nothing happening on that date.
[Attached image, no description]
Brody (2w ago)
I'm sure there's a hundred or more ways your app could crash or soft lock without nest knowing. are you on the v2 runtime? and on the new edge proxy?
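One soft-lock case that can be measured directly is a blocked event loop, where the process is alive but too busy to answer requests. A hedged sketch using Node's `perf_hooks` module follows; the function names, the 200 ms threshold, and the 10-second reporting interval are assumptions, not values from this thread.

```typescript
import { monitorEventLoopDelay } from "perf_hooks";

// The histogram reports nanoseconds; convert to milliseconds with two decimals.
function nsToMs(ns: number): number {
  return Math.round(ns / 1e4) / 100;
}

// Hypothetical helper: warns when the event loop was blocked longer than
// thresholdMs during the last reporting window. Returns a stop function.
function startEventLoopLagLogging(thresholdMs = 200, intervalMs = 10_000): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    const maxMs = nsToMs(histogram.max);
    if (maxMs > thresholdMs) {
      console.warn(JSON.stringify({
        event: "eventLoopLag",
        maxMs,
        p99Ms: nsToMs(histogram.percentile(99)),
      }));
    }
    histogram.reset(); // start a fresh window for the next report
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for this timer

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Large `maxMs` values around the outage times would indicate that synchronous work is stalling the process even though it never crashes.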
jared.leddy (2w ago)
I suspect that is probably true, though how that can happen is lost on me. I'm guessing no on v2 and edge, as I don't know what those are.
Brody (2w ago)
check your service settings
jared.leddy (2w ago)
My Railway settings say Legacy runtime.
Brody (2w ago)
and the edge proxy?
jared.leddy (2w ago)
Not enabled.
Brody (2w ago)
does your service have a volume?
jared.leddy (2w ago)
I don't think so. I can't find anything that says Volume in the settings.
Brody (2w ago)
it's not in the settings, look at the project canvas
jared.leddy (2w ago)
I'm guessing this is the canvas. If so, then it's just a Github repo and PostgreSQL DB.
[Attached image, no description]
Brody (2w ago)
there's no volume on the API service. you can see the postgres service has a volume. enable the v2 runtime and edge proxy on your API service
jared.leddy (2w ago)
I see what you're talking about. The bottom box. Deploying the updates now. That's done.
Brody (2w ago)
okay continue monitoring the service and report back
jared.leddy (2w ago)
Copy.
Ayush (2w ago)
i've also been having an issue with my nestjs API restarting sporadically throughout the day. Haven't had time to look into it though, i believe i used one of the existing railway templates. Did you also use the template @jared.leddy?
jared.leddy (2w ago)
No, we didn't do anything fancy. Just connected the repo and quick-deployed it with ENVs and a DB.