My site was unavailable yesterday for about an hour; how to troubleshoot?

I'm not sure how to begin troubleshooting. I have a NodeJS SvelteKit website with Railway's MySQL. I'm using Prisma intermixed with pooled mysql2 for database access. I'm exposing an API to an external service that calls my endpoint. A couple times a month, the API becomes unresponsive for maybe an hour. I see no logs that would indicate the site or MySQL was down. In fact, I see very few logs at all on the Railway dashboard. How do I go about troubleshooting this? Am I losing pooled connections, or perhaps not closing connections? Running out of RAM? Do I need to wrap some exception handling somewhere? Where do I go to see server logs, or do I need to do something special to write logs?
14 Replies
Percy, 2y ago
Project ID: b855592c-9156-4020-9543-d41a4e939fde
VoiceOfSoftware (OP), 2y ago
I think this is my project ID: b855592c-9156-4020-9543-d41a4e939fde. Is there another place I should ask for support?
Brody, 2y ago
logs are available in the "deploy logs" tab
VoiceOfSoftware (OP), 2y ago
Thanks, I looked through those, and don't see anything that would indicate an issue. About 24 hours ago, I see MySQL's memory spiked from 1GB to 2GB, but no errors in the log for that time.
Percy, 2y ago
Flagging this thread. A team member will be with you shortly.
VoiceOfSoftware (OP), 2y ago
Just had another outage today, Apr 07 15:18:41 (UTC+2). Checklyhq.com reported ESOCKETTIMEDOUT on https://troubled-finger-production.up.railway.app/api/calendar. I see no logs for Apr 07 at all in the deploy logs.
angelo, 2y ago
Hey there @VoiceOfSoftware - looking into this in earnest.
VoiceOfSoftware (OP), 2y ago
How's it going? I know my project doesn't make Railway very much money, but if I can't ensure that it's reliable, I'll be forced to move to Azure or AWS, or Vercel with PlanetScale. I really don't want to move away from Railway, because it's such a nice environment.
Brody, 2y ago
that calendar data is pulled from your mysql database running on railway right?
JustJake, 2y ago
There's nothing really here for us to debug. You'll need to dig into your logs, etc.
VoiceOfSoftware (OP), 2y ago
Yes, the calendar data is coming from the Railway MySQL tables. I WISH I could dig into the logs, but I don't see any. Where would I see them? What is the proper approach to add more logging if none is provided out of the box?

Normally when I self-host, I can see system logs, MySQL logs, etc., but on Railway I don't have access to the file system, and I have no idea where to troubleshoot in the hosted MySQL.

The API endpoint just...didn't respond...for about 30 minutes. A few days later I set up checklyhq.com to test the endpoint every 5 minutes, and it showed another outage for 18 minutes, with the same ESOCKETTIMEDOUT. Where would I look in your logs for the socket timeout, and how would I know which service caused it? Was NodeJS down? Was MySQL down?
Brody, 2y ago
Well, first: do you have any logging or error handling around the code that queries the database in that endpoint? If not, add as much as you can. You can't dig into logs if you never log anything. Make sure normal output goes to stdout and errors go to stderr.

Add another endpoint, /health, that returns 200, and have checklyhq check it at the same frequency as /calendar. If /calendar goes down but /health does not, you'll know the problem is the database or the database connection, and at that point you'll want to test the database connection with an external tool like MySQL Workbench. Do all this just to give yourself a better understanding of where the problem lies.
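A rough sketch of that advice, assuming a Node 18+ runtime where `Response` is a global. The names `withQueryLogging` and `healthHandler` and the 10-second default are hypothetical; none of them come from SvelteKit, Prisma, or mysql2. The wrapper times any async database call, logs success to stdout and failure to stderr, and gives up waiting after a limit so a hung query shows up in the logs instead of disappearing silently.

```typescript
// Hypothetical helper: wraps any async database call with timing and logging.
// console.log writes to stdout and console.error writes to stderr.
async function withQueryLogging<T>(
  label: string,
  query: () => Promise<T>,
  timeoutMs = 10_000, // assumed limit; tune it for your workload
): Promise<T> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Rejects once the limit passes. This only stops waiting;
  // it does not cancel the underlying query.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });

  try {
    const result = await Promise.race([query(), timeout]);
    console.log(`[db] ${label} ok in ${Date.now() - started} ms`);
    return result;
  } catch (err) {
    console.error(`[db] ${label} failed after ${Date.now() - started} ms`, err);
    throw err; // rethrow so the endpoint can still answer with an error status
  } finally {
    clearTimeout(timer);
  }
}

// Hypothetical /health handler body: it never touches the database,
// so it only proves that the Node process is up and reachable.
function healthHandler(): Response {
  return new Response("ok", { status: 200 });
}
```

In a SvelteKit project, a `src/routes/health/+server.ts` file could export a `GET` function that returns `healthHandler()`, and the calendar endpoint could pass its existing Prisma or mysql2 call to `withQueryLogging` as the `query` argument.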
JustJake, 2y ago
Exactly what Brody stated. You don't have enough telemetry to determine where this problem came from, and it's very likely it's your code
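The reasoning behind the two checks can be written down as a small decision function. This is only an illustrative sketch: `ProbeResult` and `likelyCause` are hypothetical names, and only the "database or database connection" branch comes from the thread. The other branches are assumptions about what each combination most plausibly means.

```typescript
// One monitoring sample: did each check answer successfully?
type ProbeResult = { healthOk: boolean; calendarOk: boolean };

// Maps a pair of check results to the most likely place to look next.
function likelyCause({ healthOk, calendarOk }: ProbeResult): string {
  if (healthOk && calendarOk) return "no outage";
  // The process answers but the data endpoint does not (the case from the thread).
  if (healthOk && !calendarOk) return "database or database connection";
  // Assumption: if nothing answers, suspect the Node process or the network path.
  if (!healthOk && !calendarOk) return "Node process or network path";
  // Assumption: a lone /health failure is probably a transient blip.
  return "inconclusive";
}
```

Comparing the timestamps of failed checks against whatever the application logged at those times then narrows the search further.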