Railway•2y ago

My site was unavailable yesterday for about an hour; how to troubleshoot?

I'm not sure how to begin troubleshooting. I have a NodeJS SvelteKit website with Railway's MySQL. I'm using Prisma intermixed with pooled mysql2 for database access. I'm exposing an API to an external service that calls my endpoint. A couple times a month, the API becomes unresponsive for maybe an hour. I see no logs that would indicate the site or MySQL was down. In fact, I see very few logs at all on the Railway dashboard. How do I go about troubleshooting this? Am I losing pooled connections, or perhaps not closing connections? Running out of RAM? Do I need to wrap some exception handling somewhere? Where do I go to see server logs, or do I need to do something special to write logs?

14 Replies

Percy•2y ago

Project ID: b855592c-9156-4020-9543-d41a4e939fde

Percy•2y ago

You might find these helpful:
- MySQL DB unresponsive
- Database Connection Issues
- Connection pools and connections

VoiceOfSoftwareOP•2y ago

I think this is my project ID: b855592c-9156-4020-9543-d41a4e939fde. Is there another place I should ask for support?

Brody•2y ago

logs are available in the "deploy logs" tab

VoiceOfSoftwareOP•2y ago

Thanks, I looked through those, and don't see anything that would indicate an issue. About 24 hours ago, I see MySQL's memory spiked from 1GB to 2GB, but no errors in the log for that time.

Percy•2y ago

Flagging this thread. A team member will be with you shortly.

VoiceOfSoftwareOP•2y ago

Just had another outage today, Apr 07 15:18:41 (UTC+2). Checklyhq.com reported ESOCKETTIMEDOUT on https://troubled-finger-production.up.railway.app/api/calendar and I see no logs for Apr 07 at all in the deploy logs.

angelo•2y ago

Hey there @VoiceOfSoftware - looking into this in earnest.

VoiceOfSoftwareOP•2y ago

How's it going? I know my project doesn't make Railway very much money, but if I can't ensure that it's reliable, I'll be forced to move to Azure or AWS, or Vercel with PlanetScale. I really don't want to move away from Railway, because it's such a nice environment.

Brody•2y ago

that calendar data is pulled from your mysql database running on railway right?

JustJake•2y ago

There's nothing really here for us to debug. You'll need to dig into your logs, etc.

VoiceOfSoftwareOP•2y ago

Yes, the calendar data is coming from the Railway MySQL tables. I WISH I could dig into the logs, but I don't see any. Where would I see them? What is the proper approach to add more logging if none is provided out of the box? Normally when I self-host, I can see system logs, MySQL logs, etc., but on Railway I don't have access to the file system, and I have no idea where to troubleshoot in the hosted MySQL. The API endpoint just...didn't respond...for about 30 minutes. A few days later I set up checklyhq.com to test the endpoint every 5 minutes, and it showed another outage for 18 minutes, with the same ESOCKETTIMEDOUT. Where would I look in your logs for the socket timeout, and how would I know which service caused it? Was NodeJS down? Was MySQL down?

Brody•2y ago

well first, do you have any logging / error handling surrounding the code that queries the database in that endpoint? if not, add as much as you can. you can't dig into logs if you never log anything. make sure normal logs are being sent to stdout and errors to stderr.

add another endpoint, /health, that returns 200, so that you can have checklyhq check it with the same frequency as the check to /calendar. if /calendar goes down but not /health, you will know it's a problem with the database or the database connection, and at that point you will want to check the database connection with an external tool like mysql workbench. do all this just to give yourself a better understanding of where the problem lies
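
A minimal sketch of that advice in TypeScript, assuming Node 18+ (for the global `Response`). The names `loggedQuery` and `healthHandler` are hypothetical helpers, not part of SvelteKit, Prisma, mysql2, or Railway. The wrapper logs the start, duration, and failure of any async database call, and adds a timeout so a hung query produces an error line instead of a silent stall.

```typescript
// Hypothetical helpers for illustration; not a SvelteKit, Prisma, or Railway API.
type Logger = { info: (msg: string) => void; error: (msg: string) => void };

// In Node.js, console.log writes to stdout and console.error writes to stderr.
const consoleLogger: Logger = {
  info: (msg) => console.log(msg),
  error: (msg) => console.error(msg),
};

/**
 * Runs `work`, logging when it starts, how long it took, and any failure.
 * Rejects if `work` has not settled within `timeoutMs`. The timeout only
 * stops the wait; it does not cancel the underlying query.
 */
async function loggedQuery<T>(
  label: string,
  work: () => Promise<T>,
  timeoutMs = 10_000,
  log: Logger = consoleLogger,
): Promise<T> {
  const started = Date.now();
  log.info(`[${label}] start`);
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`[${label}] timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });
  try {
    const result = await Promise.race([work(), timeout]);
    log.info(`[${label}] ok in ${Date.now() - started}ms`);
    return result;
  } catch (err) {
    const detail = err instanceof Error ? err.stack ?? err.message : String(err);
    log.error(`[${label}] failed after ${Date.now() - started}ms: ${detail}`);
    throw err; // rethrow so the endpoint can still answer with a 5xx
  } finally {
    clearTimeout(timer);
  }
}

/** Health check that never touches the database; always answers HTTP 200. */
function healthHandler(): Response {
  return new Response(JSON.stringify({ status: "ok" }), {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```

In a SvelteKit project this would probably be wired up as `export const GET = () => healthHandler();` in `src/routes/health/+server.ts`, with the calendar endpoint wrapping its existing Prisma or mysql2 call as `loggedQuery("calendar", () => /* existing query */)`. Both the route path and the label are assumptions.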

JustJake•2y ago

Exactly what Brody stated. You don't have enough telemetry to determine where this problem came from, and it's very likely it's your code
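
One way to read the /health versus /calendar comparison is as a small decision table. The sketch below is illustrative only. The function name `triage` is hypothetical. Only the "health up, calendar down" branch comes from the thread. The other branches are assumptions about what each combination most plausibly means.

```typescript
type ProbeResult = "up" | "down";

/** Maps the two monitor results to the most likely place to look next. */
function triage(health: ProbeResult, calendar: ProbeResult): string {
  if (health === "up" && calendar === "up") {
    return "no outage observed";
  }
  if (health === "up" && calendar === "down") {
    // From the thread: the app is serving requests, so look at the database.
    return "suspect the database or the database connection";
  }
  if (health === "down" && calendar === "down") {
    // Assumption: nothing answers, so the problem sits in front of the database.
    return "suspect the Node process, the deploy, or the network";
  }
  // Assumption: only the trivial route fails, which points at the check itself.
  return "suspect the /health route or the monitor";
}

console.log(triage("up", "down"));
```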
