Very slow response time
I've had a website hosted on Railway for the past month or so. Load times have occasionally run a bit slow (up to 5 seconds) but are usually within 1-2 seconds, which is fine.
Today has been a totally different story... consistently taking 6+ seconds for the code to run (I have a run time print statement built in the code) and often the site takes far more than that to load. On top of that, sometimes the site has just gone unresponsive for several minutes at a time, and in the logs errors such as the following print:
[2023-01-25 01:22:02 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:10)
[2023-01-25 01:22:02 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-01-25 01:22:02 +0000] [288] [INFO] Booting worker with pid: 288
[2023-01-25 01:22:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:288)
[2023-01-25 01:22:33 +0000] [288] [INFO] Worker exiting (pid: 288)
[2023-01-25 01:22:33 +0000] [320] [INFO] Booting worker with pid: 320
When I run the site locally I encounter no issues and normal response times. What could be going on here?
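The log lines above look like Gunicorn's, and the 31 seconds between the two WORKER TIMEOUT entries is consistent with its default 30-second worker timeout. Assuming the app is served by Gunicorn, a config sketch along these lines would give slow requests more headroom while the real cause is tracked down. The file name and the values are assumptions, not anything taken from this project, and a longer timeout only hides slow requests; it doesn't make them faster.

```python
# gunicorn.conf.py (assumed file name, passed with: gunicorn -c gunicorn.conf.py ...)

# Assumption: more than one worker, so one slow request doesn't block every other visitor.
workers = 2

# Seconds a worker may stay silent before the master kills and restarts it.
# Gunicorn's default is 30, which matches the restart interval in the log above.
timeout = 60

loglevel = "info"
```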
Project ID:
52b32c2c-e65f-443a-b59c-ba6a9bccf24c
Samy mentioned that responses from the servers were always < 1 second, but now they take more than 2 seconds and sometimes even more than 5 seconds. He did a rollback to a month ago and it's still slow, so it's possible that something happened with the servers last week.
likely just database latencies combined with django's poor optimization of database querying. what kind of queries are you executing to cause this kinda timeout?
in most cases the latency to your local database is well under a millisecond, whereas the latency from a railway deployment to the database is more like 5-10 milliseconds
so each query takes much, much longer, and code that runs lots of queries one after another gets slower in proportion: the total wait is roughly the latency times the number of queries
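To put rough numbers on that, here is a back-of-the-envelope sketch. The function name and the query counts are made up for illustration; only the 5 ms figure comes from the latency range mentioned above.

```python
def total_wait_ms(n_queries, round_trip_ms):
    """Time spent waiting on the network when queries run one after another."""
    return n_queries * round_trip_ms


# A handful of calls per page load barely registers...
print(total_wait_ms(4, 5.0))    # 20.0 ms
# ...but hundreds of sequential queries add up to seconds.
print(total_wait_ms(500, 5.0))  # 2500.0 ms
```

So latency alone would only explain multi-second responses if a page load made hundreds of round trips.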
The only db-related thing I have is a redis instance, and each website load makes max 4 calls to it. The only other calls are to weather apis which occur when the called location is not in the redis cache. Also I'm using Flask.
Any ideas? This has been an on-and-off issue today. Is it possible if I upgrade to the teams plan it will mitigate the issue?
Also this issue began before, but I launched a little promo yesterday and it pushed CPU to 124% (?), is that also reason to upgrade?
Where is your db located? Railway services are all on US-West.
I created the Redis within the project so it's linked within the project environment
Gotcha
Have you tested how fast the weather API responds and if it ever hangs? That could easily be the issue
Could be getting rate limited
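One way to check this is to time each API call and give it a hard timeout so a hanging request can't tie up a worker. This is a minimal sketch using only the standard library; `timed_fetch` and its parameters are hypothetical names, and the real app may well use a different HTTP client.

```python
import time
import urllib.request


def timed_fetch(url, timeout_s=5.0, opener=urllib.request.urlopen):
    """Fetch url and return (ok, elapsed_seconds, body).

    `opener` is injectable so the function can be exercised without a network.
    """
    start = time.perf_counter()
    try:
        with opener(url, timeout=timeout_s) as resp:
            body = resp.read()
        ok = True
    except OSError:
        # URLError, HTTPError (e.g. a 429 when rate limited) and socket
        # timeouts are all OSError subclasses.
        body, ok = b"", False
    elapsed = time.perf_counter() - start
    return ok, elapsed, body
```

Logging `elapsed` on every cache miss would show whether the weather API is ever the slow part.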
Can you link to your metrics tab?
Post it here and I can check the mem and metrics.
I've just thrown in some time print statements throughout the code so I'm gonna try to get to the bottom of it that way
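For anyone doing the same, a small context manager keeps the timing prints consistent. This is only a sketch; `timed` and the labels are made-up names, not the code used in this project.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, sink=print):
    """Report how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        sink(f"{label}: {elapsed:.3f}s")


# Usage: wrap each stage separately to see where the time goes.
with timed("formula"):
    sum(i * i for i in range(100_000))  # stand-in for the real work
```

The lines show up in the deploy logs next to the server's own output.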
oh wait what do you mean by link
Railway
Yep!
This is it
Ok i’ve tested it out and it definitely has to do with the formula and not the api calls.
-
Basically the formula works by taking the hourly forecast and tweaking the conditions to create combinations of possible actual hour-by-hour snow cover outcomes. For example, it takes the hour-by-hour temperature array and generates 7 arrays with all temps -3, -2, -1, 0, +1, +2, +3. It then combines each array with 7 different timings. This happens with 2 more variables to create 7*7*7*8, or 2,744 snow cover arrays.
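The scenario count can be checked with a quick sketch. Only the temperature offsets come from the description above; the other three value sets are placeholders, and the split of the two extra variables into 7 and 8 options is an assumption read off the 7*7*7*8 figure.

```python
from itertools import product

TEMP_OFFSETS = (-3, -2, -1, 0, 1, 2, 3)  # from the post: 7 temperature shifts
TIMINGS = range(7)          # 7 timings (placeholder values)
THIRD_VARIABLE = range(7)   # hypothetical: assumed to have 7 options
FOURTH_VARIABLE = range(8)  # hypothetical: assumed to have 8 options

# Every combination of the four variables is one snow cover scenario.
scenarios = list(product(TEMP_OFFSETS, TIMINGS, THIRD_VARIABLE, FOURTH_VARIABLE))
print(len(scenarios))  # 2744
```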
-
The code is also set to only go through all 4 stages under certain weather forecast conditions which is why only some locations have this issue. Without one of the variables, the number of arrays stays in the low-mid hundreds or less with a max processing time of about 3 seconds.
-
These are numpy arrays and each has a total length of ~30.
-
My computer is able to process the code in full in less than 1.5 seconds, while it's taking about 7 seconds on Railway. My computer has 32GB RAM while I see Railway is up to 8GB so I’m assuming what’s needed is more processing power.
---
Your memory metrics aren’t reaching anywhere close to that amount
ideally you do everything database related in 1 go, before you start post-processing data in memory
I wouldn’t recommend an upgrade to the teams plan here. The CPU power will be the same, just higher capacity. Same with memory. Have you tried multithreading?
Judging by how you described the process, this sounds like something you could multithread pretty easily
Doesn't look like you're doing it atm judging by your metrics
yes, i believe multiprocessing is the way to go here. multithreading would be better if this was I/O bound, but since this sounds like number crunching, multiprocessing is probably the move
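A rough sketch of what that could look like with the standard library's process pool. `snow_cover` is a hypothetical stand-in for the real per-scenario formula, and the worker count and chunk size are guesses; how much it helps depends on how many CPU cores the deployment actually gets.

```python
from concurrent.futures import ProcessPoolExecutor


def snow_cover(scenario):
    """Hypothetical stand-in for the CPU-bound work done per scenario."""
    temp_offset, timing, c, d = scenario
    total = 0
    for hour in range(30):  # arrays in the post are ~30 long
        total += (hour + temp_offset + timing) * (c + 1) % (d + 2)
    return total


def run_serial(scenarios):
    """Baseline: process every scenario in the current process."""
    return [snow_cover(s) for s in scenarios]


def run_parallel(scenarios, workers=4, chunksize=256):
    """Split the scenarios across worker processes; result order is preserved.

    Call this from code guarded by `if __name__ == "__main__":` (or from a
    request handler), since worker processes may re-import this module.
    """
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(snow_cover, scenarios, chunksize=chunksize))
```

Starting processes has its own cost, so it is worth timing `run_parallel` against `run_serial` on the real workload before committing to it.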
Good idea I'm gonna try that and I'll update, thanks
There’s a difference between multiprocessing and multithreading??
did not know that lol
multiprocessing is great for cpu intensive stuff
multithreading is great for IO stuff like requests or db queries
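For contrast, here is a sketch of the I/O case, where threads are enough because the time is spent waiting rather than computing. `fake_weather_call` just sleeps to imitate a network round trip; it is not a real API client.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_weather_call(location, delay_s=0.05):
    """Stand-in for a blocking HTTP request: the thread only waits."""
    time.sleep(delay_s)
    return location, "ok"


def fetch_all(locations, max_workers=8):
    """Run the blocking calls concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_weather_call, locations))


print(fetch_all(["alta", "vail", "stowe"]))
```

The waits overlap, so the batch takes about as long as one call instead of the sum of all of them.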
watched a very handy video on it but those are the main takeaways
Gotcha