Railway•2y ago

Very slow response time

I've had a website hosted on Railway for the past month or so. Load times have occasionally run a bit slow (up to 5 seconds) but are usually within 1-2 seconds, which is fine. Today has been a totally different story: the code is consistently taking 6+ seconds to run (I have a run-time print statement built into the code), and the site often takes far longer than that to load. On top of that, the site has sometimes gone unresponsive for several minutes at a time, and errors such as the following appear in the logs:

[2023-01-25 01:22:02 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:10)
[2023-01-25 01:22:02 +0000] [10] [INFO] Worker exiting (pid: 10)
[2023-01-25 01:22:02 +0000] [288] [INFO] Booting worker with pid: 288
[2023-01-25 01:22:33 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:288)
[2023-01-25 01:22:33 +0000] [288] [INFO] Worker exiting (pid: 288)
[2023-01-25 01:22:33 +0000] [320] [INFO] Booting worker with pid: 320

When I run the site locally I encounter no issues and normal response times. What could be going on here?
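The log lines in the question match the format Gunicorn prints when a worker exceeds its request timeout, and the roughly 30-second gap between booting pid 288 and its timeout is consistent with Gunicorn's default `timeout` of 30 seconds. A minimal sketch of a config file, assuming the app is served by Gunicorn (the thread never says so explicitly); the file name and the values are illustrative assumptions, not settings taken from the thread:

```python
# gunicorn.conf.py (assumed file name, passed with `gunicorn -c gunicorn.conf.py ...`)

# Number of worker processes handling requests (illustrative value).
workers = 2

# Seconds a worker may stay silent before the master kills and restarts it.
# Gunicorn's default is 30; 60 here is only an example.
timeout = 60

# Keep the same log verbosity as the output shown in the question.
loglevel = "info"
```

Raising `timeout` only stops workers from being killed mid-request. It does not make the slow code faster, so it is at best a stopgap while the real bottleneck is tracked down.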

24 Replies

Percy•2y ago

Project ID: 52b32c2c-e65f-443a-b59c-ba6a9bccf24c

Percy•2y ago

Samy mentioned that responses from the servers were always < 1 second, but now they take more than 2 seconds and sometimes even more than 5 seconds. He did a rollback to a month ago and it's still slow, so it's possible that something happened with the servers last week.

WeathermanOP•2y ago

52b32c2c-e65f-443a-b59c-ba6a9bccf24c

jackson•2y ago

likely just database latencies combined with django's poor optimization of database querying. what kind of queries are you executing to cause this kinda timeout? in most cases the latency to your local database is well under a millisecond, whereas the latency from a railway deployment to the database is more like 5-10 milliseconds, so each query takes much longer. that makes unoptimized queries perform worse and worse as the latency increases, since every extra round trip pays the full latency again

WeathermanOP•2y ago

The only db-related thing I have is a redis instance, and each website load makes max 4 calls to it. The only other calls are to weather apis which occur when the called location is not in the redis cache. Also I'm using Flask. Any ideas? This has been an on-and-off issue today. Is it possible if I upgrade to the teams plan it will mitigate the issue? Also this issue began before, but I launched a little promo yesterday and it pushed CPU to 124% (?), is that also reason to upgrade?
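To put the latency discussion into numbers, here is a back-of-the-envelope sketch. The function name is hypothetical, and the 10 ms round trip is taken from the upper end of the 5-10 ms estimate given earlier in the thread, not from a measurement:

```python
def sequential_latency_ms(n_calls: int, round_trip_ms: float) -> float:
    """Total network wait when n_calls are made one after another."""
    return n_calls * round_trip_ms


# Assumed round-trip time of 10 ms per call.
for n_calls in (4, 100, 1000):
    total = sequential_latency_ms(n_calls, 10.0)
    print(f"{n_calls} calls -> {total:.0f} ms of waiting")
```

Under these assumptions, 4 Redis calls per page load would add about 40 ms, which is far short of the multi-second delays being reported, so cache latency alone seems unlikely to be the cause.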

Adam•2y ago

Where is your db located? Railway services are all on US-West.

WeathermanOP•2y ago

I created the Redis within the project, so it's linked within the project environment

Adam•2y ago

Gotcha. Have you tested how fast the weather API responds and if it ever hangs? That could easily be the issue. Could be getting rate limited.

angelo•2y ago

Can you link to your metrics tab? Post it here and I can check the mem and metrics.

WeathermanOP•2y ago

I've just thrown in some time print statements throughout the code, so I'm gonna try to get to the bottom of it that way
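One way to add timing prints like the ones described here is a small context manager around each stage. This is a sketch only: `timed`, the labels, and the stand-in workloads are all hypothetical and are not the poster's actual code.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Print (and optionally record) how long the wrapped block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.3f}s")


timings = {}

# Stand-in for a weather API request (assumed to be I/O bound).
with timed("fake_api_call", timings):
    time.sleep(0.05)

# Stand-in for the number-crunching part of the request.
with timed("fake_formula", timings):
    sum(i * i for i in range(100_000))
```

Keeping the measurements in a dict makes it easy to log them together at the end of a request and compare the local run against the deployed one.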

WeathermanOP•2y ago

oh wait what do you mean by link

WeathermanOP•2y ago

https://railway.app/project/52b32c2c-e65f-443a-b59c-ba6a9bccf24c/service/13fa1b4e-0b97-4eae-8256-4b499b0cb703/metrics

angelo•2y ago

Yep! This is it

WeathermanOP•2y ago

Ok, I've tested it out and it definitely has to do with the formula and not the API calls.
- Basically the formula works by taking the hourly forecast and tweaking the conditions to create combinations of possible actual hour-by-hour snow cover outcomes. For example, it takes the hour-by-hour temperature array and generates 7 arrays with all temps -3, -2, -1, 0, +1, +2, +3. It then combines each array with 7 different timings. This happens with 2 more variables to create 7 × 7 × 7 × 8, or 2,744, snow cover arrays.
- The code is also set to only go through all 4 stages under certain weather forecast conditions, which is why only some locations have this issue. Without one of the variables, the number of arrays stays in the low-to-mid hundreds or less, with a max processing time of about 3 seconds.
- These are numpy arrays and each has a total length of ~30.
- My computer is able to process the code in full in less than 1.5 seconds, while it's taking about 7 seconds on Railway. My computer has 32GB RAM while I see Railway goes up to 8GB, so I'm assuming what's needed is more processing power.
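The temperature step described above can be sketched with NumPy broadcasting, which builds all 7 shifted arrays in one operation instead of a Python loop. This is a sketch under assumptions: the function name and the sample forecast are hypothetical, and the other three variables are represented only by their counts because the thread does not say how they are applied.

```python
from itertools import product

import numpy as np

# Offsets named in the thread: -3 through +3 degrees.
TEMP_OFFSETS = np.array([-3, -2, -1, 0, 1, 2, 3], dtype=float)


def temperature_variants(hourly_temps):
    """Return a (7, n_hours) array: one row per temperature offset."""
    temps = np.asarray(hourly_temps, dtype=float)
    # (1, n_hours) + (7, 1) broadcasts to (7, n_hours).
    return temps[np.newaxis, :] + TEMP_OFFSETS[:, np.newaxis]


# Made-up 30-hour forecast, matching the ~30 element length mentioned.
hourly = np.linspace(-5.0, 5.0, 30)
variants = temperature_variants(hourly)
print(variants.shape)

# Count the combinations of the four variables (7, 7, 7 and 8 options).
n_combinations = sum(1 for _ in product(range(7), range(7), range(7), range(8)))
print(n_combinations)
```

If the remaining variables can also be expressed as array operations, stacking them along extra axes may cut the per-combination Python overhead, which often dominates when each array holds only about 30 values.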

Adam•2y ago

Your memory metrics aren’t reaching anywhere close to that amount

jackson•2y ago

ideally you do everything database related in 1 go, before you start post-processing data in memory

Adam•2y ago

I wouldn’t recommend an upgrade to the teams plan here. The CPU power will be the same, just higher capacity. Same with memory. Have you tried multithreading? Judging by how you described the process, this sounds like something you could multithread pretty easily. Doesn't look like you're doing it atm, judging by your metrics.

jackson•2y ago

yes, i believe multiprocessing is the way to go here. multithreading would be better if this was I/O bound, but since this sounds like number crunching, multiprocessing is the move probably
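A minimal sketch of the multiprocessing suggestion, using the standard library's `ProcessPoolExecutor`. The `score_combination` function is a hypothetical stand-in for the real snow cover calculation, and the worker count and chunk size are arbitrary assumptions:

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def score_combination(combo):
    """Hypothetical CPU-bound stand-in for one snow cover calculation."""
    temp_offset, timing_shift = combo
    return sum((temp_offset + hour + timing_shift) ** 2 for hour in range(30))


def run_serial(combos):
    """Baseline: process every combination in the current process."""
    return [score_combination(c) for c in combos]


def run_parallel(combos, max_workers=2):
    """Spread the combinations across worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # chunksize batches the work to reduce inter-process overhead.
        return list(pool.map(score_combination, combos, chunksize=64))


if __name__ == "__main__":
    combos = list(product(range(-3, 4), range(7)))
    assert run_parallel(combos) == run_serial(combos)
    print(len(combos), "combinations processed")
```

Whether this helps depends on how many CPU cores the deployment actually gets. For small arrays, process start-up and pickling costs can outweigh the gain, so it is worth timing both versions before committing to either.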

WeathermanOP•2y ago

Good idea, I'm gonna try that and I'll update, thanks

Adam•2y ago

There’s a difference between multiprocessing and multithreading?? did not know that lol

jackson•2y ago

multiprocessing is great for cpu intensive stuff, multithreading is great for IO stuff like requests or db queries. watched a very handy video on it, but those are the main takeaways
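For the I/O side of that distinction, here is a minimal sketch using a thread pool. `fake_weather_request` is a hypothetical stand-in that sleeps instead of calling a real weather API, and the location names are made up:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_weather_request(location):
    """Hypothetical stand-in for a network call; sleeping mimics waiting on I/O."""
    time.sleep(0.1)
    return f"forecast for {location}"


locations = ["a", "b", "c", "d"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order even though the calls overlap in time.
    forecasts = list(pool.map(fake_weather_request, locations))
elapsed = time.perf_counter() - start

print(forecasts)
print(f"elapsed: {elapsed:.2f}s")
```

Threads help here because each one spends its time waiting rather than computing. For the number crunching described earlier in the thread, the same pattern would likely give little speedup in CPython, which is why separate processes were suggested instead.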

Adam•2y ago

Gotcha
