If a Django process is failing because of the workers

I do some maths in my Django app on Railway. It's complicated and CPU-bound. I have the current Developer account (the $5 one). The maths fails on some runs once deployed, with this error:
[2023-06-05 14:02:41 +0000] [74] [CRITICAL] WORKER TIMEOUT (pid:140)
[2023-06-05 14:02:41 +0000] [140] [INFO] Worker exiting (pid: 140)
[2023-06-05 14:02:42 +0000] [74] [WARNING] Worker with pid 140 was terminated due to signal 9
Similar to this SO, I'm not doing anything smart right now like managing workers, async things, or specific multi-threading libs. I'm not sure how to solve this issue (without further reducing the data set size, which is already as small as possible). I currently have 2 possible ideas: 1) follow the advice on the SO and increase the allocated worker memory, or 2) add a FastAPI service to the Railway deployment and somehow offload the process to there. Can anyone share any suggestions to solve this, or assess the likelihood of either of my plans working?
Stack Overflow
Gunicorn worker terminated with signal 9
I am running a Flask application and hosting it on Kubernetes from a Docker container. Gunicorn is managing workers that reply to API requests. The following warning message is a regular occurrence...
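(For reference, the WORKER TIMEOUT half of that log is gunicorn's own timeout killing a silent worker; below is a minimal sketch of raising it through a Python config file. The values and WSGI module path are illustrative only, and signal 9 can equally mean the platform OOM-killed the worker rather than gunicorn timing it out.)
```python
# gunicorn.conf.py  (start gunicorn with: gunicorn myproject.wsgi -c gunicorn.conf.py)
# "myproject.wsgi" is a placeholder for the actual WSGI module.
timeout = 300   # seconds a worker may stay silent before the arbiter kills it (default is 30)
workers = 2     # keep the worker count modest for CPU-bound work on a small plan
```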
Percy
Percy2y ago
Project ID: 8a43b0e6-5720-40f3-8ee1-27f04ba76646
zimapurple
zimapurpleOP2y ago
8a43b0e6-5720-40f3-8ee1-27f04ba76646
Brody
Brody2y ago
can you show me a screenshot of your service metrics during the times the worker gets killed?
zimapurple
zimapurpleOP2y ago
Sure 🙂
Brody
Brody2y ago
when did you upgrade to the dev plan
zimapurple
zimapurpleOP2y ago
A few days ago, last week, around wed
Brody
Brody2y ago
okay good, so your service would be on the dev plan. show me the logs with the kill message
zimapurple
zimapurpleOP2y ago
[2023-06-05 14:02:41 +0000] [74] [CRITICAL] WORKER TIMEOUT (pid:140)
[2023-06-05 14:02:41 +0000] [140] [INFO] Worker exiting (pid: 140)
[2023-06-05 14:02:42 +0000] [74] [WARNING] Worker with pid 140 was terminated due to signal 9
[2023-06-05 14:02:42 +0000] [205] [INFO] Booting worker with pid: 205
<__array_function__ internals>:200: RuntimeWarning: invalid value encountered in cast
Brody
Brody2y ago
how big would you say your dataset is?
zimapurple
zimapurpleOP2y ago
Hang on @Brody, rubber ducking this with you just struck me with something that's PEBCAK. Will get back to you as I work through something for 10 min or so
Brody
Brody2y ago
cool
Solution
zimapurple
zimapurple2y ago
Run from local, the answer to your question is:
engi-alpha-django-web-1 | <class 'pandas.core.frame.DataFrame'>
engi-alpha-django-web-1 | RangeIndex: 336 entries, 0 to 335
engi-alpha-django-web-1 | Data columns (total 2 columns):
engi-alpha-django-web-1 | # Column Non-Null Count Dtype
engi-alpha-django-web-1 | --- ------ -------------- -----
engi-alpha-django-web-1 | 0 ds 336 non-null datetime64[ns, UTC]
engi-alpha-django-web-1 | 1 y 336 non-null float64
engi-alpha-django-web-1 | dtypes: datetime64[ns, UTC](1), float64(1)
engi-alpha-django-web-1 | memory usage: 5.4 KB
engi-alpha-django-web-1 | <class 'pandas.core.frame.DataFrame'>
engi-alpha-django-web-1 | RangeIndex: 336 entries, 0 to 335
engi-alpha-django-web-1 | Data columns (total 2 columns):
engi-alpha-django-web-1 | # Column Non-Null Count Dtype
engi-alpha-django-web-1 | --- ------ -------------- -----
engi-alpha-django-web-1 | 0 ds 336 non-null datetime64[ns, UTC]
engi-alpha-django-web-1 | 1 y 336 non-null float64
engi-alpha-django-web-1 | dtypes: datetime64[ns, UTC](1), float64(1)
engi-alpha-django-web-1 | memory usage: 5.4 KB
So not big. The answer to my problem, however, is not the data size: some input data (particularly the runs that fail) have mostly 0 values, so the maths goes ballistic. It can technically cope with 0, but the way the maths gets executed becomes too intensive, and that's where the timeout occurs on the deploy. So that's the core issue, and I guess I need to do better UI / upfront checks for that.
But my original question is still floating in my head. If I ran up another Railway service just to do the maths, and got Django to call it via an API, how does Railway allocate resourcing between those 2 services? Would you expect a performance gain from that architecture, or, because everything is fundamentally using the same resources, would it not necessarily help? (This might just be a 'how long is a piece of string / suck it and see' answer, but experienced insight might save me some labour 🙂)
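(As a sketch of the kind of upfront check mentioned above; the column name `y` and the 50% threshold are assumptions for illustration, not details from this thread.)
```python
import pandas as pd

def mostly_zero(df: pd.DataFrame, column: str = "y", threshold: float = 0.5) -> bool:
    """Return True if more than `threshold` of the values in `column` are zero."""
    return (df[column] == 0).mean() > threshold

# Guard the expensive maths with a cheap check first, e.g.:
# if mostly_zero(input_df):
#     raise ValueError("Input series is mostly zeros; refusing to run the heavy maths")
```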
Brody
Brody2y ago
every service gets 8vcpu and 8gb mem, so offloading work to a worker service is totally a viable option
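(A minimal sketch of what that offload could look like from the Django side; the internal hostname, port, endpoint path, and payload shape are all assumptions, not details from this thread.)
```python
import requests

# Hypothetical private URL of the separate maths/FastAPI service
MATHS_SERVICE_URL = "http://maths-service.railway.internal:8000/run-maths"

def run_remote_maths(payload: dict, timeout: int = 120) -> dict:
    """POST the job to the worker service and return its JSON result."""
    response = requests.post(MATHS_SERVICE_URL, json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()
```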
zimapurple
zimapurpleOP2y ago
Ace, to make sure I'm totally on the same page: my project witty-humour has multiple environments, such as production. In production I can have a Django service that has 8vcpu and 8gb mem and an additional FastAPI service with a separate 8vcpu and 8gb mem, giving that environment access to a total of 16 vcpu and 16 gb mem. Have I got that right?
Brody
Brody2y ago
yep, but if you actually constantly use that much resources you will face a very hefty bill
zimapurple
zimapurpleOP2y ago
Lol, totally. Thanks for the rubberducking.
Brody
Brody2y ago
no problem, come back if you need any more help 🙂