If a Django process is failing because of the workers

I do some maths in my Django app on Railway. It's complicated and CPU-bound. I have the current Developer account (the $5 one). The maths fails on some runs once deployed, with this error:
[2023-06-05 14:02:41 +0000] [74] [CRITICAL] WORKER TIMEOUT (pid:140)
[2023-06-05 14:02:41 +0000] [140] [INFO] Worker exiting (pid: 140)
[2023-06-05 14:02:42 +0000] [74] [WARNING] Worker with pid 140 was terminated due to signal 9
Similar to this SO, I'm not doing anything smart right now like managing workers, async things, or specific multi-threading libs. I'm not sure how to solve this issue (without further reducing the data set size, which is already as small as possible). I currently have 2 possible ideas: 1) follow the advice on the SO and increase the allocated worker memory, or 2) add a FastAPI service to the Railway deployment and somehow offload the process to there. Can anyone share any suggestions to solve this, or assess the likelihood of either of my plans working?
Stack Overflow
Gunicorn worker terminated with signal 9
I am running a Flask application and hosting it on Kubernetes from a Docker container. Gunicorn is managing workers that reply to API requests. The following warning message is a regular occurrence...
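(For reference, the WORKER TIMEOUT half of that log is gunicorn's own timeout killing a silent worker; below is a minimal sketch of raising it through a Python config file. The values and WSGI module path are illustrative only, and signal 9 can equally mean the platform OOM-killed the worker rather than gunicorn timing it out.)
```python
# gunicorn.conf.py  (start gunicorn with: gunicorn myproject.wsgi -c gunicorn.conf.py)
# "myproject.wsgi" is a placeholder for the actual WSGI module.
timeout = 300   # seconds a worker may stay silent before the arbiter kills it (default is 30)
workers = 2     # keep the worker count modest for CPU-bound work on a small plan
```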
Percy
Percy2y ago
Project ID: 8a43b0e6-5720-40f3-8ee1-27f04ba76646
zimapurple
zimapurpleOP2y ago
8a43b0e6-5720-40f3-8ee1-27f04ba76646
Brody
Brody2y ago
can you show me a screenshot of your service metrics during the times the worker gets killed?
zimapurple
zimapurpleOP2y ago
Sure 🙂
Brody
Brody2y ago
when did you upgrade to the dev plan
zimapurple
zimapurpleOP2y ago
A few days ago, last week, around wed
Brody
Brody2y ago
okay good, so your service would be on the dev plan. show me the logs with the kill message
zimapurple
zimapurpleOP2y ago
[2023-06-05 14:02:41 +0000] [74] [CRITICAL] WORKER TIMEOUT (pid:140)
[2023-06-05 14:02:41 +0000] [140] [INFO] Worker exiting (pid: 140)
[2023-06-05 14:02:42 +0000] [74] [WARNING] Worker with pid 140 was terminated due to signal 9
[2023-06-05 14:02:42 +0000] [205] [INFO] Booting worker with pid: 205
<__array_function__ internals>:200: RuntimeWarning: invalid value encountered in cast
Brody
Brody2y ago
how big would you say your dataset is?
zimapurple
zimapurpleOP2y ago
Hang on @Brody, rubber ducking this with you just struck me with something that's PEBCAK. Will get back to you as I work through something for 10 min or so
Brody
Brody2y ago
cool
Solution
zimapurple
zimapurple2y ago
Run from local, the answer to your question is:
engi-alpha-django-web-1 | <class 'pandas.core.frame.DataFrame'>
engi-alpha-django-web-1 | RangeIndex: 336 entries, 0 to 335
engi-alpha-django-web-1 | Data columns (total 2 columns):
engi-alpha-django-web-1 | # Column Non-Null Count Dtype
engi-alpha-django-web-1 | --- ------ -------------- -----
engi-alpha-django-web-1 | 0 ds 336 non-null datetime64[ns, UTC]
engi-alpha-django-web-1 | 1 y 336 non-null float64
engi-alpha-django-web-1 | dtypes: datetime64[ns, UTC](1), float64(1)
engi-alpha-django-web-1 | memory usage: 5.4 KB
engi-alpha-django-web-1 | <class 'pandas.core.frame.DataFrame'>
engi-alpha-django-web-1 | RangeIndex: 336 entries, 0 to 335
engi-alpha-django-web-1 | Data columns (total 2 columns):
engi-alpha-django-web-1 | # Column Non-Null Count Dtype
engi-alpha-django-web-1 | --- ------ -------------- -----
engi-alpha-django-web-1 | 0 ds 336 non-null datetime64[ns, UTC]
engi-alpha-django-web-1 | 1 y 336 non-null float64
engi-alpha-django-web-1 | dtypes: datetime64[ns, UTC](1), float64(1)
engi-alpha-django-web-1 | memory usage: 5.4 KB
So not big. The answer to my problem, however, is not the data size: some input data (particularly the runs that fail) have mostly 0 values, so the maths goes ballistic. It can technically cope with 0, but the way the maths gets executed becomes too intensive, and that's where the timeout occurs on the deploy. So that's the core issue, and I guess I need to do better UI / upfront checks for that.
But my original question is still floating in my head. If I ran up another Railway service just to do the maths, and got Django to call it via an API, how does Railway allocate resourcing between those 2 services? Would you expect a performance gain from that architecture, or, because everything is fundamentally using the same resources, would it not necessarily help? (This might just be a 'how long is a piece of string / suck it and see' answer, but experienced insight might save me some labour 🙂)
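(As a sketch of the kind of upfront check mentioned above; the column name `y` and the 50% threshold are assumptions for illustration, not details from this thread.)
```python
import pandas as pd

def mostly_zero(df: pd.DataFrame, column: str = "y", threshold: float = 0.5) -> bool:
    """Return True if more than `threshold` of the values in `column` are zero."""
    return (df[column] == 0).mean() > threshold

# Guard the expensive maths with a cheap check first, e.g.:
# if mostly_zero(input_df):
#     raise ValueError("Input series is mostly zeros; refusing to run the heavy maths")
```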
Brody
Brody2y ago
every service gets 8vcpu and 8gb mem, so offloading work to a worker service is totally a viable option
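(A minimal sketch of what that offload could look like from the Django side; the internal hostname, port, endpoint path, and payload shape are all assumptions, not details from this thread.)
```python
import requests

# Hypothetical private URL of the separate maths/FastAPI service
MATHS_SERVICE_URL = "http://maths-service.railway.internal:8000/run-maths"

def run_remote_maths(payload: dict, timeout: int = 120) -> dict:
    """POST the job to the worker service and return its JSON result."""
    response = requests.post(MATHS_SERVICE_URL, json=payload, timeout=timeout)
    response.raise_for_status()
    return response.json()
```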
zimapurple
zimapurpleOP2y ago
Ace, to make sure I'm totally on the same page: my project witty-humour has multiple environments, such as production. In production I can have a Django service that has 8vcpu and 8gb mem and an additional FastAPI service with a separate 8vcpu and 8gb mem, giving that environment access to a total of 16 vcpu and 16 gb mem. Have I got that right?
Brody
Brody2y ago
yep, but if you actually constantly use that much resources you will face a very hefty bill
zimapurple
zimapurpleOP2y ago
Lol, totally. Thanks for the rubberducking.
Brody
Brody2y ago
no problem, come back if you need any more help 🙂