Railway
Created by andrews46 on 7/18/2023 in #✋|help
Flask + Gunicorn app repeatedly getting killed and restarting
Service ID: f9e8d800-7f5f-4cdf-a508-830ce6caf939
We have a Flask app deployed with the Gunicorn server. The start command in our Procfile is:
web: gunicorn -w 1 --threads 300 server:app
We recently did a new deploy and started seeing our worker repeatedly killed and restarted. Here is the error trace that occurs after each restart:
[2023-07-18 01:52:35 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:12962)
Exception ignored in: <module 'threading' from '/root/.nix-profile/lib/python3.9/threading.py'>
Traceback (most recent call last):
File "/root/.nix-profile/lib/python3.9/threading.py", line 1447, in _shutdown
atexit_call()
File "/root/.nix-profile/lib/python3.9/concurrent/futures/thread.py", line 31, in _python_exit
t.join()
File "/root/.nix-profile/lib/python3.9/threading.py", line 1060, in join
self._wait_for_tstate_lock()
File "/root/.nix-profile/lib/python3.9/threading.py", line 1080, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
File "/opt/venv/lib/python3.9/site-packages/gunicorn/workers/base.py", line 203, in handle_abort
sys.exit(1)
SystemExit: 1
[2023-07-18 01:52:36 +0000] [1] [ERROR] Worker (pid:12962) exited with code 255
[2023-07-18 01:52:36 +0000] [1] [ERROR] Worker (pid:12962) exited with code 255.
[2023-07-18 01:52:36 +0000] [13217] [INFO] Booting worker with pid: 13217
At first we thought this was due to our code changes, but we have since rolled back to a previous deploy that was working fine before, and we are still observing the same restart issue. Our metrics show that memory usage has remained roughly the same, but CPU usage has spiked for some reason, even though traffic to our server has not significantly increased.
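For reference, the start command above maps roughly onto the following gunicorn.conf.py. This is only a sketch and not a file from our deploy: the file itself is hypothetical, and the `timeout` line is an assumption that spells out Gunicorn's documented default of 30 seconds, since our command does not pass `--timeout`. That is the limit the master process applies before it logs WORKER TIMEOUT and kills the worker.

```python
# gunicorn.conf.py (hypothetical; the actual deploy only uses the Procfile command)

# Same WSGI target as the positional argument "server:app".
wsgi_app = "server:app"

# -w 1: a single worker process.
workers = 1

# --threads 300: with more than one thread, Gunicorn runs the threaded
# (gthread) worker instead of the plain sync worker.
threads = 300

# Not set in the original command. 30 seconds is Gunicorn's default:
# a worker that does not check in with the master within this window
# is killed and a new one is booted.
timeout = 30
```

If such a file were used, the Procfile entry would become something like `web: gunicorn -c gunicorn.conf.py`, and behavior should match the current command.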