Flask + Gunicorn app repeatedly getting killed and restarting
Service ID: f9e8d800-7f5f-4cdf-a508-830ce6caf939
We have a Flask app deployed using the Gunicorn server; our start command in our Procfile is:
web: gunicorn -w 1 --threads 300 server:app
We recently did a new deploy and started observing our worker getting repeatedly killed and restarted. Here is the error trace that keeps occurring after each restart:
At first we thought this was due to our code changes, but we have since rolled back to a previous deploy that was working fine before, and we are still observing the same restart issue. Our metrics show that memory usage has remained roughly the same, but CPU usage has spiked for some reason, even though traffic to our server has not significantly increased.
Project ID:
f9e8d800-7f5f-4cdf-a508-830ce6caf939
can you show a screenshot of the service metrics?
The CPU spike at 5:20pm is when we did the new deploy and started noticing the restarts
I'm assuming you are part of the pro plan?
we're on the hobby plan
should be 8 GB memory, right?
correct
critical worker timeout means that your code took longer than 30 seconds (gunicorn's default timeout) to respond to a request. how long should your app normally take to respond to a request?
generally less than a second or two, we do have one endpoint that is longer running
the thing is, I increased the gunicorn timeout to 1000, and then the restarts stopped, but the server was unresponsive
when I tried to hit an endpoint that should take less than a second, it would hang instead
so I think the timeout was indicating a deeper issue
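For anyone following along, the settings discussed here can also be written as a Gunicorn config file instead of command-line flags. This is only a minimal sketch: the file name gunicorn.conf.py and loading it with `gunicorn -c gunicorn.conf.py server:app` are assumptions, since the thread only shows the flags. The values mirror the start command above plus the default timeout.

```python
# gunicorn.conf.py (assumed file name; the thread only shows CLI flags)

# Same values as the `-w 1 --threads 300` flags in the start command.
workers = 1
threads = 300

# Seconds a worker may stay silent before the master process kills and
# restarts it. 30 is Gunicorn's default. Raising it (the thread tried 1000)
# stops the restarts but does not fix whatever is making the worker hang.
timeout = 30
```

As noted above, a larger timeout only hid the symptom here: the restarts stopped, but requests still hung.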
could this be an unhandled error from within your code or maybe you're doing an external API call and that's hanging?
because this would be an issue with your app, and not railway specifically
we've also rolled back to a previous deploy that had been working perfectly fine, and in general we've never encountered this issue in several weeks of having this project live on railway
and to test, we also deployed the same code on heroku that's currently on railway, and it's working fine without any issues
just because it works locally or on another platform doesn't automatically mean it's an issue with railway
but there's also not a whole lot I can do to help you here, you'd need to find out what's causing your app to hang
sure, I'm just trying to think why our previous deploy had been working fine for several days but now when we try to deploy it we're seeing this issue
failed request to external API? database connection failed? you'll have to do some digging
I'm sorry that there's not more I could help you with here
no worries, thanks for getting back to me
no problem, and when you figure out what's happening I'd love to know about it!
I had a similar issue a couple of days ago, the culprit was the db connection, which works only if you use service variables. Hardcoding the values in the code didn’t work for some reason. HTH
So it turned out to be a new version of gunicorn that was released yesterday and introduced a breaking change to our code. We've since locked the versions of our packages in our requirements file and now it's working fine again. Thanks for your help!
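In case it helps others who hit the same thing, here is a rough sketch of how one might check a requirements file for packages that are not locked to an exact version, and compare the pins against what is actually installed. The function names and the sample requirements text are hypothetical and not taken from the project in this thread; only exact `name==version` pins are treated as locked.

```python
from importlib.metadata import PackageNotFoundError, version


def _strip(raw_line):
    """Drop inline comments and surrounding whitespace from one line."""
    return raw_line.split("#", 1)[0].strip()


def parse_pins(requirements_text):
    """Return {package: version} for lines of the form name==version."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = _strip(raw)
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip().lower()] = pinned.strip()
    return pins


def unpinned(requirements_text):
    """Return requirement lines that have no exact == pin."""
    lines = (_strip(raw) for raw in requirements_text.splitlines())
    return [line for line in lines if line and "==" not in line]


def installed_mismatches(pins):
    """Return {package: (pinned, installed)} where the two differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample; the version number is a placeholder.
    sample = "flask==1.0.0\ngunicorn  # not pinned, picks up new releases\n"
    print("unpinned:", unpinned(sample))
    print("mismatches:", installed_mismatches(parse_pins(sample)))
```

An unpinned line such as a bare `gunicorn` is what lets a redeploy of unchanged code pull in a newer release, which matches what happened here.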
awesome, glad you were able to figure it out, and thanks for coming back and telling us the problem!