RunPod•8mo ago

524 Timeouts when waiting for new serverless messages

After my async Python serverless handler finishes one request, I start getting these errors on that worker:

2024-09-26T22:11:55.344188433Z {"requestId": null, "message": "Failed to get job, status code: 524", "level": "ERROR"}

2024-09-26T22:11:55.344188433Z {"requestId": null, "message": "Failed to get job, status code: 524", "level": "ERROR"}

This seemingly prevents the auto-shutdown after N seconds from happening, so our runners stay up forever. One example is zpatg26htp69og.
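To quantify how often the error shows up, worker logs in this format can be tallied with a short script. This is a minimal sketch, assuming every log line is a timestamp, a space, and then a JSON object shaped like the excerpt above; the function name `count_job_fetch_errors` is made up for illustration.

```python
import json
from collections import Counter

PREFIX = "Failed to get job, status code: "

def count_job_fetch_errors(log_lines):
    """Count 'Failed to get job' ERROR records, keyed by status code string."""
    counts = Counter()
    for line in log_lines:
        # Assumed layout: "<timestamp> <json>"; split on the first space only.
        _, _, payload = line.strip().partition(" ")
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        message = record.get("message") or ""
        if record.get("level") == "ERROR" and message.startswith(PREFIX):
            counts[message[len(PREFIX):]] += 1
    return counts

sample = [
    '2024-09-26T22:11:55.344188433Z {"requestId": null, "message": "Failed to get job, status code: 524", "level": "ERROR"}',
    '2024-09-26T22:11:55.344188433Z {"requestId": null, "message": "Failed to get job, status code: 524", "level": "ERROR"}',
]
print(count_job_fetch_errors(sample))
```

Running this over a full worker log gives a per-status-code count that can be shared when reporting how reproducible the 524s are.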

9 Replies

yhlong00000•7mo ago

After reviewing the log, it looks like your worker remains active for a short period after completing the task. I assume you have an idle timeout configured? Each of your requests finishes quickly, and once the worker completes the task, it checks the queue for new tasks. The issue you mentioned might be due to a temporary network problem. Have you been seeing this error frequently? Most of the errors I observed occur when you’re checking the job result after 30 minutes. By that time, the results are no longer stored in our system, so you’ll need to retrieve them a bit sooner.
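For the point about results expiring, one option is to poll with a hard deadline that falls inside the retention window. The following is only a sketch: `fetch_status` is a hypothetical callable you would supply (for example, a wrapper around your endpoint's status request), the status names `COMPLETED` and `FAILED` are assumptions, and the 30-minute window is measured conservatively from submission time.

```python
import time

RESULT_TTL_S = 30 * 60  # retention window mentioned in the reply above

def wait_for_result(fetch_status, submitted_at, interval_s=5.0,
                    safety_margin_s=60.0, clock=time.monotonic,
                    sleep=time.sleep):
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status: hypothetical callable returning a dict with a "status" key.
    submitted_at: clock() value captured when the job was submitted.
    Raises TimeoutError shortly before the assumed retention window ends.
    """
    deadline = submitted_at + RESULT_TTL_S - safety_margin_s
    while clock() < deadline:
        status = fetch_status()
        # Terminal status names are assumptions for this sketch.
        if status.get("status") in ("COMPLETED", "FAILED"):
            return status
        sleep(interval_s)
    raise TimeoutError("job result was not retrieved before the retention deadline")
```

Passing `clock` and `sleep` as parameters keeps the helper testable without real waiting or network access.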

yasyfOP•7mo ago

Yeah, I understand all of that, but the 524 happens very reproducibly and very frequently, so I don't think it's a temporary network problem. The result is that the idle timeout is not respected and the worker stays alive longer than it should.

flash-singh•7mo ago

Are you using LLMs? We have new SDK releases planned to reduce the amount of traffic from workers and reduce 524s from Cloudflare.

yasyfOP•7mo ago

yes, using LLMs. ok cool, will keep an eye out for that. anything else to do in the interim?

flash-singh•7mo ago

You can reduce the concurrency. What's the current value for that?

yasyfOP•7mo ago

It's 4 right now. What's the recommended value?

yhlong00000•7mo ago

you mean this value is 4?

yasyfOP•7mo ago

oh I'm not using VLLM, I meant the concurrency_modifier
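For reference, lowering concurrency through `concurrency_modifier` might look roughly like the sketch below. It assumes the RunPod Python SDK's documented pattern of passing a `concurrency_modifier` callable alongside the handler in the `runpod.serverless.start` config; the `MAX_CONCURRENCY` environment variable, the default of 2, and the echo handler are placeholders invented for this example, not recommended values.

```python
import os

# Hypothetical environment variable, used only in this sketch.
MAX_CONCURRENCY = int(os.environ.get("MAX_CONCURRENCY", "2"))

def concurrency_modifier(current_concurrency: int) -> int:
    """Return how many jobs this worker should process at once.

    A fixed cap keeps things simple; never return less than 1.
    """
    return max(1, MAX_CONCURRENCY)

async def handler(job):
    # Placeholder for the real LLM call; echoes the input back.
    return {"echo": job["input"]}

def main():
    # Imported here so the rest of the sketch runs without the SDK installed.
    import runpod
    runpod.serverless.start({
        "handler": handler,
        "concurrency_modifier": concurrency_modifier,
    })
```

In a worker image the file would end with a call to `main()`. Dropping the cap from 4 to a smaller number is the interim change suggested above, at the cost of lower throughput per worker.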

yhlong00000•7mo ago

OK, in any case, try the new version of the SDK, 1.7.1, which improves batch requests. If you’re still seeing the issue, feel free to record a quick video and share it here.
