RunPod · 2mo ago
bossman

job timed out after 1 retries

Been seeing this a ton on my endpoint today, which leaves it unable to return images. response_text:
{"delayTime":33917,"error":"job timed out after 1 retries","executionTime":31381,"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1","retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"}
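For anyone triaging the same symptom, here is a minimal sketch of how a client might classify a response like the one above. It relies only on fields visible in that payload (`status`, `error`). The helper name `classify_job` and the labels it returns are hypothetical, not part of the RunPod SDK, and the `COMPLETED` status is an assumption since it never appears in this thread.

```python
import json

# Payload copied from the failing response quoted above.
SAMPLE = (
    '{"delayTime":33917,"error":"job timed out after 1 retries",'
    '"executionTime":31381,"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1",'
    '"retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"}'
)


def classify_job(response_text: str) -> str:
    """Return a coarse label for a job response (hypothetical helper)."""
    job = json.loads(response_text)
    status = job.get("status")
    error = job.get("error") or ""
    if status == "FAILED" and "timed out" in error:
        return "timeout"   # the failure reported in this thread
    if status == "FAILED":
        return "failed"    # some other failure
    if status == "COMPLETED":  # assumed success status, not shown in the thread
        return "ok"
    return "pending"       # e.g. IN_PROGRESS


print(classify_job(SAMPLE))
```

Separating timeouts from other failures makes it easier to count how often this specific error occurs when reporting it to support.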
21 Replies
bossman (OP) · 2mo ago
endpoint id 1m5phinhax6q0p
rougsig · 2mo ago
Me too, endpoint id uucgkak7h76hfd
rougsig · 2mo ago
@bossman Do you have something like this?
[image attachment, no description]
bossman (OP) · 2mo ago
yep
Poddy · 2mo ago
@bossman
Escalated To Zendesk
The thread has been escalated to Zendesk!
nerdylive · 2mo ago
What SDK version do you guys use?
bossman (OP) · 2mo ago
1.7.1
nerdylive · 2mo ago
Try checking on the ticket, yeah
yhlong00000 · 2mo ago
Try to update your SDK to 1.7.4
bossman (OP) · 2mo ago
Just updated. Still seeing failures and things getting stuck in progress with no action:
{ "delayTime": 33710, "id": "88b7d266-6d28-47f2-8640-d67d33c57ed6-u1", "retries": 1, "status": "IN_PROGRESS", "workerId": "caskw2lx8e1xu1" }
{ "delayTime": 39768, "error": "job timed out after 1 retries", "executionTime": 40212, "id": "108cda05-865d-40df-b19c-3ece785c8ca0-u1", "retries": 1, "status": "FAILED", "workerId": "caskw2lx8e1xu1" }
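Since jobs can sit in `IN_PROGRESS` with no action, a client-side deadline is one way to avoid waiting on them forever. The sketch below is an illustration under stated assumptions, not RunPod's API: `fetch_status` is a hypothetical stand-in for whatever call returns the job's status as a dict, `CLIENT_DEADLINE_EXCEEDED` is a made-up local label rather than a RunPod status, and `COMPLETED` is assumed as the success status.

```python
import time
from typing import Callable, Dict

# FAILED appears in the thread; COMPLETED is an assumed success status.
TERMINAL = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_status: Callable[[], Dict],
    deadline_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll until the job reaches a terminal status or the deadline passes."""
    start = clock()
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL:
            return job
        if clock() - start >= deadline_s:
            # Report the stall locally instead of polling indefinitely.
            return {**job, "status": "CLIENT_DEADLINE_EXCEEDED"}
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting. In normal use the defaults apply.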
flash-singh · 2mo ago
PM me details, there is clearly something weird going on with your endpoint
bossman (OP) · 2mo ago
FYI, every single time it fails to process an incoming request, I see this in the log:
bossman (OP) · 2mo ago
And a job status that just shows "IN_PROGRESS" indefinitely. If there's any way to expedite a response on this, please let me know. Mission-critical stuff happening next week, hoping to get this resolved before then.
nerdylive · 2mo ago
PMing flash-singh directly will expedite the response, I believe
bossman (OP) · 2mo ago
Hey, PM'd details.
addsn · 2mo ago
@bossman any luck solving the above? I am also getting the same "job timed out after 1 retries" error on my serverless endpoint
bossman (OP) · 2mo ago
I wish. Continued errors for me.
addsn · 2mo ago
@bossman saw another similar thread. I downgraded my runpod library to 1.7.2 and the issue was solved after that
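When comparing results across SDK versions, it helps to confirm which version is actually installed in the worker image. A minimal sketch using only the Python standard library follows. The helper names `installed_version` and `parse_release` are hypothetical, and `parse_release` assumes plain dotted-integer versions such as the ones mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_release(v: str) -> Tuple[int, ...]:
    """Turn '1.7.2' into (1, 7, 2); assumes plain dotted integers."""
    return tuple(int(part) for part in v.split("."))


if __name__ == "__main__":
    found = installed_version("runpod")
    print("runpod version:", found or "not installed")
```

A specific release can be pinned with `pip install runpod==1.7.2`, which is the kind of downgrade described above.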
bossman (OP) · 2mo ago
Tried 1.7.1 through 1.7.5 with the same results. But something must have changed over the weekend: 100% success rate today. I also switched my worker delay from queue to # of workers. Not sure whether that had an effect, but testing with the old setup I'm still seeing 100% success on 12/1 and 12/2, with errors ending 11/30.
nerdylive · 2mo ago
Did the RunPod team update something, perhaps?
flash-singh · 2mo ago
We had no updates over the holidays. We will be issuing fixes this week, as we have tracked down the issue and are debugging its root cause.
