RunPod•6mo ago

job timed out after 1 retries

I've been seeing this a ton on my endpoint today, and it leaves the endpoint unable to return images. response_text: {"delayTime":33917,"error":"job timed out after 1 retries","executionTime":31381,"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1","retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"}

21 Replies

bossmanOP•6mo ago

endpoint id 1m5phinhax6q0p

rougsig•6mo ago

Me too, endpoint id uucgkak7h76hfd

rougsig•6mo ago

@bossman Do you have something like this?

bossmanOP•6mo ago

yep

Poddy•6mo ago

@bossman

Escalated To Zendesk

The thread has been escalated to Zendesk!

Jason•6mo ago

what sdk version do you guys use

bossmanOP•6mo ago

1.7.1

Jason•6mo ago

Try to check on the ticket yea

yhlong00000•6mo ago

Try to update your SDK to 1.7.4

bossmanOP•6mo ago

Just updated. Still seeing failures and things getting stuck in progress with no action:

{ "delayTime": 33710, "id": "88b7d266-6d28-47f2-8640-d67d33c57ed6-u1", "retries": 1, "status": "IN_PROGRESS", "workerId": "caskw2lx8e1xu1" }

{ "delayTime": 39768, "error": "job timed out after 1 retries", "executionTime": 40212, "id": "108cda05-865d-40df-b19c-3ece785c8ca0-u1", "retries": 1, "status": "FAILED", "workerId": "caskw2lx8e1xu1" }
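For anyone triaging the same responses, here is a minimal sketch of how the status payloads quoted in this thread could be summarized on the client side. The helper name `summarize_job` is hypothetical and not part of the runpod SDK. The field names come from the responses above, and the conversion assumes `delayTime` and `executionTime` are reported in milliseconds.

```python
import json
from typing import Optional


def summarize_job(raw: str) -> dict:
    """Summarize a job status payload like the ones quoted in this thread.

    Hypothetical helper: only the field names (status, error, retries,
    delayTime, executionTime, id, workerId) come from the thread.
    """
    job = json.loads(raw)
    status = job.get("status")
    error = job.get("error", "")

    # executionTime is absent while a job is still IN_PROGRESS.
    execution_s: Optional[float] = None
    if "executionTime" in job:
        execution_s = job["executionTime"] / 1000  # assumes milliseconds

    return {
        "id": job.get("id"),
        "worker": job.get("workerId"),
        "status": status,
        # True only for the specific failure discussed in this thread.
        "timed_out": status == "FAILED" and "timed out" in error,
        "retries": job.get("retries", 0),
        "delay_s": job.get("delayTime", 0) / 1000,  # assumes milliseconds
        "execution_s": execution_s,
    }


if __name__ == "__main__":
    failed = (
        '{"delayTime": 39768, "error": "job timed out after 1 retries", '
        '"executionTime": 40212, "id": "108cda05-865d-40df-b19c-3ece785c8ca0-u1", '
        '"retries": 1, "status": "FAILED", "workerId": "caskw2lx8e1xu1"}'
    )
    print(summarize_job(failed))
```

Grouping these summaries by `worker` may help show whether the timeouts cluster on one worker, as both payloads above come from `caskw2lx8e1xu1`.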

flash-singh•6mo ago

PM me details; there is clearly something weird going on with your endpoint

bossmanOP•6mo ago

FYI, every single time it fails to process an incoming request, I see this in the log:

message.txt

bossmanOP•6mo ago

And a job status that just shows "IN_PROGRESS" indefinitely. If there's any way to expedite a response on this please let me know. Mission critical stuff happening next week, hoping to get this resolved before then
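As a stopgap for jobs that report "IN_PROGRESS" indefinitely, a client can enforce its own deadline instead of polling forever. The sketch below is an illustration, not the SDK's behavior: `wait_for_job` and the `fetch_status` callable are hypothetical names, and `fetch_status` stands in for whatever call returns the job status as a dict. "IN_QUEUE" is an assumed status name; only "IN_PROGRESS" and "FAILED" appear in this thread.

```python
import time
from typing import Callable

# Statuses treated as "not finished yet". IN_PROGRESS is from this thread;
# IN_QUEUE is an assumption about the other pending state.
PENDING = {"IN_QUEUE", "IN_PROGRESS"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the job leaves a pending state.

    Raises TimeoutError if the job is still pending after timeout_s,
    so a stuck job surfaces as an error instead of hanging the caller.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") not in PENDING:
            return job  # COMPLETED, FAILED, or any other non-pending status
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job.get('id')} still {job.get('status')} after {timeout_s}s"
            )
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status source: pending twice, then the failure from this thread.
    responses = iter([
        {"id": "demo", "status": "IN_PROGRESS"},
        {"id": "demo", "status": "IN_PROGRESS"},
        {"id": "demo", "status": "FAILED", "error": "job timed out after 1 retries"},
    ])
    print(wait_for_job(lambda: next(responses), interval_s=0.0))
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting. On a `TimeoutError` the caller can decide whether to resubmit the request or report the stuck job ID.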

Jason•6mo ago

PMing flash-singh directly will expedite the response, I believe

bossmanOP•6mo ago

Hey, PM'd details.

addsn•5mo ago

@bossman any luck solving the above? I am also getting the same "job timed out after 1 retries" error on my serverless endpoint

bossmanOP•5mo ago

I wish. Continued errors for me

addsn•5mo ago

@bossman I saw another similar thread. I downgraded my runpod library to 1.7.2, and the issue got solved after that

bossmanOP•5mo ago

Tried 1.7.1 to 1.7.5 with the same results. But over the weekend something must have changed: 100% success rate today. Also, I switched my worker delay from queue to # of workers. Not sure if that had an effect, but testing with the old setup I'm still seeing 100% success on 12/1 and 12/2, with errors ending 11/30.

Jason•5mo ago

Did the RunPod team update something, perhaps?

flash-singh•5mo ago

We had no updates over the holidays. We will be issuing fixes this week, as we have tracked down the issue and are debugging its root cause
