job timed out after 1 retries
Been seeing this a ton on my endpoint today, and as a result it can't return images.
response_text: {"delayTime":33917,"error":"job timed out after 1 retries","executionTime":31381,"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1","retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"}
endpoint id 1m5phinhax6q0p
Me too
endpoint id uucgkak7h76hfd
@bossman Do you have something like this?
yep
@bossman
The thread has been escalated to Zendesk!
What SDK version are you guys using?
1.7.1
Yeah, try checking on the ticket.
Try to update your SDK to 1.7.4
Just updated. Still seeing failures, and jobs getting stuck in progress with no activity.
{
  "delayTime": 33710,
  "id": "88b7d266-6d28-47f2-8640-d67d33c57ed6-u1",
  "retries": 1,
  "status": "IN_PROGRESS",
  "workerId": "caskw2lx8e1xu1"
}
{
  "delayTime": 39768,
  "error": "job timed out after 1 retries",
  "executionTime": 40212,
  "id": "108cda05-865d-40df-b19c-3ece785c8ca0-u1",
  "retries": 1,
  "status": "FAILED",
  "workerId": "caskw2lx8e1xu1"
}
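For anyone comparing responses like the two above, here is a rough Python sketch that pulls out the fields that matter. The helper name summarize_job is made up for illustration, and treating delayTime and executionTime as milliseconds is an assumption, not something confirmed in this thread.

```python
import json


def summarize_job(payload: str) -> dict:
    """Condense a job status response into the fields useful for triage.

    Hypothetical helper: field names are taken from the responses pasted in
    this thread; the millisecond unit is an assumption.
    """
    job = json.loads(payload)
    status = job.get("status")
    error = job.get("error", "")
    return {
        "id": job.get("id"),
        "worker": job.get("workerId"),
        "status": status,
        "retries": job.get("retries", 0),
        # Flag the specific failure discussed here: FAILED plus a timeout message.
        "timed_out": status == "FAILED" and "timed out" in error,
        # Assumed to be milliseconds; converted to seconds for readability.
        "delay_s": job.get("delayTime", 0) / 1000,
        "execution_s": job.get("executionTime", 0) / 1000,
    }


if __name__ == "__main__":
    failed = (
        '{"delayTime": 39768, "error": "job timed out after 1 retries", '
        '"executionTime": 40212, "id": "108cda05-865d-40df-b19c-3ece785c8ca0-u1", '
        '"retries": 1, "status": "FAILED", "workerId": "caskw2lx8e1xu1"}'
    )
    print(summarize_job(failed))
```

Grouping the summaries by worker makes it easier to see whether the timeouts cluster on one workerId, as both responses above do.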
PM me the details; there is clearly something weird going on with your endpoint.
FYI, every single time it fails to process an incoming request, I see this in the log:
And a job status that just shows "IN_PROGRESS" indefinitely.
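Until the root cause is fixed, a client-side deadline can at least stop callers from waiting forever on a job that never leaves IN_PROGRESS. This is only a sketch: fetch_status is a hypothetical callable you would wire up to however you already query job status, CLIENT_TIMEOUT is a made-up marker that the service does not return, and COMPLETED as a terminal status is an assumption (only FAILED and IN_PROGRESS appear in this thread).

```python
import time
from typing import Callable

# FAILED is seen in this thread; COMPLETED is assumed to be the success status.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    deadline_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job reaches a terminal status or the deadline passes.

    fetch_status is a hypothetical callable returning the latest status dict.
    """
    start = clock()
    while True:
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() - start >= deadline_s:
            # Give up locally; the job may still be IN_PROGRESS on the server.
            return {**job, "status": "CLIENT_TIMEOUT"}
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated job that never leaves IN_PROGRESS, like the one described above.
    stuck = lambda: {"id": "example-job", "status": "IN_PROGRESS"}
    print(wait_for_job(stuck, deadline_s=1.0, interval_s=0.25))
```

The sleep and clock parameters are injectable so the loop can be tested without real waiting.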
If there's any way to expedite a response on this, please let me know. Mission-critical stuff is happening next week, and I'm hoping to get this resolved before then.
PMing flash-singh directly should expedite the response, I believe.
Hey, PM'd details.
@bossman any luck solving the above? I'm also getting the same "job timed out after 1 retries" error on my serverless endpoint.
I wish. Continued errors for me
@bossman I saw another similar thread. I downgraded my runpod library to 1.7.2, and the issue was solved after that.
Tried 1.7.1 to 1.7.5 with the same results. But over the weekend something must have changed. 100% success rates today.
Also, I switched my worker delay from queue to # of workers. Not sure if that had an effect, but testing with the old setup I'm still seeing 100% success on 12/1 and 12/2, with errors ending 11/30.
Did the RunPod team update something, perhaps?
We had no updates over the holidays. We will be issuing fixes this week, as we have tracked down the issue and are debugging its root cause.