Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2...

Hi, I have a serverless endpoint. The job completes successfully, but the results are never returned and the job times out. Any ideas how to resolve this? There are a few threads about this, but the conversation always drifted to another topic. I also submitted a ticket yesterday and have had no response. I am using this in production, and my whole website is down because of this. I am seriously concerned about using RunPod, as this is probably the fourth time everything has stopped working for one reason or another.
Total progress: 100%|██████████| 38/38 [00:09<00:00, 4.15it/s]
2024-07-12T17:16:46.296682863Z INFO: 127.0.0.1:60656 - "POST /sdapi/v1/img2img HTTP/1.1" 200 OK
2024-07-12T17:16:48.343583857Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/2lpmcp0nczozlm/job-done/nbvo20j26ji2mh/7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1?gpu=NVIDIA+RTX+A5000&isStream=false", "level": "ERROR"}
2024-07-12T17:16:48.343613177Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", "message": "Finished.", "level": "INFO"}
21 Replies
nerdylive (5mo ago)
Are you using /runsync?
luckedup. (OP, 5mo ago)
No, it is a /run endpoint, and I check the job status afterwards.
nerdylive (5mo ago)
And what's your output size (estimated, in MB)?
luckedup. (OP, 5mo ago)
Five images at 512x512, 768x768, or 1024x1024. Is this too much?
nerdylive (5mo ago)
Hmm, the max is 10 MB per run (per job). Otherwise it's just a RunPod error; try reporting it to RunPod via the contact button.
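One way to rule out the payload-size theory is to measure the serialized output before the handler returns it. The following is a minimal sketch, assuming the roughly 10 MB per-job figure quoted above (not verified here) and an output dict that carries base64-encoded images. `MAX_PAYLOAD_BYTES`, `payload_size_bytes`, and `fits_limit` are illustrative names, not part of the RunPod SDK.

```python
import base64
import json

# Assumption: the ~10 MB per-job limit quoted in this thread.
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024


def payload_size_bytes(output: dict) -> int:
    """Size of the output once serialized to JSON and encoded as UTF-8."""
    return len(json.dumps(output).encode("utf-8"))


def fits_limit(output: dict, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """True if the serialized output is no larger than the limit."""
    return payload_size_bytes(output) <= limit


# Illustrative data only: five fake "images" of 1.5 MB of raw bytes each.
fake_image = base64.b64encode(b"\x00" * 1_500_000).decode("ascii")
example_output = {"images": [fake_image] * 5}

size_mb = payload_size_bytes(example_output) / (1024 * 1024)
print(f"{size_mb:.2f} MB, fits: {fits_limit(example_output)}")
```

Base64 encoding adds about a third to the raw size, so a handful of large images can get close to a limit like this. Returning a URL to an uploaded file instead keeps the payload at a few KB.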
nerdylive (5mo ago)
Yeah, then it's maybe from RunPod.
Encyrption (5mo ago)
What do you have set for Execution Timeout(s)?
luckedup. (OP, 5mo ago)
120 seconds, but the job usually completes after 10. This message appears on completion (after 10 seconds).
n8tzto (5mo ago)
I'm experiencing the same issue. When the Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/{endpoint-id}/job-done/... error occurs, the job remains stuck in IN_PROGRESS indefinitely, even though the log indicates that the job has completed. Also, no webhook is sent from RunPod. It appears that RunPod fails to mark the job as completed because this internal HTTP request fails.
Some people have suggested that this issue might be caused by a large payload returned from the handler. However, in my case the output is only a few KB, as it is just a JSON object containing a URL to the output file.
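Until the cause is fixed on RunPod's side, the caller can at least avoid waiting forever on a job that never leaves IN_PROGRESS. The following is a minimal sketch of client-side polling with a deadline. `wait_for_job` and the "CLIENT_TIMEOUT" label are invented for this example, and the set of terminal statuses is an assumption based on RunPod's documented job states. In real use, `fetch_status` would wrap a request to the endpoint's /status/{job_id} route and `cancel` a request to /cancel/{job_id}; here they are stubs so the sketch runs without network access.

```python
import time
from typing import Callable

# Assumption: statuses after which the job will not change again.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(
    fetch_status: Callable[[], str],
    cancel: Callable[[], None],
    deadline_s: float,
    poll_interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status or the deadline passes.

    On deadline, call `cancel` and return "CLIENT_TIMEOUT" (a label invented here).
    """
    start = clock()
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() - start >= deadline_s:
            cancel()  # try to stop the stuck job before giving up on it
            return "CLIENT_TIMEOUT"
        sleep(poll_interval_s)


# Demo: a stub job that never leaves IN_PROGRESS, with a fake clock
# (advancing 5 "seconds" per reading) so the example finishes instantly.
ticks = iter(range(0, 1000, 5))
cancelled = []
result = wait_for_job(
    fetch_status=lambda: "IN_PROGRESS",
    cancel=lambda: cancelled.append(True),
    deadline_s=20,
    clock=lambda: next(ticks),
    sleep=lambda s: None,
)
print(result, cancelled)
```

Setting the deadline somewhat above the endpoint's execution timeout (120 seconds in the original post) would let normal jobs finish while still catching the stuck ones. Whether a cancel request actually stops billing for such a job is not confirmed anywhere in this thread.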
nerdylive (5mo ago)
Yeah, maybe it's RunPod's problem. Try reporting it to RunPod.
0xIbra (5mo ago)
Same here.
It seems to be fixed now. I did, however, clone the endpoint into a new one, just in case.
nerdylive (5mo ago)
It only happens sometimes, right? Yeah, maybe it's fixed.
0xIbra (5mo ago)
I sure as hell hope so! Yeah, I think something crashed in their backend.
nerdylive (5mo ago)
Hope so too, haha.
n8tzto (5mo ago)
The problem still exists, it just occurred again. Yes, it only happens sometimes, not consistently. It seems like the internal webhook connection on RunPod isn't stable. I hope this issue gets fixed ASAP because it causes production jobs to get stuck indefinitely. Even worse, the stuck jobs might continue to drain credits.
nerdylive (5mo ago)
Hey, try increasing your container disk space on the endpoint.
0xIbra (5mo ago)
What, again?? Unacceptable! This has me doubting RunPod. Technically speaking, the issue itself is really minor and stupid -_-
llp (5mo ago)
Having the same issue here too. It only started happening in the past couple of days, but I haven't modified the handler at all in weeks.
2024-07-20T23:41:26.892920681Z {"requestId": "f2d7a4d6-8bde-40d2-8ac9-d86e4134c165-u1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/...", "level": "ERROR"}
Using the async endpoint.
Edit: scaling workers to 0 and then back up seems to have fixed it for now.
rougsig (4mo ago)
Same issue:
2024-08-13T08:00:41.098287548Z ERROR | Error while getting job: Connection timeout to host https://api.runpod.ai/v2/...../job-take/.....?gpu=NVIDIA+RTX+A5000&job_in_progress=0
nerdylive (4mo ago)
Hey, try reporting this to RunPod via the website: click the contact button in the left menu and include your endpoint ID.