Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2...
Hi, I have a serverless endpoint. The job completes successfully, but the results are never returned and the job times out.
Any ideas how to resolve this? There are a few threads about this, but the conversation always drifted to another topic. I also submitted a ticket yesterday and have had no response. I am using this in production and my whole website is down because of this. I'm seriously concerned about using RunPod, as this is probably the fourth time everything has stopped working for one reason or another.
Total progress: 100%|██████████| 38/38 [00:09<00:00, 4.15it/s]
2024-07-12T17:16:46.296682863Z INFO: 127.0.0.1:60656 - "POST /sdapi/v1/img2img HTTP/1.1" 200 OK
2024-07-12T17:16:48.343583857Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/2lpmcp0nczozlm/job-done/nbvo20j26ji2mh/7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1?gpu=NVIDIA+RTX+A5000&isStream=false", "level": "ERROR"}
2024-07-12T17:16:48.343613177Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", "message": "Finished.", "level": "INFO"}
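To find out which requests were affected, the worker log can be scanned for this error. The following is only a minimal sketch: it assumes each log line is a timestamp followed by either plain text or a JSON object with the "requestId", "message" and "level" keys seen above, and the function name failed_request_ids is made up for illustration.

```python
import json


def failed_request_ids(log_lines):
    """Return the requestIds whose log entry reports a failed result upload."""
    failed = []
    for line in log_lines:
        # Assumption: "<timestamp> <payload>", where payload is JSON or plain text.
        _, _, payload = line.partition(" ")
        if not payload.startswith("{"):
            continue  # progress bars, access logs and other plain-text lines
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not; skip it
        message = entry.get("message", "")
        if entry.get("level") == "ERROR" and "Failed to return job results" in message:
            failed.append(entry.get("requestId"))
    return failed


sample = [
    'Total progress: 100%|##########| 38/38 [00:09<00:00, 4.15it/s]',
    '2024-07-12T17:16:48.343583857Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", '
    '"message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/...", '
    '"level": "ERROR"}',
    '2024-07-12T17:16:48.343613177Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", '
    '"message": "Finished.", "level": "INFO"}',
]
print(failed_request_ids(sample))
```

The resulting IDs could then be attached to a support ticket or re-submitted by the client.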
21 Replies
Are you using /runsync?
No, it is a /run endpoint, and then I check the job status.
And what's your output size (estimated, in MB)?
Five images of 512x512, 768x768, or 1024x1024. Is this too much?
Hmm, the max is 10 MB per run (per job).
Or it's just a RunPod error.
Try reporting this to RunPod via the contact button.
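One way to rule out the size question is to measure the serialized output before returning it from the handler. This is a rough sketch, assuming the output is a JSON-serializable dict of base64-encoded images and that the roughly 10 MB figure quoted above applies to the returned payload; the constant and function names are hypothetical, and the fake image bytes are placeholders.

```python
import base64
import json

# Assumption: the ~10 MB figure mentioned in this thread, taken here as 10 MiB.
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024


def payload_size_bytes(output):
    """Size of the handler output once serialized as UTF-8 JSON."""
    return len(json.dumps(output).encode("utf-8"))


def fits_limit(output, limit=MAX_PAYLOAD_BYTES):
    """True if the serialized output is within the assumed limit."""
    return payload_size_bytes(output) <= limit


# Five placeholder "images" of 1.5 MB raw each; base64 inflates them by about 4/3.
images = [base64.b64encode(bytes(1_500_000)).decode("ascii") for _ in range(5)]
output = {"images": images}
print(payload_size_bytes(output), fits_limit(output))
```

If the output turns out to be close to the limit, uploading the images to storage and returning only URLs keeps the payload at a few KB, which is the approach another poster in this thread describes.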
it used to work just fine, started a few days ago when other people started reporting it too (https://discord.com/channels/912829806415085598/1258094433816019114, https://discord.com/channels/912829806415085598/1185337101307367535/threads/1257349366973202453)
Yeah, then it's maybe on RunPod's side.
What do you have set for Execution Timeout(s)?
120 seconds, but the job usually completes after 10. This message appears on completion (after 10 seconds).
I'm experiencing the same issue.
When the "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/{endpoint-id}/job-done/..." error occurs, the job remains stuck in IN_PROGRESS indefinitely, even though the log indicates that the job is completed. Plus, no webhook is sent from RunPod. It appears that RunPod fails to mark the job as completed due to this internal HTTP request failure.
Some people have suggested that this issue might be caused by a large payload returned from the handler. However, in my case, the output size is only a few KBs, as it is just a JSON containing a URL to the output file.
Yeah maybe runpod's problem
try to report it to runpod
Same here
Seems to be fixed now.
I did however clone the endpoint into a new one, just in case.
It only happens sometimes, right? Yeah, maybe it's fixed.
I sure as hell hope so! Yeah, I think something crashed in their backend.
Hope so too hahah
The problem still exists, it just occurred again.
Yes, it only happens sometimes, not consistently. It seems like the internal webhook connection on RunPod isn't stable.
I hope this issue gets fixed ASAP because it causes production jobs to get stuck indefinitely. Even worse, the stuck jobs might continue to drain credits.
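Until this is fixed on RunPod's side, a client can at least avoid waiting forever by enforcing its own deadline while polling. The sketch below is not RunPod's API: fetch_status is a hypothetical callable that returns the job's status string (for example a wrapper around a request to the endpoint's status route), the terminal status names other than IN_PROGRESS are assumptions, and CLIENT_TIMEOUT is a marker invented here.

```python
import time

# Assumption: status names treated as final; only IN_PROGRESS appears in this thread.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(fetch_status, deadline_s=120.0, interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until a terminal status or until deadline_s elapses.

    Returns the terminal status, or "CLIENT_TIMEOUT" (our own marker, not a
    RunPod status) if the job never leaves a non-terminal state in time.
    """
    start = clock()
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() - start >= deadline_s:
            return "CLIENT_TIMEOUT"
        sleep(interval_s)


# Usage with a canned sequence of statuses instead of real HTTP calls.
statuses = iter(["IN_QUEUE", "IN_PROGRESS", "COMPLETED"])
print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))
```

On CLIENT_TIMEOUT the caller could re-submit the job or surface an error to the user instead of leaving the request hanging; whether the stuck job keeps consuming credits is something only RunPod can confirm.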
Hey, try increasing the container disk space on the endpoint.
What, again?? Unacceptable! This has me doubting RunPod.
The issue itself is really minor and stupid technically speaking -_-
Having the same issue here too. It only started happening in the past couple of days, but I haven't modified the handler at all in weeks.
2024-07-20T23:41:26.892920681Z {"requestId": "f2d7a4d6-8bde-40d2-8ac9-d86e4134c165-u1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/...", "level": "ERROR"}
using the async endpoint
edit: scaling workers to 0 then back up seems to have fixed it for now
Same issues
2024-08-13T08:00:41.098287548Z ERROR | Error while getting job: Connection timeout to host https://api.runpod.ai/v2/...../job-take/.....?gpu=NVIDIA+RTX+A5000&job_in_progress=0
Hey, try to report this to RunPod via the website (click the contact button in the left menu), and include your endpoint ID.