Asynchronous serverless endpoint failing with 400 Bad Request
I'm getting the following error when my serverless endpoint tries to return its output object:
"Failed to return job results. | 400, message='Bad Request', url='https://api.runpod.ai/v2/ne9y7bgqrpzcu6/job-done/asvftiq7ad2xzj/30238db1-1d48-4a80-8c5e-86f69acf3642-e1?gpu=$RUNPOD_GPU_TYPE_ID&isStream=false'"
The payload is small, only a KiB or so.
What other causes could there be for this "Bad Request", which is presumably triggered by RunPod's Python library?
I'm also getting 404s for my sync endpoint: Failed to return job results. | 404, message='Not Found', url='https://api.runpod.ai/v2...'
It is also getting "retried":
Super weird
@bart
The thread has been escalated to Zendesk!
I think there might be a problem with Runpod's connection
facing the same issue
I solved it. The RunPod Whisper template endpoint's schema validation was failing: I was passing an extra property in the input, and the validation rejected it.
I have the same issue
@srimanthd what do you mean extra property? can you share an example?
@xnorcode are you using whisper endpoint too?
I am running flux
not whisper
I don't have any input schema validations. my endpoint runs perfectly, but twice.
it loads the model, generates and uploads the image and completes the job. And then it tries to execute it again.
Ah, might be unrelated then.
https://github.com/runpod-workers/worker-faster_whisper/blob/main/src/rp_schema.py
This file lists the allowed inputs. I was passing an additional input property, "type": "speech_to_text", and that caused the error.
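To illustrate the failure mode described above: a strict validator that rejects any key not declared in the schema turns an otherwise harmless extra field into an error. This is a simplified stand-in, not the worker's actual rp_schema.py or RunPod's own validator; the schema keys (`audio`, `model`, `language`) and both function names are hypothetical placeholders.

```python
from typing import Any

# Hypothetical, simplified schema: allowed keys, expected type, and whether required.
# The real worker defines its own schema in src/rp_schema.py.
INPUT_SCHEMA: dict[str, dict[str, Any]] = {
    "audio": {"type": str, "required": True},
    "model": {"type": str, "required": False},
    "language": {"type": str, "required": False},
}


def validate_input(job_input: dict[str, Any]) -> list[str]:
    """Return a list of validation errors (empty if the input is valid)."""
    errors = []
    # Unknown keys are rejected outright; an extra "type" field ends up here
    for key in job_input:
        if key not in INPUT_SCHEMA:
            errors.append(f"Unexpected input: {key}")
    # Declared keys must be present if required and have the expected type
    for key, rules in INPUT_SCHEMA.items():
        if key not in job_input:
            if rules["required"]:
                errors.append(f"Missing required input: {key}")
        elif not isinstance(job_input[key], rules["type"]):
            errors.append(f"Wrong type for {key}")
    return errors


def strip_unknown_keys(job_input: dict[str, Any]) -> dict[str, Any]:
    """Client-side workaround: drop keys the endpoint does not declare."""
    return {k: v for k, v in job_input.items() if k in INPUT_SCHEMA}


print(validate_input({"audio": "a.wav", "type": "speech_to_text"}))
# → ['Unexpected input: type']
```

Filtering on the client with something like `strip_unknown_keys` avoids the error without touching the worker, at the cost of silently discarding fields you may have meant to send.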
and on the second attempt it shows the above error log
Ah no, I'm not using that. I'm using a custom Flux Docker image.
I have been getting this issue for 2-3 weeks now... rewrote my code 4 times! Still can't figure out what's wrong.. 😢
I've raised a support ticket now; hopefully we'll figure out what the issue is soon.
Has this issue been solved? Is anyone using a JS script to reach the serverless endpoint? Could you please share your code? I keep getting 404s.
Maybe it's outdated if it doesn't work. In the meantime, you can try the axios library or just fetch to hit the endpoint.
Or maybe share your code
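For comparison, here is the shape of a synchronous call in Python using only the standard library; the same method, URL, headers, and JSON body carry over to fetch or axios. One common cause of a 404 is simply a URL that doesn't match an existing endpoint, e.g. a typo in the endpoint ID or path. `ENDPOINT_ID`-style arguments and the API key are placeholders, and the `/runsync` path with a Bearer `Authorization` header reflects RunPod's serverless API as I understand it, so check it against the current docs.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /v2/<endpoint_id>/runsync."""
    body = json.dumps({"input": payload}).encode("utf-8")  # job inputs go under "input"
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint_id}/runsync",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_endpoint(endpoint_id: str, api_key: str, payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_runsync_request(endpoint_id, api_key, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `call_endpoint("YOUR_ENDPOINT_ID", "YOUR_API_KEY", {"prompt": "a cat"})`; printing `req.full_url` from `build_runsync_request` first is a quick way to spot a malformed path before blaming the SDK.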
There are still issues...
So a little update on my case
After a lot of testing (writing and rewriting my code), I couldn't find the issue on my side. I've been in continuous contact with the support team, who suggested this was a known issue with the RunPod SDK version I was using and that changing the version should solve it:
Hi Andreas,
The issue you’re experiencing is a known bug in SDK 1.7.1. To resolve this, please update to SDK 1.7.3 or downgrade to 1.6.2, which should fix the retry problem.
The root cause is that our system runs a health check during long tasks, and if the check isn’t reported in time, the job is put back in the queue, causing a retry.
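A toy model of the mechanism described in this reply, to show why a job that actually finished can still run a second time (as reported earlier in the thread). Everything here is hypothetical: the 60-second timeout echoes the "long-running jobs (>60 seconds)" wording of the 1.7.3 advisory quoted later in this thread, the heartbeat times are made up, and this is not RunPod's scheduler code.

```python
def longest_silence(duration_s: float, heartbeat_times: list[float]) -> float:
    """Longest gap with no health report between job start (t=0) and job end."""
    beats = sorted(t for t in heartbeat_times if 0 <= t <= duration_s)
    points = [0.0, *beats, float(duration_s)]
    return max(b - a for a, b in zip(points, points[1:]))


def count_executions(duration_s: float, heartbeat_times: list[float],
                     timeout_s: float = 60.0, max_attempts: int = 2) -> int:
    """How many times the handler runs before the result is kept or attempts run out."""
    executions = 0
    for _ in range(max_attempts):
        executions += 1  # the handler does all of its work on every attempt
        if longest_silence(duration_s, heartbeat_times) <= timeout_s:
            break  # health check satisfied, so the result is kept
        # health check missed: the job is put back in the queue and retried
    return executions


# A 90 s job that never reports in runs twice
print(count_executions(90, []))  # → 2
# The same job reporting every 10 s runs once
print(count_executions(90, [float(t) for t in range(10, 90, 10)]))  # → 1
```

In this model a short job never notices the problem, which matches the thread's observation that only long-running jobs were retried.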
Let me know if you have any questions or need further assistance.
Best Regards,
When upgrading to the latest SDK version, 1.7.3, the worker container seems to crash (it gets removed) once the model is loaded and my code starts the inference steps. So this is another issue we're experiencing; I've also forwarded it to the team, and hopefully they'll find a fix soon.
When downgrading to the SDK version 1.6.2 I now get another error causing the worker to stop/fail: ValueError: Host '127.0.0.1:8188' cannot contain ':' (at position 9)
I can't do anything about version 1.7.3, so I'm waiting for the RunPod team. I'm currently trying to see if there's anything I can do on my side to get version 1.6.2 working (though there doesn't seem to be much I can do).
this is the error I get with 1.6.2:
Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/web_protocol.py", line 452, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/web_app.py", line 512, in _handle
    match_info = await self._router.resolve(request)
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/web_urldispatcher.py", line 1022, in resolve
    match_dict, allowed = await resource.resolve(request)
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/web_urldispatcher.py", line 767, in resolve
    not request.url.raw_path.startswith(self._prefix2)
  File "aiohttp/_helpers.pyx", line 26, in aiohttp._helpers.reify.get
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/web_request.py", line 451, in url
    url = URL.build(scheme=self.scheme, host=self.host)
  File "/usr/local/lib/python3.10/dist-packages/yarl/_url.py", line 355, in build
    _host = _encode_host(host, validate_host=True)
  File "/usr/local/lib/python3.10/dist-packages/yarl/_url.py", line 1386, in _encode_host
    raise ValueError(
ValueError: Host '127.0.0.1:8188' cannot contain ':' (at position 9)
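Reading this traceback top to bottom, the failure happens inside aiohttp's request handling (web_protocol → web_app → web_urldispatcher → web_request.url), where the Host header value `127.0.0.1:8188` is passed as `host=` to yarl's `URL.build` and rejected because it still contains the port. That suggests the error comes from the aiohttp server listening on 8188 (the ComfyUI server in this setup) rather than from the client code, and that the installed aiohttp and yarl versions disagree about whether `host` may include a port. This is an inference from the traceback, not a confirmed diagnosis. The stdlib sketch below shows the host/port split that a stricter `host` argument expects; `split_host_port` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlsplit


def split_host_port(host_header: str) -> tuple[Optional[str], Optional[int]]:
    """Split a Host header value like '127.0.0.1:8188' into (host, port)."""
    parts = urlsplit(f"//{host_header}")  # leading '//' makes urlsplit parse it as a netloc
    return parts.hostname, parts.port


host, port = split_host_port("127.0.0.1:8188")
print(host, port)  # → 127.0.0.1 8188

# With yarl, a strict build passes the two parts separately, e.g.
#   URL.build(scheme="http", host=host, port=port)
# whereas URL.build(scheme="http", host="127.0.0.1:8188") is the call the traceback rejects.
```

If that reading is right, the thing to try is bringing aiohttp and yarl back into agreement (upgrading aiohttp, or pinning an older yarl) in the worker image; the two projects' changelogs are the place to confirm which versions are compatible.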
@yhlong00000
And here's the code I get the error from:
once the ComfyUI server is up and running, the GET request below raises the above error:
import requests

HOSTNAME = "127.0.0.1"
PORT = 8188

def check_server() -> bool:
    url = f"http://{HOSTNAME}:{PORT}"
    response = requests.get(url)
    # If the response status code is 200, the server is up and running
    if response.status_code == 200:
        print("API: reachable!")
        return True
    return False
I tried all variations of URL formatting.
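A slightly more defensive readiness check, as a sketch: it polls with a timeout and treats connection failures or HTTP errors as "not up yet" instead of crashing. It uses the standard library instead of requests so it has no dependencies; `wait_for_server` is a hypothetical name and the retry count and delays are arbitrary. It won't make the ValueError go away, since judging by the traceback that is raised while the server handles the request; it only lets the client side wait and report cleanly.

```python
import time
import urllib.request


def wait_for_server(url: str, retries: int = 50, delay_s: float = 0.5, timeout_s: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or the retries run out."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                if resp.status == 200:
                    print("API: reachable!")
                    return True
        except OSError:
            # URLError, HTTPError, refused connections and socket timeouts
            # are all OSError subclasses: treat them as "not ready yet"
            pass
        time.sleep(delay_s)
    return False


# Usage with ComfyUI's default address, as in the snippet above:
# wait_for_server("http://127.0.0.1:8188")
```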
Hmm, I'm thinking maybe run the pip command to update the requests version (search Google for it) after installing runpodctl.
I think your traceback is showing an error because of the requests or aiohttp library.
And since you just downgraded runpodctl and it didn't work, maybe it's because the older runpodctl uses older library dependencies.
Idk, hi yhlong
If you have long-running jobs, SDK 1.7.3 has a bug that causes them to retry unexpectedly. In my testing, versions 1.7.2 and 1.6.2 don’t have this issue. I’m not sure why you’re encountering the ValueError with 1.6.2, but could you try using 1.7.2 to see if it resolves the retry problem?
good suggestion, will try this now.
ok thnx, will try version 1.7.2 as well.
@yhlong00000 I completed testing with 1.7.2 and seems to be working perfectly without any retries. I just sent you an email with logs and more information for you to review. I'm now testing with 1.6.2 version and will update you on that soon.
@yhlong00000 RunPod SDK 1.6.2 is still not working, even after updating requests once the runpod package is installed, as suggested above. I've emailed some more details about this test for you to review.
I am upgrading to SDK 1.7.2 which seems to be working fine so far.
Hopefully, we'll get a new stable version soon.
Thanks again for your prompt support!
Cool, thanks for testing it. Will let you know once we have new version.
I am indeed using runpod==1.7.1 and will update it to 1.7.4, according to the GitHub advisory https://github.com/runpod/runpod-python/releases/tag/1.7.3. If I experience any problems, I will report back to this thread. Thanks, all, for the input and the swift responses, and hopefully this is the resolution!
SDK 1.7.3 Advisory: Known Issues with Long-Running Jobs – Please Upgrade to 1.7.4
1.7.3: Long-running jobs (>60 seconds) can cause the system to stop the worker, triggering retries and failures....
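Since several replies here come down to which runpod SDK version is actually installed in the worker image, a small startup check can make that visible in the logs. The version set below only reflects what this thread reports (retry problems on 1.7.1 and 1.7.3; 1.7.2 and 1.7.4 reported working), not an official compatibility list, and the function names are hypothetical. Pinning the version in requirements.txt (e.g. `runpod==1.7.4`) remains the more direct fix.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions reported in this thread to retry / re-run long jobs (not an official list)
KNOWN_RETRY_BUGS = {"1.7.1", "1.7.3"}


def sdk_version_warning(installed: Optional[str]) -> Optional[str]:
    """Return a warning message for a problematic runpod version, else None."""
    if installed is None:
        return "runpod is not installed"
    if installed in KNOWN_RETRY_BUGS:
        return f"runpod {installed} has reported retry issues with long-running jobs; consider 1.7.4"
    return None


def installed_runpod_version() -> Optional[str]:
    """Read the installed runpod distribution's version from package metadata."""
    try:
        return version("runpod")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    warning = sdk_version_warning(installed_runpod_version())
    print(warning or "runpod version OK")
```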
1.7.4 seems to work well! No more retries and stuff like that. I'm not seeing my container logs after TensorFlow starts up, but that might be an issue on my end (e.g. not disabling Python output buffering). Thanks!
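On the missing logs: when stdout is a pipe rather than a terminal, Python block-buffers it, so prints can sit in the buffer instead of reaching the container logs promptly. The usual fixes are `ENV PYTHONUNBUFFERED=1` in the Dockerfile, running `python -u`, or `print(..., flush=True)`. The sketch below is the in-process equivalent, switching stdout and stderr to line buffering at startup; `enable_line_buffering` is a hypothetical helper, and whether buffering explains the missing logs here is only a guess.

```python
import io
import sys


def enable_line_buffering(stream) -> bool:
    """Switch a text stream to line buffering so each newline flushes it.

    Returns False if the stream is not a TextIOWrapper (some harnesses
    replace sys.stdout with objects that lack reconfigure()).
    """
    if isinstance(stream, io.TextIOWrapper):
        stream.reconfigure(line_buffering=True)  # available since Python 3.7
        return True
    return False


# Call once at the top of the handler module, before any logging happens
enable_line_buffering(sys.stdout)
enable_line_buffering(sys.stderr)
print("worker starting")  # flushed as soon as the line is complete
```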