"Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/91gr..."
I keep having these errors on my endpoints.
It happens mostly with "high-res" (4K) images, but they're JPEGs of at most 2 MB.
Runpod serverless pods have gotten significantly worse for me over the last few days.
5 Replies
What is your SDK version?
Thanks for your help. Do you know where I can find this info?
(Note: I just noticed this happened with A100 PCIe and NOT A100 SXM)
Info about the SDK version is usually in the Dockerfile or requirements.txt of your repository.
I put this in my Dockerfile, so I guess it's the latest version of runpod:
RUN /opt/venv/bin/pip install runpod
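Since that install line is unpinned, the version you actually get depends on when the image was built. A minimal sketch for checking what is installed, assuming you can run Python inside the same virtualenv as the worker (the helper name `installed_version` is made up for illustration):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "runpod") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version pip resolved at image build time, if any.
    print("runpod SDK version:", installed_version() or "not installed")
```

Running this with the interpreter from `/opt/venv/bin` should report the version that pip resolved for that image.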
And I assume the code is fine, since you have workers that are okay.
When did you last deploy on your A100 PCIe? I guess it might be related to the recent price drop for bigger GPUs, followed by a decrease in availability.
Or it might be related to the volume attached to it.
Anyway, if your images don't exceed the 10 MB payload limit and it works on some machines, I would suggest you try redeploying it :\
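To rule out the payload limit, here is a rough sketch that estimates the size of a JSON body carrying an image as base64. It assumes the 10 MB figure quoted above means decimal megabytes and that the image travels base64-encoded inside JSON; the `image` field name and both helper names are hypothetical:

```python
import base64
import json

# Assumption: the "10 MB" limit quoted in the thread, read as decimal megabytes.
PAYLOAD_LIMIT_BYTES = 10 * 1000 * 1000


def payload_size(image_bytes: bytes) -> int:
    """Size in bytes of a JSON body that carries the image as base64."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # "image" is a hypothetical field name; use whatever your handler expects.
    body = json.dumps({"input": {"image": encoded}})
    return len(body.encode("utf-8"))


def fits_in_payload(image_bytes: bytes, limit: int = PAYLOAD_LIMIT_BYTES) -> bool:
    """True if the encoded body stays within the assumed limit."""
    return payload_size(image_bytes) <= limit


if __name__ == "__main__":
    fake_jpeg = b"\x00" * 2_000_000  # stand-in for a 2 MB JPEG
    print(payload_size(fake_jpeg), fits_in_payload(fake_jpeg))
```

Base64 inflates the data by about a third, so a 2 MB JPEG stays well under the limit. A large uncompressed result returned by the worker may not, so the same check is worth running on the output side.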
I've noticed that some of my A6000/A40 workers have been falling into throttling mode very often lately