Serverless - 404 cannot return results
I'm getting the following error:
My workloads run fine, but the results are never returned, so the jobs get stuck in the queue.
This is runpod v1.6.1
I have attempted debugging in a live worker to do an early return with a fixed result but the same error persists. Please help!
14 Replies
does that job still exist if you use /status?
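A minimal sketch of that status check, assuming the public `https://api.runpod.ai/v2/<endpoint_id>/status/<job_id>` route with a bearer API key. The function name and the placeholder IDs are hypothetical, and the request is only built here, not sent:

```python
import urllib.request


def build_status_request(endpoint_id: str, job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a job's status."""
    # Assumed URL layout: /v2/<endpoint_id>/status/<job_id>
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Placeholder values for illustration only.
req = build_status_request("ENDPOINT_ID", "JOB_ID", "API_KEY")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` with real values would perform the actual lookup.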
I thought I had deleted its here (edit - on mobile, hard to do things)
I'm rebuilding against RunPod 1.4.2 now, because that was the last version I built that I know worked as I expected, so I can debug more easily.
I've made some changes to my container but not to the worker, so I'm surprised by the breakage. Currently assuming it's my error somewhere but why would that URL be a 404?
@RobBalla I'm also getting the same error. In the URL that it's trying to hit (/job-done), is "hevyjx14k6tl6p" the worker ID or the status ID?
@Brever it's endpointid/job-done/workerid, and I have absolutely no idea what would cause this. I'm working backwards to work out what's happening.
Probably a missing dependency because everything else works. I've built against other runpod versions with the same issue so it's clearly my fault, but it's taking some time to work it out. Locally it works fine - making it even more awkward to figure out
@RobBalla if you return `error` as a dict instead of str then this kind of thing happens. I was also able to return `error` as a dict in older versions of the SDK but then it became a breaking change somewhere along the line unfortunately. Now you need to return `error` as a str and put the dict stuff into `output`.
I don't like this kind of breaking change in the SDK, it needs to be backwards compatible.
I had to change my worker like this:
Previously it was just:
But then it broke in new SDK versions 😱
This also causes the job to show as COMPLETED (without any output) instead of FAILED 😱
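The worker snippets from this message did not survive in the thread. As a rough sketch of the pattern described (a string in `error`, structured details under `output`), something like the following might apply. `run_workload` and the detail fields are hypothetical, and the exact shape newer SDK versions accept is an assumption:

```python
from typing import Any, Dict


def run_workload(job_input: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the real work; fails on missing input."""
    if "prompt" not in job_input:
        raise ValueError("missing 'prompt'")
    return {"echo": job_input["prompt"]}


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """Return the result on success, or a string error plus details on failure."""
    job_input = job.get("input", {})
    try:
        return run_workload(job_input)
    except Exception as exc:
        # Keep `error` a plain string; put any structured details in `output`.
        return {
            "error": str(exc),
            "output": {"type": type(exc).__name__, "input": job_input},
        }


# In a real worker the handler would be registered with:
# runpod.serverless.start({"handler": handler})
```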
Pretty critical oversight from RunPod IMHO. cc: @Justin Merrell
Thanks @ashleyk I'll try playing around with it but it won't even let me return a simple string as a test. It's quite frustrating!
I don't think it's a RunPod issue this time though, I think it is me so I'm not blaming them because even with an older SDK that used to work with this worker code it gives me the 404. Wondering if I've manipulated an environment variable somewhere that I shouldn't have.
I'll have to poke around in the sdk I think to figure out what's missing. I assume it's a POST request going to that URL although I found an old article that suggests a job $ID should follow the worker id and in my case there isn't one
Actually it's not that old and it's one of your answers (of course it is - they should pay you!) https://www.answeroverflow.com/m/1187367068643885126
Did you maybe hard-code a version of `aiohttp` into a requirements.txt file?
No, the only install for the serverless environment is runpod, python magic and whatever requirements they bring.
That's with a version of the sdk that used to work, which is what suggests it's definitely my fault
That's weird. I had an issue with an SDK version that used to work and then started causing problems, but come to think of it, it was causing the worker to run forever without shutting down, which is not what you are seeing. The root cause was a bug in a new version of `aiohttp`, but that bug has been resolved for a while now.
Looks like my version is a couple of micro versions behind. I'll have to see if there's an issue there but I'll be spending some time logged into an active worker and shouting at it. Hopefully get to the bottom of it. Thank you for the pointers 🙏
Hope you get to the bottom of it soon, I hate those issues where you revert back to something and expect it to work and then it doesn't 😱
404 is a strange error, I am however looking into the error handling this morning @ashleyk
Well, I know what is wrong with it and as I suspected it is my fault.
I have a bash script that runs over the envs at start and writes them to a file so they can be passed to supervisord processes that would otherwise not have access to them (because it locks its environment). Anyway, this script replaces $ID in the webhook post variable with '', so it obviously doesn't work. Annoying but easy to fix.
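To illustrate the failure mode, here is a small sketch under assumptions: the URL template and the `JOB_DONE_URL` name are hypothetical stand-ins, not the real variable. It shows how an emptied `$ID` leaves a URL with no job ID at the end, and one way to write an env file so a shell sourcing it keeps `$ID` literal:

```python
import shlex
from string import Template

# Hypothetical template; only the trailing `$ID` placeholder matters here.
TEMPLATE = "https://api.example.invalid/ENDPOINT_ID/job-done/WORKER_ID/$ID"


def export_line(name: str, value: str) -> str:
    """Render an `export NAME=value` line that is safe to source from a shell."""
    # shlex.quote wraps the value in single quotes, so the shell does not
    # expand `$ID` when the file is sourced.
    return f"export {name}={shlex.quote(value)}"


# What an unset shell variable effectively does: `$ID` becomes an empty string.
blanked = Template(TEMPLATE).substitute(ID="")
# What should happen later, once a real job ID is known.
filled = Template(TEMPLATE).substitute(ID="job-123")

print(blanked)
print(filled)
print(export_line("JOB_DONE_URL", TEMPLATE))
```

With the single-quoted form the placeholder survives the round trip through the env file, so it can still be filled in per job.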