RunPod worker for Automatic1111 just responds COMPLETED and does not return anything
I'm using the worker from https://github.com/ashleykleynhans/runpod-worker-a1111/tree/main, the latest version, so it should include the fix for the "error" dict problem. For some requests, it just returns the status COMPLETED, and the RunPod logs show something like the image below. I have tried creating a Pod with that volume mounted and running the request locally with test_input.json, and everything works normally. @ashleyk, can you help me with this?
Which version of the Docker image are you using?
I have built a new version based on your latest repo.
Your own image?
Yes, but I only changed rp_handler.py to add some processing, and I don't think that's the issue,
because when I ran it directly in a Pod, everything worked fine and it still returned the result normally.
There must be some other issue, because it's working fine for me.
Which version of the RunPod SDK are you using?
runpod 1.5.0
Why such an old version?
Upgrade to the latest SDK
Mine is on 1.6.0 and working fine (1.6.2 is the latest).
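To confirm which SDK version a worker actually has, a small check like the one below could be run inside the worker (for example near the top of rp_handler.py). It is a sketch only: `MIN_SDK` and the helper names are hypothetical, and it uses the standard-library `importlib.metadata` rather than any RunPod-specific API.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical minimum; 1.6.2 is the latest version mentioned in this thread.
MIN_SDK = "1.6.2"


def parse_version(text: str) -> tuple:
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    # Assumes a simple numeric release string; pre-release tags are not handled.
    return tuple(int(part) for part in text.split("."))


def sdk_is_recent(installed: str, minimum: str = MIN_SDK) -> bool:
    """Return True if the installed version is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_runpod_version():
    """Return the installed runpod distribution version, or None if absent."""
    try:
        return version("runpod")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_runpod_version()
    if found is None:
        print("runpod SDK is not installed in this environment")
    else:
        print(f"runpod {found}: up to date = {sdk_is_recent(found)}")
```

Comparing tuples of integers avoids the string-comparison trap where "1.10.0" would sort before "1.6.2".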
Do you mean the SDK in the Docker image, or the one in the venv we set up on the RunPod network volume?
It's in the network volume.
That should upgrade it to the latest version
If you have FlashBoot enabled, you should scale your workers down to zero and back up again once it's upgraded.
Got it. Should I upgrade the runpod version in the Docker image?
No, it's loaded from the network drive, not the Docker image.
Thank you!
Should I turn FlashBoot off and on again? I have scaled down, but it still hasn't updated to the latest version.
Don't mess with FlashBoot; scale workers down to zero and back up.
This version is the version of the RunPod SDK, right?
I have updated the RunPod SDK on the volume, but the worker still isn't using the updated version?
Yeah, it's the version of the SDK, but it should be 1.6.2, not 1.5.0.
Did you scale your workers down and back up again after making the change?
Yes, I have set both min workers and max workers to zero and back up again.
Did you install the SDK in your Docker image? My Dockerfile doesn't have it installed and loads it from the network volume.
Actually, I lied, it does.
It uses the one from the Docker image, my apologies.
Rebuild your Docker image.
I need to fix it to use only the one from the network volume; that was a dumb move.
Maybe it will be slower loading from the network volume though, not sure.
Yes, I think it will be slower, but not by enough to be worth mentioning haha.
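Since the confusion here came from the SDK being installed in both the Docker image and the network volume, one way to check which copy a worker actually imports is to look at the module's file origin. This is a sketch: the `/runpod-volume` mount prefix is an assumption about where the network volume is mounted inside a serverless worker, and `loaded_from_volume` is a hypothetical helper.

```python
import importlib.util

# Assumed mount point of the network volume inside a serverless worker.
VOLUME_PREFIX = "/runpod-volume"


def module_origin(name: str):
    """Return the file path a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def loaded_from_volume(name: str, prefix: str = VOLUME_PREFIX):
    """True if `name` resolves under `prefix`, False if elsewhere, None if missing."""
    origin = module_origin(name)
    if origin is None:
        return None
    # Compare against "prefix/" so "/runpod-volume2" does not count as a match.
    return origin.startswith(prefix.rstrip("/") + "/")


if __name__ == "__main__":
    print("runpod origin:", module_origin("runpod"))
    print("from network volume:", loaded_from_volume("runpod"))
```

If the printed origin points into the image's site-packages rather than the volume's venv, upgrading the SDK on the volume will have no effect until the image is rebuilt.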
I still get the 400 Bad Request.
I have checked the log on the volume, and it shows Automatic1111 ran normally.
This only happens for some of my requests.
According to the log, it looks like it's trying to return the job results before the job has finished.
@flash-singh or @Merrell any idea whats going on here?
@leduyson2603 Can you paste the endpoint ID?
@ashleyk It will attempt to return the results, and the worker will consider the job finished, but the status of the job comes from the job check. I'll dig into this some more.
Endpoint id: x91sq4fdxbtprx
Solution
Hi @Merrell, I think the problem may be the size of the response. If I set a smaller batch size or a smaller image size, everything works fine.
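Automatic1111's API returns generated images as base64 strings, so the response grows with both batch size and resolution, which fits the observation above. A rough estimate is sketched below; the per-image byte count is an input you would measure yourself, and `estimate_response_bytes` and its `overhead` allowance are hypothetical.

```python
import base64
import math


def base64_length(raw_bytes: int) -> int:
    """Length of a padded base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)


def estimate_response_bytes(image_bytes: int, batch_size: int, overhead: int = 1024) -> int:
    """Rough JSON response size for `batch_size` images of `image_bytes` each.

    `overhead` is a guessed allowance for keys, quotes, and the info/parameters
    fields; it is an assumption, not a measured value.
    """
    return batch_size * base64_length(image_bytes) + overhead


if __name__ == "__main__":
    # Sanity check: the formula matches the standard library encoder.
    assert base64_length(4) == len(base64.b64encode(b"x" * 4))
    # Hypothetical example: four 1.5 MB PNGs.
    print(estimate_response_bytes(1_500_000, 4))  # prints 8001024
```

Base64 adds roughly a third on top of the raw image size, so halving the batch size roughly halves the payload.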
That is likely it then
I can modify the logging to make this clear
Can we increase the limit?
Or do I need to save the image to an S3 bucket and return a presigned URL?
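The workaround mentioned here, uploading images and returning links instead of base64 data, could look roughly like the sketch below. It assumes a boto3-style S3 client (`put_object` and `generate_presigned_url` are standard boto3 client methods), a bucket you own, and the `images` list of base64 strings from Automatic1111's API response. The function name, key scheme, and one-hour expiry are illustrative choices, not part of the worker.

```python
import base64
import uuid


def upload_images_and_presign(s3_client, bucket: str, images_b64: list,
                              prefix: str = "outputs", expires_in: int = 3600) -> list:
    """Upload base64-encoded PNGs to S3 and return presigned GET URLs.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    urls = []
    for image_b64 in images_b64:
        key = f"{prefix}/{uuid.uuid4().hex}.png"  # unique object key per image
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=base64.b64decode(image_b64),
            ContentType="image/png",
        )
        urls.append(
            s3_client.generate_presigned_url(
                "get_object",
                Params={"Bucket": bucket, "Key": key},
                ExpiresIn=expires_in,
            )
        )
    return urls
```

In a handler this might be called as `upload_images_and_presign(boto3.client("s3"), "my-bucket", result["images"])`, returning something like `{"image_urls": urls}` so the response stays small regardless of batch size; `my-bucket` and `image_urls` are placeholders.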
Is the response payload size limit documented? I know the request payload sizes are documented for /run and /runsync, so if there is a limit on the response payload, it would be great if it could be documented too, and if the job returned FAILED rather than COMPLETED, with an error indicating that the response payload limit has been exceeded.
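Until a limit is documented, a handler could guard against this itself by measuring the serialized output and returning an error instead. In the RunPod Python SDK, a handler output dict containing an `error` key should be reported as a failed job; treat that as an assumption to verify against your SDK version, since this sketch does not import the SDK. `MAX_RESPONSE_BYTES` is a hypothetical threshold, not RunPod's actual limit, and `guard_response_size` is an illustrative name.

```python
import json

# Hypothetical threshold; the real response limit is not documented in this thread.
MAX_RESPONSE_BYTES = 10 * 1024 * 1024


def payload_size(output) -> int:
    """Size in bytes of `output` once serialized as UTF-8 JSON."""
    return len(json.dumps(output).encode("utf-8"))


def guard_response_size(output, limit: int = MAX_RESPONSE_BYTES):
    """Return `output` unchanged, or an error dict if it is too large to send."""
    size = payload_size(output)
    if size > limit:
        # Returning an "error" key is assumed to make the job report FAILED
        # rather than COMPLETED with an empty result.
        return {"error": f"Response payload is {size} bytes, over the {limit}-byte limit"}
    return output
```

At the end of the handler, `return guard_response_size({"images": images})` would replace returning the dict directly.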
@Polar I don't think this is answered until we know more about the response size limit and until responses that exceed the limit are handled as errors.
Got it, will unmark.