RunPod worker (automatic1111) just responds COMPLETED and doesn't return anything

I'm using the worker from https://github.com/ashleykleynhans/runpod-worker-a1111/tree/main, latest version, so it should include the fix for the "error" dict problem. For some requests it just returns the status COMPLETED with no output, and the RunPod logs show something like the image below. I have tried creating a Pod mounted on that volume and running the request locally with test_input.json, and everything works normally. Can you help me with this, @ashleyk?
(image attached)
Solution:
Hi @Merrell, I think the problem is related to the size of the response. If I set the batch size or the image size smaller, everything works fine.
30 Replies
ashleyk
ashleyk10mo ago
Which version of the Docker image are you using?
leduyson2603
leduyson2603OP10mo ago
I have built a new version based on your latest repo.
ashleyk
ashleyk10mo ago
Your own image?
leduyson2603
leduyson2603OP10mo ago
Yes, but I only changed rp_handler.py to add some processing. I don't think that's the issue, because when I ran it directly in a Pod everything worked fine and it still returned the result normally.
ashleyk
ashleyk10mo ago
Must be some issue, because it's working fine for me. Which version of the RunPod SDK are you using?
leduyson2603
leduyson2603OP10mo ago
runpod 1.5.0
ashleyk
ashleyk10mo ago
Why such an old version? Upgrade to the latest SDK. Mine is on 1.6.0 and working fine (1.6.2 is the latest).
leduyson2603
leduyson2603OP10mo ago
Do you mean the SDK in the Docker image, or the one in the venv we set up on the RunPod volume?
ashleyk
ashleyk10mo ago
It's in the network volume.
(image attached)
ashleyk
ashleyk10mo ago
pip install -U runpod
That should upgrade it to the latest version. If you have FlashBoot enabled, you should scale your workers down to zero and back up again once it's upgraded.
leduyson2603
leduyson2603OP10mo ago
Got it. Should I also upgrade the runpod version in the Docker image?
ashleyk
ashleyk10mo ago
No, it's loaded from the network volume, not the Docker image.
leduyson2603
leduyson2603OP10mo ago
Thank you! Should I turn FlashBoot off and on again? I have scaled down, but it still hasn't updated to the latest version.
ashleyk
ashleyk10mo ago
Don't mess with FlashBoot; just scale the workers down to zero and back up.
leduyson2603
leduyson2603OP10mo ago
This version is the RunPod SDK version, right?
(image attached)
leduyson2603
leduyson2603OP10mo ago
I have updated the RunPod SDK on the volume, but the worker still isn't updated?
(image attached)
ashleyk
ashleyk10mo ago
Yeah, it's the version of the SDK, but it should be 1.6.2, not 1.5.0. Did you scale your workers down and back up again after making the change?
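For reference, a quick way to confirm which SDK version the worker process actually imports is to log it from rp_handler.py. A minimal sketch; the print placement is an assumption and not part of the original repo:

from importlib.metadata import version

# log the SDK version this worker process actually imports;
# if this still prints 1.5.0, the old package baked into the Docker image is being used
print(f"runpod SDK version in use: {version('runpod')}")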
leduyson2603
leduyson2603OP10mo ago
Yes, I have set both the min workers and max workers to zero and back up again.
ashleyk
ashleyk10mo ago
Did you install the SDK in your Docker image? My Dockerfile doesn't have it installed and loads it from the network volume. Actually I lie, it does use the one from the Docker image, my apologies. Rebuild your Docker image. I need to fix it to just use the one from the network volume, that was a dumb move. Maybe it will be slower from the network volume though, not sure.
leduyson2603
leduyson2603OP10mo ago
Yes, I think it will be slower, but not enough to be worth mentioning haha
leduyson2603
leduyson2603OP10mo ago
Still getting the 400 Bad Request
(image attached)
leduyson2603
leduyson2603OP10mo ago
I have checked the log on the volume and it shows automatic1111 running normally. This only happens for some of my requests.
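For reference, a minimal polling sketch for reproducing this from outside the worker, assuming a placeholder endpoint ID, an API key in the RUNPOD_API_KEY environment variable, and an empty payload to be filled in with the actual a1111 input:

import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"              # placeholder, replace with the real endpoint ID
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# submit the job asynchronously via /run
run = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers=HEADERS,
    json={"input": {}},                       # fill in the a1111 payload here
).json()

# poll /status until the job reaches a terminal state
while True:
    status = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{run['id']}",
        headers=HEADERS,
    ).json()
    if status.get("status") in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

# a COMPLETED status with no output attached reproduces the issue in this thread
print(status.get("status"), list(status.keys()))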
ashleyk
ashleyk10mo ago
Looks like it's trying to return the job results before it's finished, according to the log. @flash-singh or @Merrell, any idea what's going on here?
Justin Merrell
Justin Merrell10mo ago
@leduyson2603 Can you paste the endpoint ID? @ashleyk It will attempt to return the results and the worker will consider the job to be finished, but the status of the job comes from the job check. I'll dig into this some more.
leduyson2603
leduyson2603OP10mo ago
Endpoint id: x91sq4fdxbtprx
Solution
leduyson2603
leduyson260310mo ago
Hi @Merrell, I think the problem is related to the size of the response. If I set the batch size or the image size smaller, everything works fine.
Justin Merrell
Justin Merrell10mo ago
That is likely it then. I can modify the logging to make this clear.
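One hedged way to confirm this from inside the handler is to log the size of the serialized response just before returning it. A minimal sketch; the helper below is not part of the original repo:

import json

def log_response_size(output):
    # serialize exactly what the handler is about to return and log its size;
    # base64-encoded images grow to roughly 4/3 of the raw file size, so a larger
    # batch size or resolution quickly pushes the payload into the tens of megabytes
    size_mib = len(json.dumps(output).encode("utf-8")) / (1024 * 1024)
    print(f"response payload size: {size_mib:.2f} MiB")
    return output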
leduyson2603
leduyson2603OP10mo ago
Can we increase the limit? Or do I need to save the image to an S3 bucket and return a presigned URL?
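A minimal sketch of the S3 workaround, assuming boto3 is available in the worker, AWS credentials are configured, and a hypothetical bucket name:

import uuid
import boto3

s3 = boto3.client("s3")                        # assumes AWS credentials are configured for the worker
BUCKET = "my-output-bucket"                    # hypothetical bucket name

def upload_and_presign(image_bytes: bytes, expires: int = 3600) -> str:
    # store the generated image in S3 instead of returning it inline,
    # then return a short-lived presigned URL so the response payload stays small
    key = f"outputs/{uuid.uuid4()}.png"
    s3.put_object(Bucket=BUCKET, Key=key, Body=image_bytes, ContentType="image/png")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,
    )

The handler would then return a list of URLs instead of base64 image data, keeping the response well under any payload limit.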
ashleyk
ashleyk10mo ago
Is the response payload size limit documented? I know the request payload sizes are documented for /run and /runsync, so if there is a limit on the response payload it would be great if it could be documented, and also to rather return FAILED with an error indicating that the response payload limit has been exceeded. @Polar I don't think this is answered until we know more about the response size limit and how responses that exceed it are handled as errors.
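Until the limit is documented, one hedged option is to guard the response size in the handler itself and return an explicit error instead of silently losing the output. The threshold below is purely an assumption for illustration, not an official limit:

import json

MAX_RESPONSE_BYTES = 10 * 1024 * 1024          # assumed threshold, not an official limit

def guard_response_size(output):
    # return an explicit error when the serialized output looks too large to deliver,
    # so the job fails loudly instead of completing with an empty result
    if len(json.dumps(output).encode("utf-8")) > MAX_RESPONSE_BYTES:
        return {"error": "response payload too large; upload images to object storage and return URLs instead"}
    return output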
haris
haris10mo ago
Got it, will unmark