RunPod•14mo ago

RunPod worker automatic1111 just responds COMPLETED and doesn't return anything

I'm using the worker from https://github.com/ashleykleynhans/runpod-worker-a1111/tree/main, latest version, so it should include the fix for the "error" dict problem. For some requests, it just returns the status COMPLETED with no output, and the RunPod logs show something like the image below. I also created a Pod with that volume mounted and ran the request locally with test_input.json, and everything works normally. Can you help me with this, @ashleyk?


30 Replies

ashleyk•14mo ago

Which version of the Docker image are you using?

leduyson2603OP•14mo ago

i have built a new version based on your latest repo

ashleyk•14mo ago

Your own image?

leduyson2603OP•14mo ago

Yes, but I only changed rp_handler.py to add some processing, and I don't think that's the issue, because when I ran it directly in a Pod everything worked fine. It still returns the result normally.

ashleyk•14mo ago

Must be some issue, because it's working fine for me. Which version of the RunPod SDK are you using?

leduyson2603OP•14mo ago

runpod 1.5.0

ashleyk•14mo ago

Why such an old version? Upgrade to the latest SDK. Mine is on 1.6.0 and working fine (1.6.2 is the latest).

leduyson2603OP•14mo ago

Do you mean the SDK in the Docker image, or the one installed when we set up the venv on the RunPod volume?

ashleyk•14mo ago

It's in the network volume.

ashleyk•14mo ago

pip install -U runpod

That should upgrade it to the latest version. If you have FlashBoot enabled, you should scale your workers down to zero and back up again once it's upgraded.

leduyson2603OP•14mo ago

Got it. Should I upgrade the runpod version in the Docker image?

ashleyk•14mo ago

No, it's loaded from the network drive, not the Docker image.

leduyson2603OP•14mo ago

Thank you! Should I turn FlashBoot off and on again? I have scaled down, but it still hasn't updated to the latest version.

ashleyk•14mo ago

Don't mess with FlashBoot. Scale workers down to zero and back up.

leduyson2603OP•14mo ago

This version is the version of the RunPod SDK, right?

leduyson2603OP•14mo ago

I have updated the RunPod SDK on the volume, but the worker still hasn't picked up the new version?

ashleyk•14mo ago

Yeah, it's the version of the SDK, but it should be 1.6.2, not 1.5.0. Did you scale your workers down and back up again after making the change?

leduyson2603OP•14mo ago

Yes, I set both the min workers and max workers to zero and back up again.

ashleyk•14mo ago

Did you install the SDK in your Docker image? My Dockerfile doesn't have it installed and loads it from the network volume. Actually, I lie, it does. It uses the one from the Docker image, my apologies. Rebuild your Docker image. I need to fix it to just use the one from the network volume; that was a dumb move. Maybe it will be slower from the network volume though, not sure.
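
Since the confusion above was about which copy of the SDK the worker actually loads, here is a minimal sketch that reports a package's installed version and the path it would be imported from. It uses only the Python standard library; the helper name describe_package is made up for illustration and is not part of the RunPod SDK or the worker repo.

```python
from importlib import metadata, util


def describe_package(name: str) -> dict:
    """Report the installed version and import location of a top-level package.

    Both values are None when the package cannot be found.
    """
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None

    # find_spec locates the package without importing it.
    spec = util.find_spec(name)
    location = spec.origin if spec is not None else None

    return {"name": name, "version": version, "location": location}


if __name__ == "__main__":
    # Run this with the same interpreter/venv the worker uses.
    print(describe_package("runpod"))
```

Running it with the interpreter the worker uses (for example, logging the result at the start of the handler script) should show whether the SDK comes from the Docker image or from the venv on the network volume.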

leduyson2603OP•14mo ago

Yes, I think it will be slower, but not enough to be worth mentioning haha

leduyson2603OP•14mo ago

Still got the 400 bad request

leduyson2603OP•14mo ago

I have checked the log on the volume, and it shows automatic1111 running normally. This only happens for some of my requests.

ashleyk•14mo ago

Looks like it's trying to return the job results before it's finished, according to the log. @flash-singh or @Merrell any idea what's going on here?

Justin Merrell•14mo ago

@leduyson2603 Can you paste the endpoint ID? @ashleyk It will attempt to return the results; the worker will consider the job to be finished, but the status of the job will come from the job check. I'll dig into this some more.

leduyson2603OP•14mo ago

Endpoint id: x91sq4fdxbtprx

Solution

leduyson2603•14mo ago

Hi @Merrell, I think the problem is related to the size of the response? If I set the batch size smaller or make the image size smaller, everything works fine.
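
To see why batch size and image size matter, here is a rough sketch of how large a response gets when images are returned as base64 strings in JSON. Base64 turns every 3 raw bytes into 4 characters, so the payload is about a third larger than the images themselves. The function names and the example sizes are illustrative assumptions, not values from this thread.

```python
import base64


def base64_length(raw_bytes: int) -> int:
    """Number of characters in the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def estimate_response_bytes(image_sizes: list[int]) -> int:
    """Lower bound on the payload size for a batch of base64-encoded images.

    Ignores JSON punctuation and any other fields in the response.
    """
    return sum(base64_length(n) for n in image_sizes)


if __name__ == "__main__":
    # Hypothetical batch: four images of about 1.5 MB each.
    sizes = [1_500_000] * 4
    print(estimate_response_bytes(sizes), "bytes before JSON overhead")

    # Sanity check against the real encoder.
    assert base64_length(1000) == len(base64.b64encode(b"x" * 1000))
```

Halving the batch size halves this estimate, which fits the observation that smaller batches or smaller images go through.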

Justin Merrell•14mo ago

That is likely it, then. I can modify the logging to make this clear.

leduyson2603OP•14mo ago

Can we increase the limit? Or do I need to save the image to an S3 bucket and return a presigned URL?
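
A minimal sketch of the S3 option mentioned above, assuming a boto3 S3 client is passed in (for example one created with boto3.client("s3")). put_object and generate_presigned_url are standard boto3 client methods; the function name, bucket, key, and expiry are placeholders, and this is not how the worker repo itself handles output.

```python
def upload_and_presign(s3_client, bucket: str, key: str, data: bytes,
                       expires_in: int = 3600) -> str:
    """Upload image bytes to S3 and return a time-limited download URL.

    s3_client is expected to behave like boto3.client("s3").
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ContentType="image/png",  # assumes the worker produces PNG output
    )
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # URL lifetime in seconds
    )


# Hypothetical usage inside a handler, instead of returning base64 images:
#   import boto3
#   s3 = boto3.client("s3")
#   url = upload_and_presign(s3, "my-bucket", f"{job_id}/0.png", png_bytes)
#   return {"images": [url]}
```

The response then carries only short URLs, so its size no longer grows with the batch size or image resolution.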

ashleyk•14mo ago

Is the response payload size limit documented? I know the request payload size limits are documented for /run and /runsync, so if there is a limit on the response payload, it would be great if it could be documented, and also to return FAILED with an error indicating that the response payload limit has been exceeded. @Polar I don't think this is answered until we know more about the response size limit and until responses that exceed the limit are handled as errors.
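
Until the platform reports this itself, a handler could check the size of its own output before returning it. The sketch below is a hedged illustration: the limit is a made-up placeholder, since no response limit is documented in this thread, and returning a dict with an "error" key is assumed to be how the handler signals failure.

```python
import json

# Placeholder only: the real response limit is not stated in this thread.
MAX_RESPONSE_BYTES = 1_000_000


def guard_response_size(output: dict, max_bytes: int = MAX_RESPONSE_BYTES) -> dict:
    """Return output unchanged if it fits, otherwise an explicit error dict."""
    size = len(json.dumps(output).encode("utf-8"))
    if size > max_bytes:
        return {
            "error": (
                f"Response payload is {size} bytes, "
                f"over the assumed limit of {max_bytes} bytes"
            )
        }
    return output


# Hypothetical usage at the end of a handler:
#   return guard_response_size({"images": images})
```

This turns a silent COMPLETED with no output into a visible error message, which is the behaviour being asked for here.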

haris•14mo ago

Got it, will unmark
