Issue with Multiple instances of ComfyUI running simultaneously on Serverless

Hello, I am using RunPod Serverless and deploying ComfyUI with this repo: https://github.com/blib-la/runpod-worker-comfy?tab=readme-ov-file#bring-your-own-models The server itself is this repo: https://github.com/comfyanonymous/ComfyUI I deploy via a Docker image, and both repos are baked into the image. When I run 2-3 workers via the API, the ComfyUI server starts and responds as usual. The problem arises when more requests come in, for example when more than 5 requests arrive and more than 5 workers spin up; in that case the ComfyUI server fails to start on some workers. I understand that starting the ComfyUI server is handled by the ComfyUI code itself, but if that were the problem then even a single worker shouldn't work, and that is not the case. With few workers everything works fine; as soon as the number of workers increases, the ComfyUI server does not start. I would appreciate it if anyone could take a look. Thank you.
Encyrption
Encyrption2mo ago
I am running a custom blib-la/runpod-worker-comfy image. I'm not sure what you mean by the server. The basic config of blib-la/runpod-worker-comfy is to run the ComfyUI API server on the worker, and the RunPod handler reaches out to it locally. If you have modified this behavior, can you provide more details on those modifications?
SyedAliii
SyedAliii2mo ago
@Encyrption I am doing the same thing. The blib-la repo starts a local ComfyUI server and then sends requests to it. The problem is that when there are fewer than 3-4 workers, everything works fine: the API becomes reachable after 15-20 retries (retries happen every few milliseconds, the default behavior). But when there are more than 5 workers, the API is not reachable and the server does not start. For example, out of 5 workers only 2-3 manage to start the server; the others keep retrying until max retries are reached.
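For context, the retry behavior looks roughly like this (a simplified sketch, not the actual runpod-worker-comfy code; the port and timing values are assumptions):
```python
# Simplified sketch of the readiness loop being described: the handler polls
# the local ComfyUI HTTP API until it answers or the retry budget runs out.
# Not the actual runpod-worker-comfy code; port and timings are assumed.
import time
import requests

COMFY_URL = "http://127.0.0.1:8188"  # assumed default ComfyUI port
MAX_RETRIES = 500                    # retry budget before giving up
RETRY_DELAY_MS = 50                  # "a few milliseconds" between attempts

def wait_for_comfyui() -> bool:
    """Return True once the local ComfyUI server answers, False otherwise."""
    for _ in range(MAX_RETRIES):
        try:
            if requests.get(COMFY_URL, timeout=2).status_code == 200:
                return True
        except requests.RequestException:
            pass  # server not up yet, keep retrying
        time.sleep(RETRY_DELAY_MS / 1000)
    return False
```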
Encyrption
Encyrption2mo ago
That's odd. Each worker should be an island unto itself. I'm not sure how having more workers would impact the function of any single worker. How are you handling models?
SyedAliii
SyedAliii2mo ago
@Encyrption That is my thought as well, that each worker is independent. I had a silly idea that perhaps the port was occupied, so I generated a random port for the server on each worker, but the issue remains the same. I have set up the models, LoRAs, and custom nodes all inside the Docker image; no network volume is attached.
Encyrption
Encyrption2mo ago
Have you checked, when this happens, which hosts the workers are running on? The port should be wholly contained inside the Docker container; it doesn't even touch the host ports. Maybe the host was completely out of ports?
SyedAliii
SyedAliii2mo ago
I have a Python script for testing. I just send 5 requests at once to my endpoint, and the RunPod endpoint assigns each request to a worker. Can you please explain what you mean by checking the workers?
Encyrption
Encyrption2mo ago
If you go into Serverless you can select Workers and see the status of all the workers assigned to your endpoint.
Encyrption
Encyrption2mo ago
What do you have set for max workers?
SyedAliii
SyedAliii2mo ago
They are all in the running state; there is no throttling or anything else happening to the workers. The worker keeps retrying and after 500 retries sends me a failure response.
Encyrption
Encyrption2mo ago
So, are all your requests in the IN_PROGRESS state? Or are you using /runsync?
SyedAliii
SyedAliii2mo ago
I have set max workers to 20 and the issue remains. (I know 30 is normally the max, but I have asked RunPod to give me more workers, so my limit is 50.) My requests use /runsync; I wait for each request to complete.
Encyrption
Encyrption2mo ago
50 would be nice, all I could get from them was 35.
SyedAliii
SyedAliii2mo ago
I am running some other endpoints too, which is why I need those.
Encyrption
Encyrption2mo ago
So, you can see from the logs that the ComfyUI API is timing out?
SyedAliii
SyedAliii2mo ago
Yes
Encyrption
Encyrption2mo ago
As long as you are paying RP enough I'm sure they will continue to give you more... I am not currently spending anything; I'm in development. I do everything async and have no such issues... although I currently only have Flux Schnell, Flux Dev, SD3, and SDXL. I don't have anything custom.
SyedAliii
SyedAliii2mo ago
Yes, I can see everything in the logs. The server retries time out and then a failure response is sent back. There are no unusual errors in the logs; I can see it trying to reach the server API.
Encyrption
Encyrption2mo ago
I would expect some of that while it syncs up. Are you running in a specific region?
SyedAliii
SyedAliii2mo ago
I have custom nodes and models, but I believe these are unrelated to the issue.
Encyrption
Encyrption2mo ago
Yeah, don't see how that would change anything.
SyedAliii
SyedAliii2mo ago
No specific region; I have selected the global region because no network volume is attached, so there is no region restriction.
Encyrption
Encyrption2mo ago
Do you block out any regions?
SyedAliii
SyedAliii2mo ago
No
Encyrption
Encyrption2mo ago
I am currently blocking EU-* and US-OR as they have had issues reported; I still haven't seen any update about them being fixed.
SyedAliii
SyedAliii2mo ago
By the way, you might say there is some internal bug specific to my endpoint, but I tested on a separate test endpoint and the issue remains the same.
Encyrption
Encyrption2mo ago
Do you have a local GPU you can test with?
SyedAliii
SyedAliii2mo ago
I have, and I am able to run the ComfyUI GUI locally. Do you get these updates about which regions are causing issues from another channel? If so, please let me know.
Encyrption
Encyrption2mo ago
If you have a local GPU you can use the Docker Compose file from the repo to run it in local API mode. As for the regions, it's just people talking about it on this server.
SyedAliii
SyedAliii2mo ago
But with a single worker everything works fine; the issue appears when multiple workers receive requests. I don't think I can simulate this behavior on my local machine. Can you please explain more about this?
Encyrption
Encyrption2mo ago
I would try blocking those regions I mentioned and testing again, and open a ticket with RunPod. It shouldn't matter how many workers are running; each worker should be an isolated entity.
SyedAliii
SyedAliii2mo ago
Yes, each worker is a separate GPU. @Encyrption I tried blocking the EU and US-OR regions but the issue persists. However, I have noticed that the error rate is lower today: if I send 10 requests, sometimes all 10 complete and sometimes 2-3 fail. So the issue appears to be internal to RunPod. Thank you for your time taking a look at this.
gnarley_farley.
Hey, I don't know if you have figured this out yet, but you cannot use /runsync like this.
gnarley_farley.
Use the async endpoint (/run) instead. Look below.
gnarley_farley.
The issue is that your requests are somehow exceeding the limit.
gnarley_farley.
Here is everything you need to get your problem solved: https://docs.runpod.io/serverless/endpoints/job-operations
gnarley_farley.
Use a polling mechanism and check every few seconds to see if your requests are ready. I am using the exact same serverless ComfyUI API to power my application. I do bulk processing of images and ran into the exact same issue, and this is how I fixed it. I can now run hundreds of images in one shot effortlessly, even with just 3 active GPUs. It's much more performant to use the polling system anyway, and if you build an app on serverless in the future it won't affect your serverless function limits. Using /runsync and waiting for a bunch of requests to return is not very efficient: modern frameworks like Next.js have a 10s limit on function timeouts, and serverless function time is one of the biggest cost factors. And it does not affect the speed at all; I feel it is faster now. I have not done tests to verify that, but it's definitely not slower. I'm using 3x 4090s and ripping hard. I saw no or very little performance increase from using bigger GPUs.
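Roughly, the flow looks like this (a simplified sketch; the endpoint ID, API key, and payload shape are placeholders you would swap for your own):
```python
# Sketch of submit-then-poll against a RunPod serverless endpoint:
# queue the job with /run, then poll /status until it reaches a terminal state.
import time
import requests

ENDPOINT_ID = "your-endpoint-id"       # placeholder
API_KEY = "your-runpod-api-key"        # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def submit(payload: dict) -> str:
    """Queue a job asynchronously with /run and return its job id."""
    r = requests.post(f"{BASE}/run", json={"input": payload}, headers=HEADERS)
    r.raise_for_status()
    return r.json()["id"]

def poll(job_id: str, interval: float = 1.0) -> dict:
    """Poll /status every `interval` seconds until the job finishes."""
    while True:
        r = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS)
        r.raise_for_status()
        status = r.json()
        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status
        time.sleep(interval)

# job_id = submit({"workflow": {...}})  # input shape depends on your worker
# result = poll(job_id)
```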
SyedAliii
SyedAliii5w ago
@gnarley_farley. Hello, thank you so much; your point totally makes sense. Two queries though: 1) Even if async performs better, they said the limit for sync is 2000 requests per 10 seconds, and I believe I haven't even crossed 100. 2) Using async, what time interval do you use between status checks? It depends on the task being performed, but what is your suggestion?
gnarley_farley.
Because it's one request at a time and ComfyUI doesn't use batching, you don't really gain much from the extra RAM. Set the polling to a 1 second interval; it is very performant. I don't see a difference at all, my requests come back in the same time. I know those limits are fudged; I had to figure it out through trial and error.
SyedAliii
SyedAliii5w ago
Yes, I have seen that too; using better GPUs doesn't change much. But I have noticed an improvement if I run Comfy with --gpu-only: for example, a job that takes 18 seconds takes 13-14 seconds with the flag.
gnarley_farley.
Okay, cool, thanks. I will test that out. Where are you adding the flag?
SyedAliii
SyedAliii5w ago
When you run python main.py in the ComfyUI directory, use python main.py --gpu-only instead. You can also see the other flags, like --highvram, with python main.py --help.
gnarley_farley.
Thanks, will check that out; I don't know much about ComfyUI. If anyone can make an inpainting version of this worker that uses Flux it would be AMAZING! I want to be able to just pass in a masked photo with a prompt and get something back. My Starlink + Docker Hub combo is super slow for some reason; I can't effectively push these big images.
SyedAliii
SyedAliii5w ago
Why not make a pod and run your Comfy experiments there? Install the models into a network volume using wget so you can use them later.
gnarley_farley.
Sorry, I don't want to hijack your thread, but the issue is that you need to put everything in the Docker image, otherwise it takes too long to initialize on serverless. I need it to scale from zero on serverless because our SaaS infra demands that. It's quite complicated; I have tried. It might also be a bit premature, as I don't see any official or popular Flux-based inpainting workflows up yet.
Example.Bot
Example.Bot5w ago
Yeah, I've been experimenting with my own serverless ComfyUI setup today and that was my experience as well. A Flux Docker image without extra quantization and whatnot takes forever to build and ends up at 40+ GB, but once it's set up the returned images typically go from zero to loaded and generated within 20 seconds.
SyedAliii
SyedAliii5w ago
@gnarley_farley. I was facing the same issue: if you use a network volume, put everything in it, and then access the network volume from the Docker image, it is extremely slow. If you put everything directly in the Docker image it's very fast, but the image size is very large. I don't see a solution to that right now, though there are techniques to reduce Docker image size.
flash-singh
flash-singh5w ago
It's best to use webhooks where possible; polling is inefficient in general. We have a model cache coming soon so you can pull models from Hugging Face and not embed them in your container image; we will automatically inject the model into your worker.
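For reference, a webhook URL can be passed in the request body when the job is submitted, roughly like this (a sketch; the endpoint ID, API key, webhook URL, and input payload are placeholders):
```python
# Sketch: include a webhook URL with the /run request so RunPod POSTs the job
# result to your server when it completes, instead of you polling for it.
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"workflow": {}},                      # worker-specific payload
        "webhook": "https://example.com/runpod-hook",   # called with the job result
    },
)
print(resp.json()["id"])  # job id; the final output arrives at the webhook
```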
Encyrption
Encyrption5w ago
This sounds hopeful. How does 'inject the model into your worker' actually happen? Is it done through a volume? I'm wondering how fast it will be.
gnarley_farley.
Is there a serverless endpoint for CogVideoX-5B yet? I see camenduru has one with a GUI. I'm looking for just an API service: send a request, receive back a video. If anyone finds one, please holla.
flash-singh
flash-singh4w ago
Yes, through a read-only volume or folder. Both the container image and the model are downloaded and stored on our NVMe disk for local compute, but also in network storage for caching and to avoid internet traffic in the future.
Encyrption
Encyrption4w ago
That sounds awesome! If it is on the NVMe disk it should be just as fast as baking the model in, without having to have large Docker images. As always, ❤️ your work!
gnarley_farley.
What does the latest version/implementation of webhooks for RunPod look like? I managed to find this link: https://www.answeroverflow.com/m/1206251618022981694
webhooks custom updates - RunPod
Does the job webhook get invoked with runpod.serverless.progress_update calls?
Encyrption
Encyrption4w ago
I use runpod.serverless.progress_update to send updates in real time, but it will NOT update the final status. For the final update you have to include all the data in what you return from the handler. You can then either fetch that data with a /status call or get it from a webhook. <-- all of this assumes you are using the async /run method.
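Roughly what that looks like in a handler (a sketch, not a drop-in implementation; the actual work and the return payload depend on your worker):
```python
# Sketch of an async-style RunPod handler: progress_update sends interim
# updates, and the final data must come from the handler's return value.
import runpod

def handler(job):
    job_input = job["input"]

    runpod.serverless.progress_update(job, "loading model")     # interim update
    # ... load models / build the ComfyUI prompt here ...

    runpod.serverless.progress_update(job, "generating image")  # interim update
    # ... run the workflow here ...

    # Whatever is returned here is what /status (or the webhook) reports
    # as the final COMPLETED payload.
    return {"images": ["<base64 or URL>"], "detail": "done"}

runpod.serverless.start({"handler": handler})
```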
gnarley_farley.
Yeah, I am just using polling with /run currently. Works flawlessly. Not really fazed about a few extra requests popping off to check if it's ready; hardly consequential in my current pipelines.
yhlong00000
yhlong000004w ago
Send a request | RunPod Documentation
Learn how to construct a JSON request body to send to your custom endpoint, including optional inputs for webhooks, execution policies, and S3-compatible storage, to optimize job execution and resource management.
alka_99
alka_992w ago
Is this already released? I've struggled with a Docker image that is 49 GB because I need to put everything inside the image itself and make sure it is all installed.
flash-singh
flash-singh2w ago
not yet
gnarley_farley.
If you get the $5 per month Docker subscription (Pro, I think), you can automatically build the Docker images when you push to GitHub. Makes a huge difference.
nerdylive
nerdylive3d ago
You can do that with GitHub Actions too, or another CI/CD pipeline provider. But yeah, those need some setup and some tech skills; it's a one-time setup unless something related to the image name/repo etc. needs to change.