Minimize the startup time of ComfyUI on serverless/pod GPU

Hello, I hope everybody is doing well. Thanks for this amazing community. I am currently facing the following issue:

-> I am running ComfyUI on my local machine. With my current workflow, the models and LoRAs load in around 15-30 seconds, and image processing then takes around 30 seconds. That load time, or better, is what I am aiming for on my paid Runpod serverless and pod GPUs.

-> But when I run the same setup with the same workflow in ComfyUI hosted on a Runpod pod or serverless, loading the models and LoRAs takes 50-120 seconds on the first call. The loading time is inconsistent: sometimes it takes around 60 seconds and sometimes more than 2 minutes. After that, image processing takes less than 10 seconds. For production apps this makes the API pretty useless, because each worker needs its own 2 minutes to load.

Note: I have tried running the ComfyUI setup with the same Python, CUDA, and PyTorch versions on both machines, but the results are the same.

=============== Specs ==================
Local Machine:
CPU: AMD Ryzen 7 5800H
GPU: Nvidia RTX 3050T
RAM: 40 GB
Hard drive: NVMe
Python: 3.11.7
CUDA: 12.4
PyTorch: 2.6.0.dev20240915+cu124

Pod Machine:
GPU: Nvidia RTX 4090 (tested on multiple GPUs, results are the same)
RAM: 48 GB
Hard drive: NVMe
Python: 3.11.9
CUDA: 12.4
PyTorch: 2.6.0.dev20240915+cu124
=======================================

-> I tried running with the flags --gpu-only and --highvram, but the results are the same.

-> Note: This applies only to the first call. After the first call, I get the desired results, with a processing time of less than 10 seconds.

I am looking for technical guidance on this specific issue, including how GPUs are distributed to Pod/Serverless. I am available for a call. The logs for both machines are attached. Thanks in advance.
9 Replies
haris · 4mo ago
@SyedAliii post it in one place only, if someone knows how to help you they will respond when they are available.
SyedAliii (OP) · 4mo ago
I think the bot automatically deleted it from serverless, though I am facing the same issue on both Pod and Serverless. But thanks for pointing that out.
haris · 4mo ago
I deleted the post in #⚡|serverless as you cross posted the message. Even though you are facing the issue on both platforms we ask that you only post it in one place.
riceboy26 · 4mo ago
Where are the Lora and base models loading from? Within the docker image or a network volume?
SyedAliii (OP) · 4mo ago
LoRAs, models, and custom nodes all load from a network volume.
riceboy26 · 4mo ago
That's probably why. I keep seeing around the serverless channel that network volumes are really slow. (I'm anticipating this issue too.)

I'd suggest fitting the most common components (LoRAs, BLIP, and custom nodes) that are reasonably small directly into the Docker image to help with loading time. Network volumes have to load data over the network, which is always going to be slower than if your Docker image already had it available in RAM or on local disk. The loading times I've seen other people report with network volumes are on the order of 10-40 seconds. On my local RTX 3070, loading a simple 3-6 GB base model (like revanimated) takes 3-6 seconds at most.

There's also a "cache.py" file in Runpod's worker-a1111 repo that you could take inspiration from. It initializes models (interrogator/BLIP) by making sure they're downloaded and loaded for subsequent requests, which really helped in my case. (I'm still using A1111, but there's probably a similar thing you could do if you're using Flux or ComfyUI.)
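
To check whether the storage path really is the bottleneck, one option is to time a plain sequential read of the same model file from the network volume and from local disk. The following is only a rough sketch: the function names `time_file_read` and `report` are made up for illustration, and the paths you pass in depend entirely on where your volume and models are mounted.

```python
import sys
import time
from pathlib import Path

CHUNK_SIZE = 16 * 1024 * 1024  # read 16 MiB at a time


def time_file_read(path, chunk_size=CHUNK_SIZE):
    """Read a file front to back and return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def report(paths):
    """Print size, elapsed time, and throughput for each readable file."""
    for p in map(Path, paths):
        if not p.is_file():
            print(f"{p}: not found, skipping")
            continue
        n, secs = time_file_read(p)
        mib = n / (1024 * 1024)
        rate = mib / secs if secs > 0 else float("inf")
        print(f"{p}: {mib:.1f} MiB in {secs:.2f} s ({rate:.1f} MiB/s)")


if __name__ == "__main__":
    # Pass one or more model file paths on the command line.
    report(sys.argv[1:])
```

Run it once against a checkpoint on the network volume and once against a copy on the container's local disk. Only the first read of each file is a fair comparison, since a repeated read may be served from the operating system's page cache rather than from storage.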
SyedAliii (OP) · 4mo ago
Thanks. Will give it a try.

@riceboy26 Thanks. Putting the models, LoRAs, and custom nodes in the Docker image solved my problem, though the Docker image becomes very large.
riceboy26 · 4mo ago
Nice! Yeah, the giant Docker image is the biggest drawback. It gets difficult to push because it takes longer unless you have superb wifi lol. Just curious, what's the size of your Docker image?

You could also try a multi-stage Dockerfile: have an initial image download and install everything, then copy only the files you care about into the runtime image. That gets rid of all the cached and miscellaneous files and reduces your Docker image size.
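
Before deciding what to copy into the runtime stage, it can help to see which files actually take up the space. The sketch below lists the largest files under a directory. The helper name `largest_files` is hypothetical, and the directory you scan is whatever your build stage installs into.

```python
import heapq
import os


def largest_files(root, top_n=20):
    """Return up to top_n (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks so targets are not counted twice
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable
    return heapq.nlargest(top_n, sizes)


if __name__ == "__main__":
    # Scans the current directory by default; pass any path you like.
    for size, path in largest_files("."):
        print(f"{size / (1024 * 1024):10.1f} MiB  {path}")
```

Running this inside the build container shows whether the bulk of the image is model weights (which the runtime image needs anyway) or things like package caches and downloaded archives that a multi-stage build can leave behind.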
SyedAliii (OP) · 4mo ago
21 GB 🥲 Thanks, will try it in the future.