Created by SyedAliii on 9/20/2024 in #⚡|serverless
Issue with Multiple instances of ComfyUI running simultaneously on Serverless
Hello,
I am using Runpod Serverless and deploying ComfyUI using this repo: https://github.com/blib-la/runpod-worker-comfy?tab=readme-ov-file#bring-your-own-models
For the Server, this repo is being used: https://github.com/comfyanonymous/ComfyUI
I am deploying via a Docker image, and both of these repos are baked into the image.
When I run 2-3 workers via API, the comfy server gets activated and it responds as usual.
The problem arises when more API requests come in, for example when more than 5 requests arrive and more than 5 workers spin up: in that case the ComfyUI server runs into a problem and fails to start.
I understand that starting the ComfyUI server is a matter of the ComfyUI server code itself, but if the code were the problem, then even 1 worker shouldn't work, and that is not the case. With only a few workers everything works fine; as soon as the number of workers increases, the ComfyUI server no longer starts.
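For context, as far as I can tell the worker repo waits for the local ComfyUI server to come up before queueing work. A simplified sketch of that pattern (the endpoint is ComfyUI's default; the retry values here are illustrative, not the repo's exact ones):

```python
import time
import requests  # HTTP client; assumed available in the worker image

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default host/port

def wait_for_comfy(retries: int = 120, delay: float = 0.5) -> bool:
    """Poll the local ComfyUI HTTP server until it answers or we give up."""
    for _ in range(retries):
        try:
            # Any 200 from the root endpoint means the server is up and serving
            if requests.get(COMFY_URL, timeout=2).status_code == 200:
                return True
        except requests.exceptions.RequestException:
            pass  # not listening yet; keep waiting
        time.sleep(delay)
    return False

if not wait_for_comfy():
    raise RuntimeError("ComfyUI server did not come up; check worker logs")
```

Raising with an explicit message when this poll times out would at least make the logs show whether the server process crashed or is just slow to bind the port when many workers start at the same time.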
I would appreciate it if anyone could take a look.
Thank You
80 replies
Minimize the startup time of ComfyUI on serverless/pod GPU
Hello,
Hope everybody is good. Thanks for this amazing community.
I am currently facing an issue, which is as follows:
-> On my local machine, with my current workflow, the models and LoRAs load in around 15-30 seconds, and after that image processing takes around 30 seconds. That is the result I am aiming for on my paid Runpod serverless and pod GPUs.
-> But when I run the same setup with the same workflow in ComfyUI hosted on a Runpod pod or serverless worker, loading the models and LoRAs the first time takes at least 50-120 seconds. The loading time is inconsistent: sometimes it takes around 60 seconds, sometimes more than 2 minutes. After that, image processing takes less than 10 seconds. For production apps this makes the API pretty much unusable, because each worker pays its own ~2-minute startup cost.
Note: I have tried running the ComfyUI setup with the same Python, CUDA, and PyTorch versions on both machines, but the results are the same.
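For reference, this is the kind of quick check that can be run on both machines to confirm the versions really match (standard PyTorch introspection, nothing setup-specific):

```python
import sys
import torch

# Print the bits worth comparing between the local machine and the pod
print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
print("gpu    :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```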
=============== Specs ==================
Local Machine:
CPU: AMD Ryzen 7 5800H
GPU: Nvidia RTX 3050 Ti
RAM: 40 GB
Hard drive: NVMe
Python: 3.11.7
CUDA: 12.4
PyTorch: 2.6.0.dev20240915+cu124
Pod Machine:
GPU: Nvidia RTX 4090 (tested on multiple GPUs, results are the same)
RAM: 48 GB
Hard drive: NVMe
Python: 3.11.9
CUDA: 12.4
PyTorch: 2.6.0.dev20240915+cu124
=======================================
-> I have tried running with the --gpu-only and --highvram flags, but the results are the same.
-> Note: this only happens on the first call. After the first call, I get the desired result, with a processing time of under 10 seconds.
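One thing I am considering as a workaround (not a fix for the slow loading itself) is queueing a tiny warm-up workflow when the container starts, so the models and LoRAs are already in VRAM before the first real request arrives. A minimal sketch, assuming a hypothetical warmup_workflow.json exported from ComfyUI in API format and ComfyUI's default port:

```python
import json
import requests  # assumed available in the worker image

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default host/port

def warm_up(workflow_path: str = "warmup_workflow.json") -> str:
    """Queue one small workflow at container start so the first real
    request does not pay the model/LoRA load cost."""
    with open(workflow_path) as f:
        workflow = json.load(f)  # API-format workflow exported from ComfyUI
    # /prompt is ComfyUI's standard queueing endpoint
    resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}, timeout=10)
    resp.raise_for_status()
    return resp.json()["prompt_id"]
```

This only moves the load cost out of the request path; it does not change how fast the machine can actually stream the weights from disk, which seems to be the real difference here.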
I am looking for technical guidance on this specific issue, in particular on how GPUs are assigned to pods/serverless workers. I am available for a call.
The Logs for both machines are attached.
Thanks in Advance
15 replies