Created by SyedAliii on 9/20/2024 in #⚡|serverless
Issue with Multiple instances of ComfyUI running simultaneously on Serverless
Hello,
I am using Runpod Serverless and deploying ComfyUI using this repo: https://github.com/blib-la/runpod-worker-comfy?tab=readme-ov-file#bring-your-own-models
For the Server, this repo is being used: https://github.com/comfyanonymous/ComfyUI
I am deploying via a Docker image, and both of these repos are baked into the image.
When I run 2-3 workers via API, the comfy server gets activated and it responds as usual.
The problem arises when more API requests come in, for example when more than 5 requests arrive and more than 5 workers spin up: in that case the ComfyUI server runs into a problem and fails to start.
I understand that starting the ComfyUI server is a matter of the ComfyUI server code itself, but if the code were the problem, then even 1 worker shouldn't work, and that is not the case. With only a few workers everything works fine; as soon as the number of workers increases, the ComfyUI server no longer starts.
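For context, as far as I can tell the worker repo waits for the local ComfyUI server to come up before queueing work. A simplified sketch of that pattern (the endpoint is ComfyUI's default; the retry values here are illustrative, not the repo's exact ones):

```python
import time
import requests  # HTTP client; assumed available in the worker image

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default host/port

def wait_for_comfy(retries: int = 120, delay: float = 0.5) -> bool:
    """Poll the local ComfyUI HTTP server until it answers or we give up."""
    for _ in range(retries):
        try:
            # Any 200 from the root endpoint means the server is up and serving
            if requests.get(COMFY_URL, timeout=2).status_code == 200:
                return True
        except requests.exceptions.RequestException:
            pass  # not listening yet; keep waiting
        time.sleep(delay)
    return False

if not wait_for_comfy():
    raise RuntimeError("ComfyUI server did not come up; check worker logs")
```

Raising with an explicit message when this poll times out would at least make the logs show whether the server process crashed or is just slow to bind the port when many workers start at the same time.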
I would appreciate it if anyone could take a look.
Thank You
80 replies
Minimize the startup time of ComfyUI on serverless/pod GPU
Hello,
Hope everybody is good. Thanks for this amazing community.
I am currently facing an issue, which is as follows:
-> On my local machine, with my current workflow, the models and LoRAs load in around 15-30 seconds, and after that image processing takes around 30 seconds. That is the result I am aiming for on my paid Runpod serverless and pod GPUs.
-> But when I run the same setup with the same workflow in ComfyUI hosted on a Runpod pod or serverless worker, loading the models and LoRAs the first time takes at least 50-120 seconds. The loading time is inconsistent: sometimes it takes around 60 seconds, sometimes more than 2 minutes. After that, image processing takes less than 10 seconds. For production apps this makes the API pretty much unusable, because each worker pays its own ~2-minute startup cost.
Note: I have tried running the ComfyUI setup with the same Python, CUDA, and PyTorch versions on both machines, but the results are the same.
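For reference, this is the kind of quick check that can be run on both machines to confirm the versions really match (standard PyTorch introspection, nothing setup-specific):

```python
import sys
import torch

# Print the bits worth comparing between the local machine and the pod
print("python :", sys.version.split()[0])
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
print("gpu    :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```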
=============== Specs ==================
Local Machine:
CPU: AMD Ryzen 7 5800H
GPU: Nvidia RTX 3050 Ti
RAM: 40 GB
Hard drive: NVMe
Python: 3.11.7
CUDA: 12.4
PyTorch: 2.6.0.dev20240915+cu124
Pod Machine:
GPU: Nvidia RTX 4090 (tested on multiple GPUs, results are the same)
RAM: 48 GB
Hard drive: NVMe
Python: 3.11.9
CUDA: 12.4
PyTorch: 2.6.0.dev20240915+cu124
=======================================
-> I have tried running with the --gpu-only and --highvram flags, but the results are the same.
-> Note: this only happens on the first call. After the first call, I get the desired result, with a processing time of under 10 seconds.
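One thing I am considering as a workaround (not a fix for the slow loading itself) is queueing a tiny warm-up workflow when the container starts, so the models and LoRAs are already in VRAM before the first real request arrives. A minimal sketch, assuming a hypothetical warmup_workflow.json exported from ComfyUI in API format and ComfyUI's default port:

```python
import json
import requests  # assumed available in the worker image

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default host/port

def warm_up(workflow_path: str = "warmup_workflow.json") -> str:
    """Queue one small workflow at container start so the first real
    request does not pay the model/LoRA load cost."""
    with open(workflow_path) as f:
        workflow = json.load(f)  # API-format workflow exported from ComfyUI
    # /prompt is ComfyUI's standard queueing endpoint
    resp = requests.post(f"{COMFY_URL}/prompt", json={"prompt": workflow}, timeout=10)
    resp.raise_for_status()
    return resp.json()["prompt_id"]
```

This only moves the load cost out of the request path; it does not change how fast the machine can actually stream the weights from disk, which seems to be the real difference here.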
I am looking for technical guidance on this specific issue, in particular on how GPUs are assigned to pods/serverless workers. I am available for a call.
The Logs for both machines are attached.
Thanks in Advance
15 replies