How to set up runpod-worker-comfy with custom nodes and models
Hi, I managed to set up a serverless API using the SD image example template (github.com/blib-la), but what if I have my own ComfyUI workflow that uses custom nodes and models? How do I make a Docker image for that so I can use it as the template? I want to use a network drive ideally, but when I use the base template timpietruskyblibla/runpod-worker-comfy:3.1.0-base and try to start a serverless endpoint connected to a network drive I previously downloaded the nodes/models to, they aren't there.
Are you installing your models into /runpod-volume? That is where your network volume is attached. If not, you may have to create symbolic links from the paths where the worker looks for models to the corresponding folders in /runpod-volume. I suggest you check out https://github.com/blib-la/runpod-worker-comfy?tab=readme-ov-file#network-volume for a specific example. This works with blib-la/runpod-worker-comfy.
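For example, a sketch (paths assumed: this image's ComfyUI install at /comfyui, models downloaded onto the volume as below; adjust to your actual layout):

# replace the image's checkpoint folder with a link to the network volume
rm -rf /comfyui/models/checkpoints
ln -s /runpod-volume/ComfyUI/models/checkpoints /comfyui/models/checkpoints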
So I created a network drive and then started a pod on a Secure Cloud instance; now in Jupyter I'm trying to download the custom model I need.
That looks fine... as long as the template you are using on your serverless worker knows to look in /runpod-volume/ComfyUI/models/checkpoints for its checkpoints, it should work. If it is not configured to do so, it will not work.
That installed successfully, and then I installed a bunch of other custom nodes.
So now I plan to terminate this pod
and then install the base template from timpietruskyblibla/runpod-worker-comfy:3.1.0-base
i.e. I am going to delete this, which I just did
I just created this template
I do not know the specifics of timpietruskyblibla/runpod-worker-comfy:3.1.0-base. As long as that image knows to look in the right path for models it should work.
Then I used that template to create my endpoint.
In the repo I see this:
That would suggest that where you should put the model would be at
From your previous screenshot it looks like you put it in
If that is what has happened then you may need to move them into the proper directory.
Oh
What?
So where should I save custom nodes?
/runpod-volume/custom_nodes
Your guess is as good as mine... that sounds right though.
OK, the model thing worked, placing it there allowed me to access it, thanks for that
but the custom node thing didn't work, I put them all in /runpod-volume/custom_nodes
Yeah, since it was not in the list I was worried that might happen. If you have access to the source code for this image, I suggest you look at it to see where it expects the custom_nodes to be.
Erm, I don't suppose you could point me to where I could find that in the repo?
Where did you get the image?
the base template timpietruskyblibla/runpod-worker-comfy:3.1.0-base
Oh, that is the same image I am using... although I am not using a network volume. Let me check the source.
When you bake the models into the image, the custom nodes end up in /comfyui/custom_nodes.
Still not sure how that maps to network volume
So you're saying I should modify the Dockerfile and make my own image?
That's up to you. That is the route I have taken with that same GitHub repo. Personally, I never trust an image that I did not build. Too easy for people to pull images off a repo, plus you cannot reference the source.
It seems they may have not accounted for custom nodes using network volume when building that image.
I just tried to install the nodes in a new image, didn't really work...
If you want to try and build it from source you can try adding this in the global scope:
This will link /comfyui/custom_nodes to your network volume.
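That snippet isn't reproduced here, but a minimal sketch of the idea (assuming ComfyUI at /comfyui and your nodes at /runpod-volume/custom_nodes) would be:

import os
import shutil

# Link /comfyui/custom_nodes to the copy on the network volume.
# Lives at the top of src/rp_handler.py (global scope), so it runs at import time.
if os.path.isdir("/runpod-volume/custom_nodes") and not os.path.islink("/comfyui/custom_nodes"):
    shutil.rmtree("/comfyui/custom_nodes", ignore_errors=True)  # drop the baked-in folder
    os.symlink("/runpod-volume/custom_nodes", "/comfyui/custom_nodes")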
Oh OK, erm, I'm kind of new to this, where is the global scope?
At the very top, just below the other imports, not inside a function.
In Python, code is indentation-dependent; global scope is any statement with no leading whitespace (no indentation).
Do you mean the Dockerfile?
or rp_handler.py?
No, not the Dockerfile; it is src/rp_handler.py
ok
I also have a question about the Dockerfile.
What is this test_input.json for? Should I replace that with the actual api.json of my workflow?
test_input.json is used when testing your worker locally. It won't have any impact when running on RunPod.
oh
ok
I removed 'test_input.json' from that line in mine.
thanks i will
i will try to add that line and then make my own image
docker build -t <your_dockerhub_username>/runpod-worker-comfy:dev-base --target base --platform linux/amd64 .
i.e. this with my own username
This is how I test locally:
This passes a local file and maps it as test_input.json internally.
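The command isn't shown here, but the idea is roughly this (hypothetical file name; the container path must match where the image expects test_input.json):

docker run --rm --gpus all \
  -v "$(pwd)/my_input.json:/test_input.json" \
  <your_dockerhub_username>/runpod-worker-comfy:v1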
oh
Yeah, that build command looks fine. I would use a version number for the tag, like runpod-worker-comfy:v1; this way, as you make updates, you can increment it, i.e. comfy:v2, comfy:v3, etc. Doing this helps keep your templates straight when you add a new version. You can see in my example above I am at comfy:v21, LOL, I have been busy.
ah ok
Hopefully you do not need to make so many modifications... LOL
In my version, I baked flux schnell, flux dev, sd3, and sdxl into the same image. Makes my image 81.5 GB!
Holy... on my bad internet it's taking ages to push this base 7 GB image to my Docker Hub
Also, in theory, if my custom nodes are there, I just have to submit the JSON of that workflow (API version) in my request, right?
Good luck!
If you need to push any future versions, the unchanged layers will be cached and the push to Docker Hub will take like 10 seconds.
oh that is a relief
You can look at that test_input.json as an example of how to submit a workflow, obviously you will have to modify for your specific workflow.
Man... I was deleting my Docker repo and re-pushing 7 GB every time...
Always leave at least your last version so that the cache kicks in.
docker build -t <your_dockerhub_username>/runpod-worker-comfy:v2 --target base --platform linux/amd64 .
So next time I do this and push, it will be faster, right?
yes, should be much faster... the build will likely take longer than the push does.
I have a question. I'm trying to see if my endpoint works, but because it's an img2img workflow I'm wondering if I have to change my api.json.
Right now it says this:
"input": {
"workflow": {
"1": {
"inputs": {
"image": encoded_string,
"upload": "image"
},
"class_type": "LoadImage",
"_meta": {
"title": "Load Image"
}
},
where in my test script I convert my image to base64:
import base64

# Load and encode the image
image_path = r"C:\Users\test\Desktop\pic.jpg"
with open(image_path, "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
is that right?
Does the node want a base64 encoded image?
yes
I generally do not use base64-encoded images because you are limited to a 10 MB JSON payload. Instead I tend to use S3 buckets and pass in the URL.
go with what you think should work and then check for errors after that...
So, if it does work, you might want to change it so that you provide a URL and the handler converts it into base64. This way you get around the 10 MB limit.
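A sketch of what that could look like inside rp_handler.py (the image_url field name is an assumption; node "1" is the LoadImage node from the workflow above):

import base64
import urllib.request

def image_url_to_base64(url: str) -> str:
    # download the image and return it base64-encoded
    with urllib.request.urlopen(url) as resp:
        return base64.b64encode(resp.read()).decode("utf-8")

# before queueing the workflow, swap the URL for the encoded image
workflow["1"]["inputs"]["image"] = image_url_to_base64(job_input["image_url"])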
so i should do this?
# Request payload
payload = {
    "input": {
        "workflow": {
            "1": {
                "inputs": {
                    "image": image_url,
                    "upload": "url"
                },
            },
Depends if node 1 will accept a URL. If it does, then yes.
If not, you could add another node that converts the URL into base64 and then feeds it to node 1.
What I have done with my custom version is simplify the input the handler requires, and then from that simplified input I dynamically build the workflow inside the worker. For example, this is the input I send my worker:
and then I create the workflow like this:
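The screenshots aren't reproduced here, but the idea is something like this (all field names, paths, and node ids below are made up for illustration):

import copy
import json

# simplified input sent to the worker
job_input = {
    "prompt": "a cat in a space suit",
    "model": "flux1-schnell.safetensors",
    "width": 1024,
    "height": 1024,
}

# API-format workflow baked into the image as a template (hypothetical path)
with open("/workflow_template.json") as f:
    TEMPLATE = json.load(f)

def build_workflow(job_input):
    wf = copy.deepcopy(TEMPLATE)
    wf["6"]["inputs"]["text"] = job_input["prompt"]      # prompt node (assumed id)
    wf["4"]["inputs"]["ckpt_name"] = job_input["model"]  # loader node (assumed id)
    wf["5"]["inputs"]["width"] = job_input["width"]
    wf["5"]["inputs"]["height"] = job_input["height"]
    return wf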
oh
Right now I'm just trying to submit the entire flow to see if my endpoint works, though that's something I should look into if it does.
Another thing to think about once you get it working... If you add BUCKET_ENDPOINT_URL, BUCKET_ACCESS_KEY_ID, and BUCKET_SECRET_ACCESS_KEY to your environment variables, it will return the images as a URL rather than base64.
I'm trying to install the custom nodes via updating my Docker image
though it is taking some time to build
I guess it is downloading all of them locally?
yeah
Just now I tried to push my updated build, which I did using
docker build -t username/runpod-worker-comfy:v2 --target base --platform linux/amd64 .
with
docker push username/runpod-worker-comfy:v2
but that seemed to push the whole thing again??
That's odd. Was the last one you built :v1? And was it still in the repo?
it was tagged dev-base
it's still there
That's very odd, I would expect the cache to kick in. It takes me like 5 seconds to upload that 80+ GB image.
Hmm, say, could you confirm for me this is how I clone an image to my own Docker Hub?
1) git clone https://github.com/blib-la/runpod-worker-comfy.git
2) now I cd runpod-worker-comfy
3) then docker build --build-arg MODEL_TYPE=sdxl -t <your_dockerhub_username>/runpod-worker-comfy:dev-sdxl --platform linux/amd64 . (I want the sdxl one)
4) then docker push <your_dockerhub_username>/runpod-worker-comfy:dev-sdxl
that's it, right?
That looks right to me
But when I try to use this image I cloned as a template to launch an endpoint, I see this in the log:
2024-09-19T11:25:46.814681415Z This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2024-09-19T11:25:46.814685756Z By pulling and using the container, you accept the terms and conditions of this license:
2024-09-19T11:25:46.814689559Z https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2024-09-19T11:25:46.814693296Z
2024-09-19T11:25:46.814712295Z A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2024-09-19T11:25:46.832697135Z
2024-09-19T11:25:46.835713814Z /usr/bin/env: 'bash\r': No such file or directory
Can you show me your Dockerfile?
Seems like you are calling /usr/bin/env somewhere (Dockerfile or rp_handler.py). This is not how you should set ENV variables. Instead, you should use the RunPod GUI (see screenshot) to edit the template and add your ENV variables there.
Hi, OK, let me show you
this was the file
it's the unchanged one from the cloned repo
I don't have an S3 bucket at all
I didn't set any env variables at all
When I use the image from that blib-la repo to make a template and launch an endpoint, it works, no problem
I'm doing the exact same thing with my own cloned image, but I get that error
Somewhere in your code it is calling /usr/bin/env; this won't work because that is a bash shell command. I don't see it in your Dockerfile. Did you modify your start.sh script?
this was the file
I just ran the error you got by ChatGPT. Here is its response:
So, it looks like that file has been corrupted by someone opening and saving it in Windows Notepad or similar. I suggest trying the following:
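Per the follow-up below, the suggested command was:

dos2unix start.sh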
That should strip away the incorrect Windows characters. This assumes you are running it on Linux. You may have to install it with:
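On Debian/Ubuntu that would be something like:

sudo apt-get install dos2unix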
Unfortunately you will have to rebuild the image after this.
Where do I enter dos2unix start.sh?
I use Windows
If you can DM your email address, I will fix the file from the repo and email it to you.
I can probably send it through here if I zip it first, standby
Here you go, unzip and use this file (without editing it in Windows)
ok
I just replace this and then build again, right?
Yeah, unzip that file and you will get start.sh; use that to replace your corrupted start.sh and then build again.
ok i am trying it
actually that start file looks identical to what was already in the repo
Whoa, hang on, that worked
my image was able to generate
something
That start file looks like the original though, what's the difference?
OK, so I wanted to install these custom nodes in my Docker image, can I just add these lines?
What is different about that file relates to how Linux and Windows handle line endings in text files. Linux uses one character (\n) but Windows uses two (\r\n). The first, and possibly other, lines in your start.sh had \r\n, and because of that Linux couldn't parse it.
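If you ever want to check a file for this yourself on Linux, cat -A makes line endings visible (CRLF lines end in ^M$):

cat -A start.sh | head -3
# a CRLF-damaged file prints lines like: #!/usr/bin/env bash^M$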
oh
hmm
I think your code for adding custom nodes should work, give it a try!
I added those custom node clone lines into my Docker image and pushed again, but now I am having issues submitting an image to my workflow.
I downloaded the API JSON from my flow, it's an img2img. I don't suppose you could check out my test script...
OK, my request to my API failed, and looking at the log I'm thinking that even though I downloaded the custom nodes, I still have to download further models into /models (not models/checkpoints)?
I looked at my flow and I see I need to get these also?
You will need to make sure you add all models, loras, etc you need in your workflow to your network volume.
Actually, now that I think of my Dockerfile, should I have entered something to install the requirements.txt for the Florence2 and SAM2 models?
requirements.txt holds a list of all the python modules your image needs.
I asked GPT and it told me to add this to my Dockerfile:
# Install custom node dependencies
RUN for dir in custom_nodes/*; do \
      if [ -f "$dir/requirements.txt" ]; then \
        pip3 install --upgrade -r "$dir/requirements.txt"; \
      fi; \
    done
This makes sense, right?
I'm not sure... I have never installed any
oh
I likely cannot use ComfyUI going forward... It loads the models in the handler function, so each time a request runs it has to load the models from disk... This adds 30-50 seconds to each request. I am going to start looking into using diffusion pipelines.
The "Install custom node dependencies" code you pasted makes sense though, the custom nodes are made of python so I can see why they would want to do a pip install for each one to make sure it has the required python modules to run.
Yeah, I am trying it now to see
Yeah, I think mine is going to be slow with Comfy
You will also incur a delay for using a network volume... for the fastest results you should bake your model into your image, similar to what you are doing with the custom nodes. Although RunPod has hinted at a future update where they run a model cache for users, not much detail has been provided at this point.
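For reference, baking a model in is just a download at build time in the Dockerfile; roughly like this (the Hugging Face URL shown is the standard one for sdxl base, assuming the WORKDIR is the ComfyUI root; adjust for your own models):

# download a checkpoint into the image at build time
RUN wget -q -O models/checkpoints/sd_xl_base_1.0.safetensors \
    https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors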
Oh, I thought you worked for RunPod
Yeah, this is tricky. I'm still struggling to even generate an image, I guess I can work on optimization after I actually get one.
@Encyrption Depending on what you're doing, you may find it easier to change the way Comfy loads models than to rework your entire solution.
You can keep Comfy warm and it does some auto caching as long as you have an endpoint for each type of workload defined.
Can you point me to an example of how to keep Comfy warm?
Seems you are missing some ENV variables for some reason.
apt-get install libgl1
Also, it seems one module needs libGL.so.1. Add this line to the top of your Dockerfile to provide it.
Oh OK, I will try that
ERROR: failed to solve: dockerfile parse error on line 1: unknown instruction: apt-get
I got this after adding that to my Dockerfile...
My bad, it should be:
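The corrected line from the screenshot isn't reproduced here, but wrapping the command in a RUN instruction fixes that parse error:

RUN apt-get update && apt-get install -y libgl1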
also, it needs to come AFTER the FROM line
OK, thanks
You know, I added that line and then built my image again, and it takes a long time, like it's doing everything all over again. Is that normal? I didn't delete any prior versions.
docker build -t <your_dockerhub_username>/runpod-worker-comfy:v1 --target base --platform linux/amd64 .
I just go v2, v3, etc. each time
Yeah... when you modify the Dockerfile you throw the layers out of whack, and it might be more like a full build.
oh ok
@Encyrption I'm assuming you're already keeping the Comfy server running right? Comfy has some model management built in, but there are also custom nodes that 'might' help.
As long as the server is running and you don't change the model, successive runs should be MUCH faster. You can see this just by restarting a server, loading comfy, running a workflow, changing an input and running it again.
If you're booting Comfy every time you process a new workload it'll be rough.
There's also a PR that was never included which lets you cache multiple models so that changing between them doesn't reload them.
Pull Request #3605 · comfya...: "add model cache after loaded" by efwfe (saves the model instance in memory after loading; it checks the free memory on both CPU and GPU)
I haven't tried it yet, but there's also this node:
https://github.com/willblaschko/ComfyUI-Unload-Models
The original idea for using Comfy was the ability to run multiple models under a single endpoint. I was able to bake in flux schnell, flux dev, sd3, and sdxl, but I am passing the model name to load in with the JSON job. So it has no choice but to load the model during the request.
With enough vram maybe you could pull it off, but that's not really a Comfy problem in either case right? I'd just create multiple endpoints.
Say... I tried my image with that new line:
2024-09-20 01:56:16.173[a19eipwlrn36eb][info]Finished.
2024-09-20 01:56:15.995[a19eipwlrn36eb][info]invalid prompt: {'type': 'invalid_prompt', 'message': 'Cannot execute because node LayerMask: MaskPreview does not exist.', 'details': "Node ID '#19'", 'extra_info': {}}\n
2024-09-20 01:56:15.994[a19eipwlrn36eb][info]got prompt\n
2024-09-20 01:56:15.993[a19eipwlrn36eb][info]runpod-worker-comfy - image(s) upload complete\n
2024-09-20 01:56:15.985[a19eipwlrn36eb][info]runpod-worker-comfy - image(s) upload\n
2024-09-20 01:56:15.985[a19eipwlrn36eb][info]runpod-worker-comfy - API is reachable\n
This node 19 is in my workflow, and it runs OK in my browser Comfy. Any ideas?
Looks like you are not providing node 19 with all the inputs it requires.
Erg... I'm lost, I don't get it. I downloaded my API JSON, built the Docker image with all the custom nodes, and got all the models. I would assume that would be it...
Can you show me the portion of the workflow JSON regarding node 19?
Sure, here:
"18": {
"inputs": {
"model": "sam2_hiera_small.safetensors",
"segmentor": "single_image",
"device": "cuda",
"precision": "bf16"
},
"class_type": "DownloadAndLoadSAM2Model",
"_meta": {
"title": "(Down)Load SAM2Model"
}
},
"19": {
"inputs": {
"mask": [
"17",
0
]
},
"class_type": "LayerMask: MaskPreview",
"_meta": {
"title": "LayerMask: MaskPreview"
}
},
"20": {
"inputs": {
"mask1": [
"7",
0
],
"mask2": [
"17",
0
]
},
"class_type": "SubtractMask",
"_meta": {
"title": "Pixelwise(MASK - MASK)"
}
},
"21": {
"inputs": {
"erode_dilate": 0,
"fill_holes": 10,
"remove_isolated_pixels": 10,
"smooth": 0,
"blur": 0,
"mask": [
"20",
0
]
},
"class_type": "MaskFix+",
"_meta": {
"title": "🔧 Mask Fix"
}
},
"22": {
"inputs": {
"expand": 10,
"tapered_corners": false,
"mask": [
"21",
0
]
},
"class_type": "GrowMask",
"_meta": {
"title": "GrowMask"
}
},
Disable your preview nodes when you save out the API format.
Yeah, what @teddycatsdomino said!
Oh, OK
OK, I disabled all the previews, saved, and sent it in, and I get this...
2024-09-20 02:37:28.800[a19eipwlrn36eb][info]invalid prompt: {'type': 'invalid_prompt', 'message': 'Cannot execute because node SubtractMask does not exist.', 'details': "Node ID '#20'", 'extra_info': {}}\n
2024-09-20 02:37:28.800[a19eipwlrn36eb][info]got prompt\n
2024-09-20 02:37:28.799[a19eipwlrn36eb][info]runpod-worker-comfy - image(s) upload complete\n
2024-09-20 02:37:28.795[a19eipwlrn36eb][info]runpod-worker-comfy - image(s) upload\n
2024-09-20 02:37:28.794[a19eipwlrn36eb][info]runpod-worker-comfy - API is reachable\n
Make sure you've installed all your custom nodes.
I think you missed the ComfyUI Impact Pack
# Clone custom nodes with retry mechanism
RUN for i in {1..3}; do \
      git clone https://github.com/ltdrdata/ComfyUI-Manager.git custom_nodes/ComfyUI-Manager && \
      git clone https://github.com/cubiq/ComfyUI_essentials.git custom_nodes/ComfyUI_essentials && \
      git clone https://github.com/rgthree/rgthree-comfy.git custom_nodes/rgthree-comfy && \
      git clone https://github.com/city96/ComfyUI-GGUF.git custom_nodes/ComfyUI-GGUF && \
      git clone https://github.com/ltdrdata/ComfyUI-Impact-Pack.git custom_nodes/ComfyUI-Impact-Pack && \
      git clone https://github.com/chrisgoringe/cg-use-everywhere.git custom_nodes/cg-use-everywhere && \
      git clone https://github.com/kijai/ComfyUI-Florence2.git custom_nodes/ComfyUI-Florence2 && \
      git clone https://github.com/kijai/ComfyUI-segment-anything-2.git custom_nodes/ComfyUI-segment-anything-2 && \
      git clone https://github.com/kijai/ComfyUI-FluxTrainer.git custom_nodes/ComfyUI-FluxTrainer && \
      git clone https://github.com/chflame163/ComfyUI_LayerStyle.git custom_nodes/ComfyUI_LayerStyle && \
      break || sleep 15; \
    done

# Install ComfyUI dependencies
RUN pip3 install --upgrade --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 \
    && pip3 install --upgrade -r requirements.txt
This was in my Dockerfile though, the Impact Pack was there
You can see the log when Comfy starts up; it should list the installed custom nodes, how long it took to import them, and whether or not any failed. Something like this:
oh
You might be missing a dependency for the Impact Pack.
This was my log, I got some weird warnings:
2024-09-20 01:54:17.988 | info | a19eipwlrn36eb | 87.0 seconds (IMPORT FAILED): /comfyui/custom_nodes/ComfyUI-Impact-Pack\n
2024-09-20 01:54:17.988 | info | a19eipwlrn36eb | 0.6 seconds: /comfyui/custom_nodes/ComfyUI-Florence2\n
2024-09-20 01:54:17.988 | info | a19eipwlrn36eb | 0.3 seconds (IMPORT FAILED): /comfyui/custom_nodes/ComfyUI-FluxTrainer\n
It looks like Impact failed? How can I repair it?
I restarted a new endpoint and my custom nodes failed to import, the same ones
The module ComfyUI_LayerStyle is failing to import due to the missing shared object file libgthread-2.0.so.0
I suggest you update that RUN command you added to the Dockerfile to this:
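The updated command isn't reproduced here, but libgthread-2.0.so.0 ships in the libglib2.0-0 package on Debian/Ubuntu, so it would look something like:

RUN apt-get update && apt-get install -y libgl1 libglib2.0-0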
oh ok
thx
Ok, try this:
thx
You know, I think my Docker image is just not installing right
I tried to install something on Modal and this appeared to work
How can I do this in Docker?
I need to check out Modal.
Yeah, have you used it? I mean, it seems easier than RunPod
Like, I was able to at least install a Comfy instance that worked, though submitting that same workflow as an API didn't...
I have not used it... I have not seen a good enough example of how to use it. Seems to be very poorly documented. Have you found any good documentation?
Modal docs: "Run Flux on ComfyUI interactively and as an API" (ComfyUI is an open-source Stable Diffusion GUI with a graph/nodes-based interface that allows you to design and execute advanced image generation pipelines.)
That appeared to at least install all my custom nodes etc., right?