Unreasonably high start times on serverless workers
I'm trying to deploy a serverless endpoint for A1111 instances using a preconfigured network volume. I've followed the steps shown in this tutorial https://www.youtube.com/watch?v=gv6F9Vnd6io
But my workers seem to be running for multiple minutes with the container logs filled with the same message "Service not ready yet. Retrying..."
Am I missing something here?
[YouTube embed: Generative Labs, "Setting Up a Stable Diffusion API with Control Net using RunPod Serverless"]
do you have a picture of your template?
just wondering
There are actually many reasons a network volume can be slow, but the fact that the service isn't ready yet suggests to me it might be something else
"Service not ready yet. Retrying..." isn't related to the network volume yet
+ also share your logs
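For context, that message usually comes from the worker's readiness loop: the handler keeps polling the local A1111 API and only starts taking jobs once it answers. A rough sketch of what that loop looks like (the URL and port here are assumptions, not necessarily what this worker image uses; check its actual source):

```python
import time
import requests

# Rough sketch of the loop behind "Service not ready yet. Retrying...".
# The URL/port are assumptions, not necessarily what this worker image uses.
A1111_URL = "http://127.0.0.1:3000/sdapi/v1/sd-models"

def wait_for_service(url: str, retry_delay: float = 0.2) -> None:
    """Block until the local A1111 API responds, then let the handler start."""
    while True:
        try:
            requests.get(url, timeout=5)
            return
        except requests.exceptions.RequestException:
            print("Service not ready yet. Retrying...")
        time.sleep(retry_delay)

wait_for_service(A1111_URL)
```

If that loop never exits, A1111 itself probably never finished starting on the worker, so its own startup log is the place to look.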
Putting in the image is all I've done to set up the template
Here's what the logs look like
Hmm that is very weird
I think for now just kill the request if you haven't already. This definitely seems… hard to debug
maybe staff will know
Also ran the A1111 inside a pod to make sure that's not the problem
Maybe you can try:
https://github.com/ashleykleynhans/runpod-worker-a1111
I know this is pretty well documented… though I haven't tried it myself.
But either way
this is weird
staff will probably have a better idea
I actually tried that first and had the same problem with high initialization times of around 90s
I see
did it work before though? not getting stuck?
It did work
I see
I think your high initialization times with ashleyk's worker (and presumably the still-unknown Generative Labs one) are because of network volumes, then
The main thing is that a network volume is a separate drive, so loading big models off a different drive can take a long time
So to potentially get faster speeds: build a custom Dockerfile, modifying ashleyk's, that has the folder layout the worker expects and downloads your models into it :)
You can use a platform like Depot to speed up the build:
https://discord.com/channels/912829806415085598/1194693049897463848
The way I build the Dockerfile is to ask ChatGPT how to write it, by telling it the steps I took manually in a Jupyter notebook / terminal
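As a very rough sketch of the idea (the base image, folder paths, and model URL below are placeholders, not the actual layout ashleyk's worker uses; match them to whatever it expects):

```dockerfile
# Sketch only: base image, paths, and model URL are placeholders.
# The point is to bake the checkpoint into the image itself so workers
# don't have to read it off a network volume at cold start.
FROM python:3.10-slim

RUN mkdir -p /stable-diffusion-webui/models/Stable-diffusion
ADD https://example.com/your-model.safetensors \
    /stable-diffusion-webui/models/Stable-diffusion/your-model.safetensors
```

The trade-off is a bigger image, but the model then loads from the worker's local disk instead of over the network-volume mount.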
If I'm not using a network volume with a preinstalled A1111 on it, won't my image have to install A1111 and download every needed model on every worker before servicing a request? I was planning on using the --skip-install command-line argument with a preinstalled A1111 to reduce load times for generations
Hey @Shaggbagg. I am working on the exact same problem as you. I started off with installing A1111 on a network volume and noticed the cold start times are extremely high, between 60-100 secs. Then @justin recommended installing everything directly in a Docker container and skipping the network volume altogether. I'm currently working on that right now, but running into some issues. I sent a friend request; maybe we can help each other since we're working on the same thing.
Can you help me figure out what the cooldown period actually refers to?
I assumed it was the time between finishing one request (with none pending) and a new one coming in.
Looking at these requests and workers, even with a 60s cooldown time, the worker seems to die before handling a request I send within 5 seconds of getting the previous response, which leads me to believe they may be calculating the cooldown differently than I expect
@justin
are you talking about delay time?
what is cooldown volume
*cool down period
Delay time is all the time before execution
meaning the time it sat in the queue
before it got picked up by a worker
execution time is when the worker is actually working on it
You aren't being charged for delay time; you're being charged for the time the worker is running + the time the worker is active but maybe not doing anything (which is configurable in the advanced settings) + cold start time on the worker
Something I do, for example, is every time I get a request, I let the worker stay active for another 2 minutes so it can immediately pick up another request and avoid a cold start
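For reference, the handler itself is usually just something like this minimal sketch (the "stay active for another 2 minutes" part isn't in the code; as far as I know it's the idle timeout you configure on the endpoint):

```python
import runpod  # RunPod serverless SDK

def handler(job):
    """Minimal serverless handler sketch.

    You're billed for cold start + execution + however long the worker is
    kept alive afterwards (the idle timeout set on the endpoint).
    """
    prompt = job["input"].get("prompt", "")
    # ... call the local A1111 API here and return the generated image ...
    return {"echo": prompt}

runpod.serverless.start({"handler": handler})
```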
@Jack / @Shaggbagg
HMMMM. I'm playing around with it too. I'm in the process of seeing if this Dockerfile builds, and I'm going to load it up on a GPU Pod and play around with it for debugging's sake