Unreasonably high start times on serverless workers
I'm trying to deploy a serverless endpoint for A1111 instances using a preconfigured network volume. I've followed the steps shown in this tutorial https://www.youtube.com/watch?v=gv6F9Vnd6io
But my workers seem to be running for multiple minutes with the container logs filled with the same message "Service not ready yet. Retrying..."
Am I missing something here?
[YouTube embed: Generative Labs, "Setting Up a Stable Diffusion API with Control Net using RunPod Serverless"]
do you have a picture of your template?
just wondering
There are actually many reasons a network volume can be slow, but the fact that the service isn't ready yet suggests to me it might be something else
"Service not ready yet. Retrying..." isn't related to the network volume yet
+ also share your logs
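For context, that message usually comes from the worker's readiness loop: the handler keeps polling the local A1111 API and only starts taking jobs once it answers. A rough sketch of what that loop looks like (the URL and port here are assumptions, not necessarily what this worker image uses; check its actual source):

```python
import time
import requests

# Rough sketch of the loop behind "Service not ready yet. Retrying...".
# The URL/port are assumptions, not necessarily what this worker image uses.
A1111_URL = "http://127.0.0.1:3000/sdapi/v1/sd-models"

def wait_for_service(url: str, retry_delay: float = 0.2) -> None:
    """Block until the local A1111 API responds, then let the handler start."""
    while True:
        try:
            requests.get(url, timeout=5)
            return
        except requests.exceptions.RequestException:
            print("Service not ready yet. Retrying...")
        time.sleep(retry_delay)

wait_for_service(A1111_URL)
```

If that loop never exits, A1111 itself probably never finished starting on the worker, so its own startup log is the place to look.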
Putting in the image is all I've done to set up the template
Here's what the logs look like
Hmm that is very weird
I think for now just kill the request if you haven't already. This definitely seems… hard to debug
maybe staff will know
Also ran the A1111 inside a pod to make sure that's not the problem
Maybe you can try:
https://github.com/ashleykleynhans/runpod-worker-a1111
I know this is pretty well documented… though I haven't tried it myself.
But either way
this is weird
staff will probably have a better idea
I actually tried that first and had the same problem with high initialization times of around 90s
I see
did it work before though? not getting stuck?
It did work
I see
I think your high initialization times with ashleyk's worker (and presumably the still-unknown Generative Labs one) are because of network volumes, then
The main thing is that a network volume is a separate drive, so loading big models off a different drive can take a long time
So to potentially get faster speeds: build a custom Dockerfile, modifying ashleyk's, that has the folder layout the worker expects and downloads your models into it :)
You can use a platform like Depot to speed up the build:
https://discord.com/channels/912829806415085598/1194693049897463848
The way I build the Dockerfile is to ask ChatGPT how to write it, by telling it the steps I took manually in a Jupyter notebook / terminal
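As a very rough sketch of the idea (the base image, folder paths, and model URL below are placeholders, not the actual layout ashleyk's worker uses; match them to whatever it expects):

```dockerfile
# Sketch only: base image, paths, and model URL are placeholders.
# The point is to bake the checkpoint into the image itself so workers
# don't have to read it off a network volume at cold start.
FROM python:3.10-slim

RUN mkdir -p /stable-diffusion-webui/models/Stable-diffusion
ADD https://example.com/your-model.safetensors \
    /stable-diffusion-webui/models/Stable-diffusion/your-model.safetensors
```

The trade-off is a bigger image, but the model then loads from the worker's local disk instead of over the network-volume mount.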
If I'm not using a network volume with a preinstalled A1111 on it, won't my image have to install A1111 and download every needed model on every worker before servicing a request? I was planning on using the --skip-install command-line argument with a preinstalled A1111 to reduce load times for generations
Hey @Shaggbagg. I am working on the exact same problem as you. I started off with installing A1111 on a network volume and noticed the cold start times are extremely high, between 60-100 secs. Then @justin recommended installing everything directly in a Docker container and skipping the network volume altogether. I'm currently working on that right now, but running into some issues. I sent a friend request; maybe we can help each other since we're working on the same thing.
Can you help me figure out what the cooldown period actually refers to?
I assumed it was the time between finishing one request (with none pending) and a new one coming in.
Looking at these requests and workers, even with a 60s cooldown time, the worker seems to die before handling a request I send within 5 seconds of getting the previous response, which leads me to believe they may be calculating the cooldown differently than I expect
@justin
are you talking about delay time?
what is cooldown volume
*cool down period
Delay time is all the time before execution
meaning the time it sat in the queue
before it got picked up by a worker
execution time is when the worker is actually working on it
You aren't being charged for delay time; you're being charged for the time the worker is running + the time the worker is active but maybe not doing anything (which is configurable in the advanced settings) + cold start time on the worker
Something I do, for example, is every time I get a request, I let the worker stay active for another 2 minutes so it can immediately pick up another request and avoid a cold start
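For reference, the handler itself is usually just something like this minimal sketch (the "stay active for another 2 minutes" part isn't in the code; as far as I know it's the idle timeout you configure on the endpoint):

```python
import runpod  # RunPod serverless SDK

def handler(job):
    """Minimal serverless handler sketch.

    You're billed for cold start + execution + however long the worker is
    kept alive afterwards (the idle timeout set on the endpoint).
    """
    prompt = job["input"].get("prompt", "")
    # ... call the local A1111 API here and return the generated image ...
    return {"echo": prompt}

runpod.serverless.start({"handler": handler})
```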
@Jack / @Shaggbagg
HMMMM. I'm playing around with it too. I'm in the process of seeing if this Dockerfile builds, and I'm going to load it up on a GPU Pod and play around with it for debugging's sake