blistick
RunPod
Created by blistick on 1/5/2024 in #⚡|serverless
What does "throttled" mean?
My endpoint dashboard sometimes shows "1 Throttled" worker and no other workers, apart from queued ones. What does the "throttled" status mean, and how do I prevent the condition?
12 replies
RunPod
Created by blistick on 12/26/2023 in #⚡|serverless
Slow model loading
Hi all. I have a serverless endpoint designed to run Stable Diffusion inference. It's taking about 12 seconds to load the model (Realistic Vision) into the pipeline (using "StableDiffusionPipeline.from_pretrained") from a RunPod network drive. Is this normal? Is the load time mostly a function of (possibly slow) communication speed between the serverless instance and the network volume?

The problem is that I'm loading other models as well, so even if I keep the endpoint active there is still a big delay before inference for a job can even begin, and then of course there's the time for inference itself. The total time is too long to provide a good customer experience.

I love the idea of the easy scaling and cost control that the serverless approach offers, but if I can't improve the speed I may have to use a different approach. Any input on other people's experience and ways to improve model loading time would be greatly appreciated!
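For context, the loading step looks roughly like this (a minimal sketch; the volume mount path and model folder name are just placeholders for my setup):

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder path: RunPod network volumes are typically mounted at /runpod-volume
# in serverless workers; the model folder name is specific to my setup.
MODEL_DIR = "/runpod-volume/models/realistic-vision"

# Load directly from the network volume. local_files_only avoids any
# Hugging Face Hub lookups, and fp16 weights roughly halve the data read.
pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,
    local_files_only=True,
)
pipe.to("cuda")
```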
22 replies