Created by Blanchon.jl on 1/2/2024 in #⚡|serverless
Best practices
Hey 👋🏻
I have a few questions about RunPod serverless, specifically around image-generation tasks (like Stable Diffusion).
1. Storage - NVMe vs Network Volume: I've read in some posts that baking models directly into the Docker image is more cost- and speed-efficient than using a network volume. For workloads spanning various Stable Diffusion templates, does this mean all the models for every template must be stored inside the Docker image? (Rough sketch of what I mean after this list.)
2. Instance Warm-Up: Is there a way to pre-warm an instance based on a custom event on my side? For instance, if a user logs into my platform and there's a high likelihood they'll start a computation, can I pre-start a worker for them? What would be the best approach, maybe a dummy call to the handler that activates a worker without triggering any actual computation? (Sketch below.)
3. Intermediate/Real-Time Results: What's the best way to send interim results to the client for image-generation tasks? For language models this is typically done with a 'yield' generator, but I'm not sure whether that applies to image tasks like real-time Stable Diffusion. Is there something like runpod.serverless.progress_update for this purpose? And how would it work on the client side? (Sketch below.)
4. Worker Consistency: I plan to use a single template for about 20 different jobs that share a common base but load different models into VRAM. If a client runs a real-time Stable Diffusion task and a worker loads the required models into VRAM, can subsequent requests for that task be routed to the same worker? For instance, a client makes a request, the worker processes it, and a second request follows shortly after. Is there a way to ensure this second request goes to the same worker, so the models don't have to be reloaded into VRAM? (The caching pattern I have in mind is sketched below.)
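
For question 1, this is roughly the "bake models into the image" approach I mean: a script the Dockerfile would run at build time (e.g. `RUN python download_weights.py`). The model IDs are placeholders for my ~20 templates, not something I've settled on:

```python
# download_weights.py -- run at image build time, not at request time.
# Sketch only: the model IDs below are placeholders.
from diffusers import StableDiffusionPipeline

MODELS = [
    "runwayml/stable-diffusion-v1-5",
    # ... one entry per template's base model ...
]

for model_id in MODELS:
    # from_pretrained caches the weights under ~/.cache/huggingface inside
    # the image layer, so workers read them from local disk (NVMe) at
    # runtime instead of pulling from a network volume.
    StableDiffusionPipeline.from_pretrained(model_id)
```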
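For question 2, the "dummy call" idea would look something like this on the handler side. The `warmup` flag is my own input convention, not a RunPod feature:

```python
import runpod

def handler(job):
    job_input = job["input"]

    # "warmup" is my own convention: the request spins up a worker
    # (models load at import time) and returns without doing any generation.
    if job_input.get("warmup"):
        return {"warmed": True}

    # ... normal image-generation path ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```

The login event on my platform would then POST `{"input": {"warmup": true}}` to the endpoint's /run URL.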
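For question 3, I'm imagining a generator handler like the one below; whether this works for images the way it does for LLM tokens is exactly my question. `denoising_steps` is a stand-in for a diffusers pipeline callback:

```python
import base64
import io

import runpod

def denoising_steps(job_input):
    # Placeholder: stands in for a diffusers pipeline that surfaces the
    # intermediate image of each denoising step via a callback.
    yield from []

def handler(job):
    # Generator handler: each yield should surface as a chunk that the
    # client can read from the endpoint's /stream/<job_id> route.
    for step, image in denoising_steps(job["input"]):
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        yield {
            "step": step,
            "image_b64": base64.b64encode(buf.getvalue()).decode(),
        }

runpod.serverless.start({
    "handler": handler,
    # My understanding is this also aggregates the chunks on /status.
    "return_aggregate_stream": True,
})
```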
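And for question 4, this is why same-worker routing matters to me: my handler would keep loaded pipelines in a module-level cache so a warm worker can skip the VRAM reload. `load_pipeline` is a placeholder:

```python
import runpod

# Module-level cache: survives between jobs on the same warm worker,
# so a repeat request for the same model skips the VRAM reload.
PIPELINES = {}

def load_pipeline(name):
    # Placeholder: real code would be something like
    # StableDiffusionPipeline.from_pretrained(name).to("cuda")
    ...

def get_pipeline(name):
    if name not in PIPELINES:
        PIPELINES[name] = load_pipeline(name)
    return PIPELINES[name]

def handler(job):
    pipe = get_pipeline(job["input"]["model"])
    # ... run generation with `pipe` ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```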
Thanks a lot for your help. Any level of detail in your response is appreciated ❤️