Created by Blanchon.jl on 1/2/2024 in #⚡|serverless
Best practices
Hey 👋🏻
I have a few questions about RunPod serverless, specifically around image-generation tasks (like Stable Diffusion).
1. Storage - NVMe vs Network Volume: I've read in some posts that baking models directly into the Docker image is more cost- and speed-efficient than using a network volume. For workloads spanning various Stable Diffusion templates, does this mean all the models for every template must be stored inside the Docker image? (Rough sketch of what I mean after this list.)
2. Instance Warm-Up: Is there a way to pre-warm an instance based on a custom event on my side? For instance, if a user logs into my platform and there's a high likelihood they'll start a computation, can I pre-start a worker for them? What would be the best approach, maybe a dummy call to the handler that activates a worker without triggering any actual computation? (Sketch below.)
3. Intermediate/Real-Time Results: What's the best way to send interim results to the client for image-generation tasks? For language models this is typically done with a 'yield' generator, but I'm not sure whether that applies to image tasks like real-time Stable Diffusion. Is there something like runpod.serverless.progress_update for this purpose? And how would it work on the client side? (Sketch below.)
4. Worker Consistency: I plan to use a single template for about 20 different jobs that share a common base but load different models into VRAM. If a client runs a real-time Stable Diffusion task and a worker loads the required models into VRAM, can subsequent requests for that task be routed to the same worker? For instance, a client makes a request, the worker processes it, and a second request follows shortly after. Is there a way to ensure this second request goes to the same worker, so the models don't have to be reloaded into VRAM? (The caching pattern I have in mind is sketched below.)
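
For question 1, this is roughly the "bake models into the image" approach I mean: a script the Dockerfile would run at build time (e.g. `RUN python download_weights.py`). The model IDs are placeholders for my ~20 templates, not something I've settled on:

```python
# download_weights.py -- run at image build time, not at request time.
# Sketch only: the model IDs below are placeholders.
from diffusers import StableDiffusionPipeline

MODELS = [
    "runwayml/stable-diffusion-v1-5",
    # ... one entry per template's base model ...
]

for model_id in MODELS:
    # from_pretrained caches the weights under ~/.cache/huggingface inside
    # the image layer, so workers read them from local disk (NVMe) at
    # runtime instead of pulling from a network volume.
    StableDiffusionPipeline.from_pretrained(model_id)
```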
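For question 2, the "dummy call" idea would look something like this on the handler side. The `warmup` flag is my own input convention, not a RunPod feature:

```python
import runpod

def handler(job):
    job_input = job["input"]

    # "warmup" is my own convention: the request spins up a worker
    # (models load at import time) and returns without doing any generation.
    if job_input.get("warmup"):
        return {"warmed": True}

    # ... normal image-generation path ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```

The login event on my platform would then POST `{"input": {"warmup": true}}` to the endpoint's /run URL.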
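For question 3, I'm imagining a generator handler like the one below; whether this works for images the way it does for LLM tokens is exactly my question. `denoising_steps` is a stand-in for a diffusers pipeline callback:

```python
import base64
import io

import runpod

def denoising_steps(job_input):
    # Placeholder: stands in for a diffusers pipeline that surfaces the
    # intermediate image of each denoising step via a callback.
    yield from []

def handler(job):
    # Generator handler: each yield should surface as a chunk that the
    # client can read from the endpoint's /stream/<job_id> route.
    for step, image in denoising_steps(job["input"]):
        buf = io.BytesIO()
        image.save(buf, format="PNG")
        yield {
            "step": step,
            "image_b64": base64.b64encode(buf.getvalue()).decode(),
        }

runpod.serverless.start({
    "handler": handler,
    # My understanding is this also aggregates the chunks on /status.
    "return_aggregate_stream": True,
})
```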
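And for question 4, this is why same-worker routing matters to me: my handler would keep loaded pipelines in a module-level cache so a warm worker can skip the VRAM reload. `load_pipeline` is a placeholder:

```python
import runpod

# Module-level cache: survives between jobs on the same warm worker,
# so a repeat request for the same model skips the VRAM reload.
PIPELINES = {}

def load_pipeline(name):
    # Placeholder: real code would be something like
    # StableDiffusionPipeline.from_pretrained(name).to("cuda")
    ...

def get_pipeline(name):
    if name not in PIPELINES:
        PIPELINES[name] = load_pipeline(name)
    return PIPELINES[name]

def handler(job):
    pipe = get_pipeline(job["input"]["model"])
    # ... run generation with `pipe` ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```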
Thanks a lot for your help. Any level of detail in your response is appreciated ❤️