Flux.1 Schnell Serverless Speeds
What sort of speeds are people getting with their Flux.1 Schnell models using Serverless on RunPod? I'm currently hitting 30 seconds for 4 images, with a significant amount of that time spent moving the model to CUDA (~15 seconds). Is there any way to speed this up? (48GB GPU Pro)
3 Replies
You can load the models at startup instead of when responding to a request, so the worker doesn't have to keep loading them from disk each time. This is NOT how most ComfyUI deployments are configured. To do this you may have to use a diffusers pipeline or similar so the model can be cached in memory.
I currently have a diffusers-only pipeline, so that would work. However, how do I load it on startup? Do I load it before it hits the inference function? The .to('cuda') part is where I'm coming unstuck.
Yes, if you load the models in either the global scope or the main scope of your handler.py (not inside any function), they should load before any inference runs and stay loaded for the life of the worker.
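Something like this minimal handler.py sketch, assuming a diffusers FluxPipeline and the RunPod serverless SDK; the model id, dtype, and request fields here are illustrative, so adapt them to your setup:

```python
# Sketch of a handler.py that loads the Flux.1 Schnell pipeline once,
# at worker startup, in global scope rather than inside the handler.
import runpod
import torch
from diffusers import FluxPipeline

# Runs once when the worker process starts, not per request.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",   # assumed model id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # the slow .to('cuda') step now only happens at cold start


def handler(job):
    # Each request reuses the already-loaded, already-on-GPU pipeline.
    job_input = job["input"]
    images = pipe(
        job_input["prompt"],
        num_inference_steps=4,   # Schnell is distilled for ~4 steps
        guidance_scale=0.0,
        num_images_per_prompt=job_input.get("num_images", 4),
    ).images
    # Return something JSON-serializable; encoding the images to
    # base64 or uploading them is left out of this sketch.
    return {"num_images": len(images)}


runpod.serverless.start({"handler": handler})
```

With this layout, only the first request after a cold start pays the load-to-CUDA cost; warm requests go straight to inference.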