Flux.1 Schnell Serverless Speeds
What sort of speeds are people getting with their Flux.1 Schnell models using Serverless on RunPod? I'm currently hitting 30 seconds for 4 images, with a significant amount of that time spent moving the model to CUDA (~15 seconds). Is there any way to speed this up? (48GB GPU Pro)
3 Replies
You can load the models at startup instead of when responding to a request, so the worker doesn't have to keep loading them from disk each time. This is NOT how most ComfyUI deployments are configured. To do this you may have to use a diffusers pipeline or similar so the model can be cached in memory.
I currently have a diffusers-only pipeline, so that would work. However, how do I load it on startup? Do I load it before it hits the inference function? The .to('cuda') part is where I'm coming unstuck.
Yes, if you load the models in either the global scope or the main scope of your handler.py (not inside any function), they should load before any inference runs and stay loaded for the life of the worker.
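Something like this minimal handler.py sketch, assuming a diffusers FluxPipeline and the RunPod serverless SDK; the model id, dtype, and request fields here are illustrative, so adapt them to your setup:

```python
# Sketch of a handler.py that loads the Flux.1 Schnell pipeline once,
# at worker startup, in global scope rather than inside the handler.
import runpod
import torch
from diffusers import FluxPipeline

# Runs once when the worker process starts, not per request.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",   # assumed model id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # the slow .to('cuda') step now only happens at cold start


def handler(job):
    # Each request reuses the already-loaded, already-on-GPU pipeline.
    job_input = job["input"]
    images = pipe(
        job_input["prompt"],
        num_inference_steps=4,   # Schnell is distilled for ~4 steps
        guidance_scale=0.0,
        num_images_per_prompt=job_input.get("num_images", 4),
    ).images
    # Return something JSON-serializable; encoding the images to
    # base64 or uploading them is left out of this sketch.
    return {"num_images": len(images)}


runpod.serverless.start({"handler": handler})
```

With this layout, only the first request after a cold start pays the load-to-CUDA cost; warm requests go straight to inference.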