Speeding up loading of model weights
Hi guys, I have set up my serverless Docker image to contain all my required model weights. My handler script also loads the weights using the diffusers library's `.from_pretrained` with `local_files_only=True`, so everything is loaded locally. I notice that during cold starts, loading those weights still takes around 25 seconds before the logs display --- Starting Serverless Worker | Version 1.6.2 ---.

Does anyone have experience optimising the time needed to load weights? Could we pre-load it into RAM or something (I may be totally off)?
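For reference, a minimal sketch of the loading setup described above (the `/models/sdxl` path and the choice of `DiffusionPipeline` are illustrative assumptions, not from the thread):

```python
import torch
from diffusers import DiffusionPipeline

# Weights were baked into the image at build time, e.g. under /models/sdxl
pipe = DiffusionPipeline.from_pretrained(
    "/models/sdxl",            # hypothetical local path inside the image
    torch_dtype=torch.float16,
    local_files_only=True,     # never reach out to the Hub at runtime
)
pipe.to("cuda")
```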
Are the models built into your image or stored on a network volume?
Also, where in your code are you loading the models from disk? I.e. from global scope / main, or in the handler?
1. The weights are built into the image
2. The loading happens in the global scope, outside of the main `handler` function that `runpod.serverless.start` calls

Since I require multiple models, I'm not sure what other optimizations / good practices there are, so I'm asking here haha.
Thanks for such a quick response though!

Loading the models up in the global scope will increase the cold start time, but it will speed up subsequent requests dramatically. This comes into play with active workers and FlashBoot. FlashBoot kicks in when a worker finishes a request and has another request waiting for it, so the more workers and traffic your endpoint gets, the more effective FlashBoot becomes. Active workers run all the time, so after the first query they respond quickly regardless of your traffic. It's a balancing act to figure out the best number of active and max workers for your traffic.
load outside of job function
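A rough sketch of that pattern, loading once at module import and reusing the pipeline inside the handler (the model path and pipeline class are assumptions for illustration):

```python
import runpod
import torch
from diffusers import DiffusionPipeline

# Loaded once at import time (global scope), so every request served by
# this worker reuses the already-initialised pipeline instead of reloading it.
PIPE = DiffusionPipeline.from_pretrained(
    "/models/sdxl",            # hypothetical path baked into the image
    torch_dtype=torch.float16,
    local_files_only=True,
).to("cuda")

def handler(job):
    # Only per-request work happens here; the heavy loading is already done.
    prompt = job["input"]["prompt"]
    image = PIPE(prompt).images[0]
    out_path = "/tmp/out.png"
    image.save(out_path)
    return {"image_path": out_path}

runpod.serverless.start({"handler": handler})
```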