Speeding up loading of model weights

Hello (OP) · 3mo ago

Hi guys, I have set up my serverless Docker image to contain all my required model weights. My handler script also loads the weights using the diffusers library's .from_pretrained with local_files_only=True, so everything is loaded locally. I notice that during cold starts, loading those weights still takes around 25 seconds until the logs display --- Starting Serverless Worker | Version 1.6.2 ---. Does anyone have experience optimising the time needed to load weights? Could we pre-load them into RAM or something (I may be totally off)?
4 Replies
Encyrption · 3mo ago
Are the models built into your image or stored on a network volume? Also, where in your code are you loading the models from disk, i.e. from global scope/main, or inside the handler?
Hello (OP) · 3mo ago
1. The weights are built into the image.
2. Loading is defined in the global scope, outside of the main handler function that runpod.serverless.start calls.

Since I require multiple models, I'm not sure what other optimisations / good practices there are, so I'm asking here haha. Thanks for such a quick response though!
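For reference, a minimal sketch of the layout being described, assuming two diffusers pipelines baked into the image; the paths /models/base and /models/refiner and the response shape are hypothetical placeholders, not details from this thread:

```python
# handler.py -- minimal sketch of global-scope loading
import runpod
import torch
from diffusers import DiffusionPipeline

# Global scope: executed once per worker cold start, not on every request.
# Both pipelines then stay resident in memory for subsequent requests.
base = DiffusionPipeline.from_pretrained(
    "/models/base",          # hypothetical path; weights baked into the image
    local_files_only=True,   # never reach out to the Hugging Face Hub
    torch_dtype=torch.float16,
).to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "/models/refiner",       # hypothetical second model
    local_files_only=True,
    torch_dtype=torch.float16,
).to("cuda")

def handler(job):
    # Runs per request; the heavy loading above has already happened.
    prompt = job["input"]["prompt"]
    image = base(prompt).images[0]
    # ... post-process / upload the image and return a reference ...
    return {"status": "completed"}

runpod.serverless.start({"handler": handler})
```

With this layout the ~25 s load is paid once per cold start, and warm workers reuse the resident pipelines.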
Encyrption · 3mo ago
Loading the models in the global scope increases cold-start time, but it speeds up subsequent requests dramatically. This comes into play with active workers and FlashBoot. FlashBoot kicks in when a worker finishes a request and has another request waiting for it, so the more workers and traffic your endpoint gets, the more effective FlashBoot becomes. Active workers run all the time, so after the first query they respond quickly regardless of your traffic. It's a balancing act to figure out the best number of active and max workers for your traffic.
Madiator2011 · 3mo ago
load outside of job function
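Beyond where the loading happens, the load itself can sometimes be trimmed with standard diffusers options; this is a general sketch rather than advice from the thread, and actual savings depend on the model and how its weights were exported:

```python
import torch
from diffusers import DiffusionPipeline

# Standard diffusers load-time levers (availability and impact vary by model):
pipe = DiffusionPipeline.from_pretrained(
    "/models/base",            # hypothetical local path, as in the sketch above
    local_files_only=True,
    use_safetensors=True,      # safetensors deserialize faster than pickle .bin
    torch_dtype=torch.float16, # fp16 halves the bytes read from disk
)
pipe.to("cuda")
```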