Speeding up loading of model weights
Hi guys, I have set up my serverless Docker image to contain all my required model weights. My handler script also loads the weights using the diffusers library's `.from_pretrained` with `local_files_only=True`, so everything is loaded locally. I notice that during cold starts, loading those weights still takes around 25 seconds before the logs display --- Starting Serverless Worker | Version 1.6.2 ---.

Does anyone have experience optimising the time needed to load weights? Could we pre-load it into RAM or something (I may be totally off)?
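For reference, a minimal sketch of the loading setup described above (the `/models/sdxl` path and the choice of `DiffusionPipeline` are illustrative assumptions, not from the thread):

```python
import torch
from diffusers import DiffusionPipeline

# Weights were baked into the image at build time, e.g. under /models/sdxl
pipe = DiffusionPipeline.from_pretrained(
    "/models/sdxl",            # hypothetical local path inside the image
    torch_dtype=torch.float16,
    local_files_only=True,     # never reach out to the Hub at runtime
)
pipe.to("cuda")
```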
Are the models built into your image or stored on a network volume?
Also, where in your code are you loading the models from disk? I.e. from global scope / main, or in the handler?
1. The weights are built into the image
2. The loading happens in the global scope, outside of the main `handler` function that `runpod.serverless.start` calls

Since I require multiple models, I'm not sure what other optimizations / good practices there are, so I'm asking here haha.
Thanks for such a quick response though!

Loading the models up in the global scope will increase the cold start time, but it will speed up subsequent requests dramatically. This comes into play with active workers and FlashBoot. FlashBoot kicks in when a worker finishes a request and has another request waiting for it, so the more workers and traffic your endpoint gets, the more effective FlashBoot becomes. Active workers run all the time, so after the first query they respond quickly regardless of your traffic. It's a balancing act to figure out the best number of active and max workers for your traffic.
load outside of job function
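A rough sketch of that pattern, loading once at module import and reusing the pipeline inside the handler (the model path and pipeline class are assumptions for illustration):

```python
import runpod
import torch
from diffusers import DiffusionPipeline

# Loaded once at import time (global scope), so every request served by
# this worker reuses the already-initialised pipeline instead of reloading it.
PIPE = DiffusionPipeline.from_pretrained(
    "/models/sdxl",            # hypothetical path baked into the image
    torch_dtype=torch.float16,
    local_files_only=True,
).to("cuda")

def handler(job):
    # Only per-request work happens here; the heavy loading is already done.
    prompt = job["input"]["prompt"]
    image = PIPE(prompt).images[0]
    out_path = "/tmp/out.png"
    image.save(out_path)
    return {"image_path": out_path}

runpod.serverless.start({"handler": handler})
```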