What methods can I use to reduce cold start times and decrease latency for serverless functions?
I understand that adding active workers can reduce cold start issues, but it tends to be costly. I’m looking for a solution that strikes a balance between minimizing cold start times and managing costs. Since users only use our product during limited periods, keeping workers awake all the time isn’t necessary. I’d like to know about possible methods to achieve this balance.
create a docker container that has everything needed to get the job done, and registers with the worker (using the runpod python module for instance). sometimes the container gets cached by the host and can save on cold start times. and if you have large files you need available, you can use the network storage
@zeeb0t i had a similar question (https://discord.com/channels/912829806415085598/1307189362806624378/1307189362806624378)
"docker container that has everything to get the job done" vs "large files you need available, ... network storage"
so where do large models go? e.g. SDXL checkpoints
what do you mean by "registers with the worker"?
@pg2571 I just meant using the runpod handler function
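something like this, roughly - a minimal handler sketch with the runpod python module (the echo response is just a placeholder, your real code would run inference instead):
```python
# minimal runpod serverless handler sketch
import runpod

def handler(job):
    job_input = job["input"]  # payload sent with the request
    # ... run your actual inference here ...
    return {"echo": job_input}

# registers the handler with the worker; runpod calls it for each job
runpod.serverless.start({"handler": handler})
```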
You can put the large model files either in the container directly or on the network storage, to be read at runtime. I prefer to put them in the container and have raised a similar topic: https://discordapp.com/channels/912829806415085598/1305822723917873152
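if you want to support both, here is a rough sketch for picking the path at runtime - the filenames are hypothetical, and I'm assuming the network volume shows up at /runpod-volume on serverless workers:
```python
# prefer the copy baked into the image, fall back to the network volume
import os

BAKED_IN = "/models/sd_xl_base.safetensors"        # hypothetical path inside the image
NETWORK = "/runpod-volume/sd_xl_base.safetensors"  # hypothetical path on the network volume

MODEL_PATH = BAKED_IN if os.path.exists(BAKED_IN) else NETWORK
print(f"loading weights from {MODEL_PATH}")
```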
gotcha thanks, lemme try putting it in the container and see perf
Your first cold start will suck, but while the worker has a cached copy, it'll be pretty fine after that.
how bad is cold start for a 20gb docker image?
and is cold start faster on a network volume?
because my traffic is very very spikey
Try testing it; I guess it depends on your specific model and how your application runs it
cold start when the files are on the network volume instead of in the docker image will be faster, UNLESS the files are single large files that need to be read in full at runtime to service a request. then they become a delaying factor, and one you are paying gpu time for while things load. for instance, if an ai model being read from network storage is 20gb, and nothing works until that model is read into vram, then in every non-flashboot scenario it's going to feel slow. however, my experience has been that the docker container, once loaded, tends to be more reliably cached than flashboot can boot in a flash. plus the docker image pull is part of the initialising phase and i don't think you are billed for that? so it may be better financially, too
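either way, it helps to do the load at module level, outside the handler, so it happens once during worker initialisation and warm (or flashboot-resumed) requests reuse it. rough sketch, with a dummy loader standing in for whatever your pipeline actually does:
```python
# load once per worker, outside the handler, so requests don't re-read the weights
import runpod

MODEL_PATH = "/models/sd_xl_base.safetensors"  # hypothetical baked-in path

def load_checkpoint(path):
    # dummy loader: real code would load the weights into vram (torch, comfyui, etc.)
    with open(path, "rb") as f:
        return f.read()

MODEL = load_checkpoint(MODEL_PATH)  # runs during initialisation, not per request

def handler(job):
    # warm requests reuse MODEL instead of re-reading 20gb from disk or the volume
    return {"model_bytes": len(MODEL), "input": job["input"]}

runpod.serverless.start({"handler": handler})
```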
once they implement a permanent store for docker images to boot from, similar to network volumes, i think it'll be a great solution for the cold start scenario and make us less reliant on network volumes at runtime and less concerned about flashboot
less reliant on flashboot, i should say
you aren't billed for serverless image load yep
i just hope permanent docker stores become a thing
would love to pay for that kind of storage
Cold start is faster when you have the files baked into the docker image compared to the network volume. Loading files from the local GPU server's disk will be faster than loading from a file server.
maybe i don't fully understand how cold starts work. but isn't it faster to get a 30GB file from a local file server than to download it from the internet (i.e. when downloading the docker image with everything baked in)?
Yes, but here they're comparing to network storage, which is slower than loading from the local pod disk
ofc. but when a worker gets its first request, i assume it initializes from the docker image, which it has to download, right?
or is the setup such that the worker is already on a host with the docker image downloaded, and it's able to load it from local disk when a request comes in?
for context, mine is a comfyui workflow with ~30GB of models (but each individual model is 1-6GB)
When you first create the endpoint, the docker image is downloaded to multiple servers; if you send a request immediately, the workers aren't ready yet. After the initial download finishes, the image stays on those servers, unless you don't use your endpoint for a very long time. The cold start is then loading the model into GPU VRAM, and if you load the model from local disk (model baked into the docker image) it will be faster than from a network volume.
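if you want actual numbers for your 30GB of models instead of guesses, a crude timing sketch like this (hypothetical paths, run it inside a worker or a pod attached to the same volume) will show how fast local disk vs network volume reads are for you:
```python
# time a raw read of the same weights file from local disk vs the network volume
import time

def timed_read(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        size = len(f.read())
    elapsed = time.perf_counter() - start
    print(f"{path}: {size / 1e9:.1f} GB in {elapsed:.1f}s ({size / 1e9 / elapsed:.2f} GB/s)")

timed_read("/models/sd_xl_base.safetensors")         # baked into the image (local disk)
timed_read("/runpod-volume/sd_xl_base.safetensors")  # copy on the network volume
```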
thanks that makes sense! what's a rough estimate of "use your endpoint for a very long time"? Is it in the range of hours or days?
More like days or even longer
Try it out, I think it's variable
I don't have an exact number, but definitely days or even weeks.