RunPod · 2mo ago
yasek

What methods can I use to reduce cold start times and decrease latency for serverless functions?

I understand that adding active workers can reduce cold start issues, but it tends to be costly. I’m looking for a solution that strikes a balance between minimizing cold start times and managing costs. Since users only use our product during limited periods, keeping workers awake all the time isn’t necessary. I’d like to know about possible methods to achieve this balance.
18 Replies
zeeb0t · 2mo ago
Create a docker container that has everything needed to get the job done and registers with the worker (using the runpod python module, for instance). Sometimes the container gets cached by the host, which can save on cold start times. And if you have large files you need available, you can use network storage.
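For reference, a minimal sketch of what that registration looks like with the runpod python module; the handler body and input fields are placeholders, not anything RunPod-specific:

```python
# handler.py -- baked into the Docker image along with everything the job needs
import runpod

def handler(job):
    # job["input"] carries the JSON payload sent to the endpoint
    prompt = job["input"].get("prompt", "")
    # ... run your actual workload here ...
    return {"output": f"processed: {prompt}"}

# Registers this function with the serverless worker
runpod.serverless.start({"handler": handler})
```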
pg2571 · 2mo ago
@zeeb0t I had a similar question (https://discord.com/channels/912829806415085598/1307189362806624378/1307189362806624378). Regarding "docker container that has everything to get the job done" vs. "large files you need available, ... network storage": where do large models go, e.g. SDXL checkpoints? And what do you mean by "registers with the worker"?
zeeb0t · 2mo ago
@pg2571 I just meant using the runpod handler function. You can put the large model files either in the container directly or on network storage to be read at runtime. I prefer to put them in the container and have raised a similar topic: https://discordapp.com/channels/912829806415085598/1305822723917873152
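As a rough sketch of the two placements (the paths and the loader call below are assumptions, adjust them to your setup), the checkpoint either lives inside the image or on the attached network volume, and gets read once when the worker starts rather than on every request:

```python
# model_paths.py -- sketch of the two places a large checkpoint can live
import os

BAKED_PATH = "/app/models/sdxl.safetensors"             # copied in at docker build time
VOLUME_PATH = "/runpod-volume/models/sdxl.safetensors"  # on the attached network volume

# Prefer the baked-in copy if the image contains it, otherwise fall back to the volume.
MODEL_PATH = BAKED_PATH if os.path.exists(BAKED_PATH) else VOLUME_PATH

# Loading here, at module import time, means the read happens once per cold start
# instead of once per request.
# model = load_checkpoint(MODEL_PATH)  # hypothetical loader for your framework
```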
pg2571 · 2mo ago
gotcha thanks, lemme try putting it in the container and see perf
zeeb0t · 2mo ago
Your cold start will suck, but while the worker holds a cached copy, requests after that will be pretty fine.
pg2571 · 2mo ago
How bad is cold start for a 20GB docker image? And is cold start faster on a network volume? My traffic is very, very spiky.
nerdylive · 2mo ago
Try to test it; I guess it varies with your specific model and how your application runs it.
zeeb0t · 2mo ago
Cold start when the files are on the network volume instead of in the docker image will be faster, UNLESS the files are single files that need to be read in full as part of the runtime process to service a request. It'll then be a delaying factor, and one you are paying GPU time for while things load. For instance, if an AI model is 20GB and read from network storage, and nothing works until that model is loaded into VRAM, then every non-flashboot scenario is going to feel slow.
However, my experience has been that the docker container, once loaded, tends to be more reliably cached than flashboot can "boot in a flash". Plus, the docker image load is part of the initialising phase, and I don't think you are billed for that? So it may be better financially, too.
Once they implement a permanent store for docker images to boot from, similar to network volumes, I think it'll be a great solve for the cold start scenario and make us less reliant on runtime network volumes, and less reliant on flashboot.
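To put numbers on that trade-off for your own files, a quick, hedged timing sketch like this (the paths below are placeholders) can compare a raw sequential read from the image's local disk against the network volume before you commit to a layout:

```python
# read_timing.py -- rough throughput check for a checkpoint in each location
import time

def time_read(path, chunk_mb=64):
    """Read a file sequentially and print size, elapsed time, and throughput."""
    start, total = time.time(), 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_mb * 1024 * 1024):
            total += len(chunk)
    elapsed = time.time() - start
    print(f"{path}: {total / 1e9:.1f} GB in {elapsed:.1f}s ({total / 1e9 / elapsed:.2f} GB/s)")

# Placeholder paths -- point these at wherever your model actually lives.
time_read("/app/models/model.safetensors")            # baked into the docker image (local disk)
time_read("/runpod-volume/models/model.safetensors")  # attached network volume
```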
nerdylive · 2mo ago
Yep, you aren't billed for serverless image load.
zeeb0t · 2mo ago
I just hope permanent docker stores become a thing. I would love to pay for that kind of storage.
yhlong00000 · 2mo ago
Cold start is faster when you have files baked into the docker image compared to a network volume. Loading files from the local GPU server's disk will be faster than loading from a file server.
pg2571 · 2mo ago
Maybe I don't fully understand how cold starts work, but isn't it faster to get a 30GB file from a local file server than to download it from the internet (i.e. when downloading the docker image with everything baked in)?
nerdylive · 2mo ago
Yes, but here they're comparing to network storage, which is slower than loading from the local pod disk.
pg2571 · 2mo ago
Of course. But when a worker gets its first request, I assume it initializes from the docker image, which it has to download, right? Or is the setup such that the worker is already on a pod with the image downloaded, and it's able to load it from the local pod when a request comes in? For context, mine is a ComfyUI workflow with ~30GB of models (but each individual model is 1-6GB).
yhlong00000 · 2mo ago
When you first create the endpoint, the docker image is downloaded to multiple servers; if you send a request immediately, the workers aren't ready yet. After the initial download finishes, though, the image stays on the server unless you don't use your endpoint for a very long time. The cold start is loading the model into GPU VRAM; loading the model from local disk (i.e. baking it into the docker image) will be faster than loading it from a network volume.
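Since the ComfyUI case involves many 1-6GB checkpoints rather than one huge file, one hedged pattern (the loader call and model names below are placeholders) is to bake them all into the image and load each one lazily, keeping it cached for as long as the worker stays warm:

```python
# lazy_models.py -- sketch: load individual checkpoints on first use, reuse them while warm
import runpod

MODEL_DIR = "/app/models"  # assumed bake-in location inside the image
_loaded = {}               # survives across requests on a warm worker

def get_model(name):
    """Load a checkpoint the first time it is requested, then serve it from memory."""
    if name not in _loaded:
        # _loaded[name] = load_checkpoint(f"{MODEL_DIR}/{name}.safetensors")  # hypothetical loader
        _loaded[name] = f"{MODEL_DIR}/{name}.safetensors"  # placeholder standing in for the model
    return _loaded[name]

def handler(job):
    model = get_model(job["input"].get("model", "sdxl_base"))
    # ... run the ComfyUI workflow with `model` here ...
    return {"model_used": str(model)}

runpod.serverless.start({"handler": handler})
```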
pg2571 · 2mo ago
Thanks, that makes sense! What's a rough estimate of "use your endpoint for a very long time"? Is it in the range of hours or days?
nerdylive · 2mo ago
More like days, or even longer. Try it out; I think it's variable.
yhlong00000 · 2mo ago
I don't have an exact number, but definitely days or even weeks.