What methods can I use to reduce cold start times and decrease latency for serverless functions?
I understand that adding active workers can reduce cold start issues, but it tends to be costly. I’m looking for a solution that strikes a balance between minimizing cold start times and managing costs. Since users only use our product during limited periods, keeping workers awake all the time isn’t necessary. I’d like to know about possible methods to achieve this balance.
create a docker container that has everything needed to get the job done, and registers with the worker (using the runpod python module for instance). sometimes the container gets cached by the host and can save on cold start times. and if you have large files you need available, you can use the network storage
@zeeb0t i had a similar question (https://discord.com/channels/912829806415085598/1307189362806624378/1307189362806624378)
"docker container that has everything to get the job done" vs "large files you need available, ... network storage"
so where do large models go? e.g. SDXL checkpoints
what do you mean by "registers with the worker"?
@pg2571 I just meant using the runpod handler function
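something like this, roughly - a minimal handler sketch with the runpod python module (the echo response is just a placeholder, your real code would run inference instead):
```python
# minimal runpod serverless handler sketch
import runpod

def handler(job):
    job_input = job["input"]  # payload sent with the request
    # ... run your actual inference here ...
    return {"echo": job_input}

# registers the handler with the worker; runpod calls it for each job
runpod.serverless.start({"handler": handler})
```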
You can put the large model files either in the container directly or on the network storage, to be read at runtime. I prefer to put them in the container and have raised a similar topic: https://discordapp.com/channels/912829806415085598/1305822723917873152
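if you want to support both, here is a rough sketch for picking the path at runtime - the filenames are hypothetical, and I'm assuming the network volume shows up at /runpod-volume on serverless workers:
```python
# prefer the copy baked into the image, fall back to the network volume
import os

BAKED_IN = "/models/sd_xl_base.safetensors"        # hypothetical path inside the image
NETWORK = "/runpod-volume/sd_xl_base.safetensors"  # hypothetical path on the network volume

MODEL_PATH = BAKED_IN if os.path.exists(BAKED_IN) else NETWORK
print(f"loading weights from {MODEL_PATH}")
```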
gotcha thanks, lemme try putting it in the container and see perf
Your first cold start will suck, but while the worker has a cached copy, it'll be pretty fine after that.
how bad is cold start for a 20gb docker image?
and is cold start faster on a network volume?
because my traffic is very very spikey
Try testing it; I guess it depends on your specific model and how your application runs it
cold start when the files are on the network volume instead of in the docker image will be faster, UNLESS the files are single large files that need to be read in full at runtime to service a request. then they become a delaying factor, and one you are paying gpu time for while things load. for instance, if an ai model being read from network storage is 20gb, and nothing works until that model is read into vram, then in every non-flashboot scenario it's going to feel slow. however, my experience has been that the docker container, once loaded, tends to be more reliably cached than flashboot can boot in a flash. plus the docker image pull is part of the initialising phase and i don't think you are billed for that? so it may be better financially, too
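either way, it helps to do the load at module level, outside the handler, so it happens once during worker initialisation and warm (or flashboot-resumed) requests reuse it. rough sketch, with a dummy loader standing in for whatever your pipeline actually does:
```python
# load once per worker, outside the handler, so requests don't re-read the weights
import runpod

MODEL_PATH = "/models/sd_xl_base.safetensors"  # hypothetical baked-in path

def load_checkpoint(path):
    # dummy loader: real code would load the weights into vram (torch, comfyui, etc.)
    with open(path, "rb") as f:
        return f.read()

MODEL = load_checkpoint(MODEL_PATH)  # runs during initialisation, not per request

def handler(job):
    # warm requests reuse MODEL instead of re-reading 20gb from disk or the volume
    return {"model_bytes": len(MODEL), "input": job["input"]}

runpod.serverless.start({"handler": handler})
```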
once they implement a permanent store for docker images to boot from, similar to network volumes, i think it'll be a great solution for the cold start scenario and make us less reliant on network volumes at runtime and less concerned about flashboot
less reliant on flashboot, i should say
you aren't billed for serverless image load yep
i just hope permanent docker stores become a thing
would love to pay for that kind of storage
Cold start is faster when you have the files baked into the docker image compared to the network volume. Loading files from the local GPU server's disk will be faster than loading from a file server.
maybe i don't fully understand how cold starts work. but isn't it faster to get a 30GB file from a local file server than to download it from the internet (i.e. when downloading the docker image with everything baked in)?
Yes, but here they're comparing to network storage, which is slower than loading from the local pod disk
ofc. but when a worker gets its first request, i assume it initializes from the docker image, which it has to download, right?
or is the setup such that the worker is already on a host with the docker image downloaded, and it's able to load it from local disk when a request comes in?
for context, mine is a comfyui workflow with ~30GB of models (but each individual model is 1-6GB)
When you first create the endpoint, the docker image is downloaded to multiple servers; if you send a request immediately, the workers aren't ready yet. After the initial download finishes, the image stays on those servers, unless you don't use your endpoint for a very long time. The cold start is then loading the model into GPU VRAM, and if you load the model from local disk (model baked into the docker image) it will be faster than from a network volume.
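if you want actual numbers for your 30GB of models instead of guesses, a crude timing sketch like this (hypothetical paths, run it inside a worker or a pod attached to the same volume) will show how fast local disk vs network volume reads are for you:
```python
# time a raw read of the same weights file from local disk vs the network volume
import time

def timed_read(path):
    start = time.perf_counter()
    with open(path, "rb") as f:
        size = len(f.read())
    elapsed = time.perf_counter() - start
    print(f"{path}: {size / 1e9:.1f} GB in {elapsed:.1f}s ({size / 1e9 / elapsed:.2f} GB/s)")

timed_read("/models/sd_xl_base.safetensors")         # baked into the image (local disk)
timed_read("/runpod-volume/sd_xl_base.safetensors")  # copy on the network volume
```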
thanks that makes sense! what's a rough estimate of "use your endpoint for a very long time"? Is it in the range of hours or days?
More like days or even longer
Try it out, I think it's variable
I don't have an exact number, but definitely days or even weeks.