Question about the storage layout of a serverless endpoint
I have a question about how to lay out the data for a serverless endpoint.
I need to build a container with more than one model in it, and it will have a network volume with all the data.
The question is: where should I store the virtual environments, package dependencies (pyenv and pipenv), and caches?
On the Docker image or on the network volume? Which will give better results in terms of performance and execution time?
11 Replies
Docker image.
Network storage is essentially an external hard drive: pulling from a separate storage container will always be slower than pulling directly from local resources.
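If you want to measure it yourself, here's a rough sketch that times a raw read from each location. The paths are just examples (not your layout); /runpod-volume is where RunPod mounts network volumes on serverless workers:
```python
import time

# Hypothetical paths: /models is assumed baked into the Docker image,
# /runpod-volume is RunPod's network volume mount point.
LOCAL_PATH = "/models/sdxl/unet.safetensors"
VOLUME_PATH = "/runpod-volume/models/sdxl/unet.safetensors"

def time_read(path: str) -> float:
    """Read a file end to end and return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(64 * 1024 * 1024):  # read in 64 MiB chunks
            pass
    return time.perf_counter() - start

for label, path in (("image-local", LOCAL_PATH), ("network volume", VOLUME_PATH)):
    print(f"{label}: {time_read(path):.1f} s")
```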
Okay, and approximately how much impact does starting a 10 GB image have compared to a 1 GB image? (Both loading the same packages during startup.)
No real difference. It's not the image size but what your startup is doing; if your startup loads a 10 GB model into VRAM vs. 100 GB into VRAM, that has a bigger impact.
Image size mostly affects initialization, i.e. downloading the image.
Startup pulling a model into VRAM, as flash said, is the biggest impact. Pulling it from network storage will be significantly slower than pulling it locally from the image, though.
And initialization is a one-time cost when workers are first created; the downloaded image persists for future requests.
Do you have an idea of what you're trying to build?
This happens just with the first request, right? After that, a cold start won't download the image again?
Cold start and initialization are different things.
Initialization is when RunPod downloads your image and saves it to the worker for future startups.
Startups have cold start times, where the worker goes from nothing to something.
Cold start times can vary based on different factors. And then finally,
before execution time there is also a bit of setup time in your execution, such as
model = load(model)
If your load(model) is huge, it will take a bit.
But if you set your worker's idle timeout to, say, 2 minutes,
then after the worker is active, it can sit with an already-loaded model in memory and just keep pulling requests.
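As a rough sketch of that pattern (assuming a diffusers SDXL pipeline baked into the image at /models/sdxl; not your exact setup), the key is loading at module level, outside the handler:
```python
import runpod
import torch
from diffusers import StableDiffusionXLPipeline

# Setup cost: this runs once per worker start (part of the cold start),
# not once per request. The path assumes the weights are baked into the
# image at /models/sdxl (an assumption; adjust to your layout).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "/models/sdxl", torch_dtype=torch.float16, local_files_only=True
).to("cuda")

def handler(job):
    # Only this function runs per request; while the worker idles,
    # the pipeline stays in VRAM and later requests skip the load.
    prompt = job["input"]["prompt"]
    image = pipe(prompt).images[0]
    image.save("/tmp/out.png")
    return {"image_path": "/tmp/out.png"}

runpod.serverless.start({"handler": handler})
```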
Thank you for the information
The plan is to put SDXL + a vision-language model, and eventually some other smaller model, running them in a chain. I estimate about 36 GB of VRAM. I will need to start, do the work, and immediately stop, so I'll need to load into VRAM on each request. The first model to run will be SDXL, so maybe it's possible to fit it in the Docker image and the rest on the network volume. What do you think?
Are you using a 48 GB GPU? If all models are static, I would put them all in the Docker image, unless it gets too big and goes over ~50 GB.
By static you mean they will not be replaced frequently, right? Yes, they will be static, and yes, I'm thinking of a 48 GB GPU. At the moment I'm building the project, and your help is very useful for setting everything up the proper way.
I would put everything in the Docker image; that will also help with scaling.
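For the chaining part, a minimal sketch of loading both models from image-local paths and running them in sequence could look like this. The paths and model classes are assumptions, swap in whatever you actually ship:
```python
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import AutoModelForVision2Seq, AutoProcessor

# Both model folders are assumed baked into the Docker image.
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "/models/sdxl", torch_dtype=torch.float16, local_files_only=True
).to("cuda")
processor = AutoProcessor.from_pretrained("/models/vlm", local_files_only=True)
vlm = AutoModelForVision2Seq.from_pretrained(
    "/models/vlm", torch_dtype=torch.float16, local_files_only=True
).to("cuda")

def run_chain(prompt: str, question: str) -> str:
    # Stage 1: SDXL generates the image.
    image = sdxl(prompt).images[0]
    # Stage 2: the vision-language model answers a question about it.
    inputs = processor(images=image, text=question, return_tensors="pt")
    inputs = inputs.to("cuda", torch.float16)
    out_ids = vlm.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(out_ids, skip_special_tokens=True)[0]
```
With both loaded at startup they sit in VRAM together, which is why the ~36 GB estimate vs. the 48 GB card matters.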
Thank you very much!