Question about the storage layout of a serverless endpoint
I have a question about how to lay out the data for a serverless endpoint.
I need to build a container with more than one model in it, and it will have a network volume with all the data.
The question is: where should I store the virtual environments, package dependencies (pyenv and pipenv), and caches?
On the Docker image or on the network volume? Which will give better results in terms of performance and execution time?
11 Replies
Docker image.
Network storage is essentially an external hard drive: pulling from a separate storage container will always be slower than pulling directly from local resources.
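If you want to measure it yourself, here's a rough sketch that times a raw read from each location. The paths are just examples (not your layout); /runpod-volume is where RunPod mounts network volumes on serverless workers:
```python
import time

# Hypothetical paths: /models is assumed baked into the Docker image,
# /runpod-volume is RunPod's network volume mount point.
LOCAL_PATH = "/models/sdxl/unet.safetensors"
VOLUME_PATH = "/runpod-volume/models/sdxl/unet.safetensors"

def time_read(path: str) -> float:
    """Read a file end to end and return elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(64 * 1024 * 1024):  # read in 64 MiB chunks
            pass
    return time.perf_counter() - start

for label, path in (("image-local", LOCAL_PATH), ("network volume", VOLUME_PATH)):
    print(f"{label}: {time_read(path):.1f} s")
```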
Okay, and approximately how much impact does starting a 10 GB image have compared to a 1 GB image? (Both loading the same packages during startup.)
No real difference. It's not the image size but what your startup is doing; if your startup loads a 10 GB model into VRAM vs. 100 GB into VRAM, that has a bigger impact.
Image size mostly affects initialization, i.e. downloading the image.
Startup pulling a model into VRAM, as flash said, is the biggest impact. Pulling it from network storage will be significantly slower than pulling it locally from the image, though.
And initialization is a one-time cost when workers are first created; the downloaded image persists for future requests.
Do you have an idea of what you're trying to build?
This happens just with the first request, right? After that, a cold start won't download the image again?
Cold start and initialization are different things.
Initialization is when RunPod downloads your image and saves it to the worker for future startups.
Startups have cold start times, where the worker goes from nothing to something.
Cold start times can vary based on different factors. And then finally,
before execution time there is also a bit of setup time in your execution, such as
model = load(model)
If your load(model) is huge, it will take a bit.
But if you set your worker's idle timeout to, say, 2 minutes,
then after the worker is active, it can sit with an already-loaded model in memory and just keep pulling requests.
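As a rough sketch of that pattern (assuming a diffusers SDXL pipeline baked into the image at /models/sdxl; not your exact setup), the key is loading at module level, outside the handler:
```python
import runpod
import torch
from diffusers import StableDiffusionXLPipeline

# Setup cost: this runs once per worker start (part of the cold start),
# not once per request. The path assumes the weights are baked into the
# image at /models/sdxl (an assumption; adjust to your layout).
pipe = StableDiffusionXLPipeline.from_pretrained(
    "/models/sdxl", torch_dtype=torch.float16, local_files_only=True
).to("cuda")

def handler(job):
    # Only this function runs per request; while the worker idles,
    # the pipeline stays in VRAM and later requests skip the load.
    prompt = job["input"]["prompt"]
    image = pipe(prompt).images[0]
    image.save("/tmp/out.png")
    return {"image_path": "/tmp/out.png"}

runpod.serverless.start({"handler": handler})
```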
Thank you for the information
The plan is to put SDXL + a vision-language model, and eventually some other smaller model, running them in a chain. I estimate about 36 GB of VRAM. I will need to start, do the work, and immediately stop, so I'll need to load into VRAM on each request. The first model to run will be SDXL, so maybe it's possible to fit it in the Docker image and the rest on the network volume. What do you think?
Are you using a 48 GB GPU? If all models are static, I would put them all in the Docker image, unless it gets too big and goes over ~50 GB.
By static you mean they will not be replaced frequently, right? Yes, they will be static, and yes, I'm thinking of a 48 GB GPU. At the moment I'm building the project, and your help is very useful for setting everything up the proper way.
I would put everything in the Docker image; that will also help with scaling.
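For the chaining part, a minimal sketch of loading both models from image-local paths and running them in sequence could look like this. The paths and model classes are assumptions, swap in whatever you actually ship:
```python
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import AutoModelForVision2Seq, AutoProcessor

# Both model folders are assumed baked into the Docker image.
sdxl = StableDiffusionXLPipeline.from_pretrained(
    "/models/sdxl", torch_dtype=torch.float16, local_files_only=True
).to("cuda")
processor = AutoProcessor.from_pretrained("/models/vlm", local_files_only=True)
vlm = AutoModelForVision2Seq.from_pretrained(
    "/models/vlm", torch_dtype=torch.float16, local_files_only=True
).to("cuda")

def run_chain(prompt: str, question: str) -> str:
    # Stage 1: SDXL generates the image.
    image = sdxl(prompt).images[0]
    # Stage 2: the vision-language model answers a question about it.
    inputs = processor(images=image, text=question, return_tensors="pt")
    inputs = inputs.to("cuda", torch.float16)
    out_ids = vlm.generate(**inputs, max_new_tokens=64)
    return processor.batch_decode(out_ids, skip_special_tokens=True)[0]
```
With both loaded at startup they sit in VRAM together, which is why the ~36 GB estimate vs. the 48 GB card matters.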
Thank you very much!