Best way to cache models with serverless?
Hello,
I'm using a serverless endpoint to do image generation with Flux dev. The model is 22 GB, which takes quite a long time to download, especially since some workers seem to be faster than others.
I've been using a network volume as a cache, which greatly improves startup time. However, doing this locks me into a particular region, which I believe makes some GPUs, like the A100, very rarely available.
Is there a way to have a global Hugging Face cache with serverless endpoints? (like with pods)
Thanks
For now it's best to bake the model into the container image. We have model caching planned for the end of Jan to enable caching of models.
Good to know! So even with a 22 GB model it's worth including it in the container image? I'll try that.
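For anyone trying the same thing, baking the weights in is roughly a Dockerfile sketch like the one below. The base image, repo ID, and paths are placeholders, and this assumes the model can be downloaded without a token at build time:

```dockerfile
# Sketch: download the weights during `docker build` so workers start
# without pulling 22 GB on every cold start.
# Base image and repo ID are illustrative, not an official recipe.
FROM runpod/base:latest

RUN pip install huggingface_hub

# The download happens in an image layer, so it runs once at build time.
# Note: this only works for models that don't require a token, since
# endpoint secrets are not available during the build.
RUN python -c "from huggingface_hub import snapshot_download; \
    snapshot_download('your-org/your-model', local_dir='/models/flux-dev')"

ENV MODEL_PATH=/models/flux-dev
```

One trade-off to be aware of: a ~22 GB layer makes image pushes and pulls slow, but the pull only happens when a worker is first provisioned, not on every request.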
Does Docker recognize secrets set in the endpoint settings during the build?
For the case where the model included in the container image is private.
I don't think so.
Secrets are not passed when building the container; they're only passed at runtime. We plan to add more custom options for builds.
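So for a private model the download has to happen at worker startup rather than in the Dockerfile. A minimal sketch of that, assuming the secret is exposed as an environment variable (the `RUNPOD_SECRET_` prefix and the `HF_TOKEN` fallback name are assumptions for illustration, not confirmed by this thread):

```python
import os


def get_hf_token(secret_name: str = "HF_TOKEN") -> str:
    # RunPod exposes endpoint secrets as environment variables at runtime
    # (assumed here to be prefixed with RUNPOD_SECRET_); fall back to a
    # plain HF_TOKEN for local testing.
    for var in (f"RUNPOD_SECRET_{secret_name}", "HF_TOKEN"):
        token = os.environ.get(var)
        if token:
            return token
    raise RuntimeError("No Hugging Face token found in the environment")


def download_model(repo_id: str, target: str = "/models/flux-dev") -> str:
    # Deferred import so the token helper stays stdlib-only; the actual
    # download runs at worker start, where the secret is available.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id, local_dir=target, token=get_hf_token())
```

Calling `download_model("your-org/your-private-model")` from the worker's startup code would then pull the weights once per worker, before the handler starts taking jobs.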