Best way to cache models with serverless?
Hello,
I'm using a serverless endpoint to do image generation with Flux dev. The model is 22 GB, which takes quite a long time to download, especially since some workers seem to be faster than others.
I've been using a network volume as a cache, which greatly improves startup time. However, doing this locks me into a particular region, which I believe makes some GPUs, like the A100, very rarely available.
Is there a way to have a global Hugging Face cache with serverless endpoints? (like with pods)
Thanks
For now it's best to bake the model into the container image. We have model caching planned for the end of Jan to enable caching of models.
Good to know! So even with a 22 GB model it's worth including it in the container image? I'll try that.
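For anyone trying the same thing, baking the weights in is roughly a Dockerfile sketch like the one below. The base image, repo ID, and paths are placeholders, and this assumes the model can be downloaded without a token at build time:

```dockerfile
# Sketch: download the weights during `docker build` so workers start
# without pulling 22 GB on every cold start.
# Base image and repo ID are illustrative, not an official recipe.
FROM runpod/base:latest

RUN pip install huggingface_hub

# The download happens in an image layer, so it runs once at build time.
# Note: this only works for models that don't require a token, since
# endpoint secrets are not available during the build.
RUN python -c "from huggingface_hub import snapshot_download; \
    snapshot_download('your-org/your-model', local_dir='/models/flux-dev')"

ENV MODEL_PATH=/models/flux-dev
```

One trade-off to be aware of: a ~22 GB layer makes image pushes and pulls slow, but the pull only happens when a worker is first provisioned, not on every request.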
Does Docker recognize secrets set in the endpoint settings during the build?
For the case where the model included in the container image is private.
I don't think so.
Secrets are not passed when building the container; they're only passed at runtime. We plan to add more custom options for builds.
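So for a private model the download has to happen at worker startup rather than in the Dockerfile. A minimal sketch of that, assuming the secret is exposed as an environment variable (the `RUNPOD_SECRET_` prefix and the `HF_TOKEN` fallback name are assumptions for illustration, not confirmed by this thread):

```python
import os


def get_hf_token(secret_name: str = "HF_TOKEN") -> str:
    # RunPod exposes endpoint secrets as environment variables at runtime
    # (assumed here to be prefixed with RUNPOD_SECRET_); fall back to a
    # plain HF_TOKEN for local testing.
    for var in (f"RUNPOD_SECRET_{secret_name}", "HF_TOKEN"):
        token = os.environ.get(var)
        if token:
            return token
    raise RuntimeError("No Hugging Face token found in the environment")


def download_model(repo_id: str, target: str = "/models/flux-dev") -> str:
    # Deferred import so the token helper stays stdlib-only; the actual
    # download runs at worker start, where the secret is available.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id, local_dir=target, token=get_hf_token())
```

Calling `download_model("your-org/your-private-model")` from the worker's startup code would then pull the weights once per worker, before the handler starts taking jobs.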