How to cache model downloads from HuggingFace - tips?
Using Serverless (48GB Pro) with Flashboot. Want to optimize for fast cold starts
is there a guide somewhere?
it does not seem to be caching the download - it's always re-downloading the model entirely (and slowly)
should i ssh into some persistent storage & download the model there, then reference that local path in the HF model load?
Flashboot isn't free storage like an SSD. Use network storage; it's mounted at /runpod-volume in Serverless, or at /workspace in Pods
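A minimal sketch of that approach, assuming a Python worker and a recent transformers/huggingface_hub (the gpt2 id is just a small stand-in, and the /runpod-volume/huggingface paths are an arbitrary choice): point the HF cache at the network volume before importing any HF library, so downloads land on persistent storage and later cold starts reuse them.

```python
import os

# Point the Hugging Face cache at the network volume *before* importing
# transformers, so every download lands on persistent storage instead of
# the container's ephemeral disk.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"
os.environ["HF_HUB_CACHE"] = "/runpod-volume/huggingface/hub"

from transformers import AutoModelForCausalLM, AutoTokenizer  # import after env setup

MODEL_ID = "gpt2"  # small stand-in repo; swap in your own model id

# First cold start downloads into /runpod-volume/huggingface/hub; later
# workers attached to the same volume load straight from that cache.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
```

The same variables can instead be set as environment variables on the endpoint or in the Dockerfile; the key point is that they are in place before the first transformers/huggingface_hub import.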
@nerdylive would you recommend doing this (pic)? (it seems all workers in my endpoint will pull from this same /runpod-volume)
btw: perhaps /runpod-volume is only available/mounted when using a RunPod docker base image? e.g. a plain Ubuntu image doesn't seem to have it mounted (pic)
also: it seems like when you change GPU type, /runpod-volume is deleted/inaccessible - is this correct?
No, it's mounted when the worker runs on RunPod's systems, regardless of the base image
No, if you attach network storage it'll be persistent
As long as you keep it and keep it attached to the endpoint you use
okay thanks. do you recommend creating a new network volume & persisting the HF weights in that?
perhaps that's more stable/clearer for me than relying on the default /runpod-volume (which i assume is attached by default?), which seems to be giving me unexpected behaviour
------
i seem to be triggering new HF downloads even though this image has already run, downloaded, and persisted the weights to /runpod-volume/.cache/huggingface/hub/.. in previous runs
Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]Request 48c80db3-d744-4f39-8af2-929133a77895: HEAD https://huggingface.co/LanguageBind/Video-LLaVA-7B
if you happen to know / have a code example that shows a reliable way to persist HF in the most straightforward way, lmk!
Just write and read to that path
You can imagine it like a folder that is always there
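A tiny sketch of the "just write and read to that path" idea, assuming the volume is mounted at /runpod-volume (the marker filename is made up): write a file once, then check for it on later cold starts to confirm you're really on persistent storage and not the container's own disk.

```python
import json
import os
import time

VOLUME = "/runpod-volume"                          # Serverless mount point from the thread
MARKER = os.path.join(VOLUME, "cache-check.json")  # hypothetical marker file

# First run: write a marker. Later cold starts (on any worker with the same
# network volume attached) should still see it; if it's gone, you weren't
# writing to the network volume after all.
if os.path.exists(MARKER):
    with open(MARKER) as f:
        print("volume already initialised at", json.load(f)["created"])
else:
    os.makedirs(VOLUME, exist_ok=True)
    with open(MARKER, "w") as f:
        json.dump({"created": time.strftime("%Y-%m-%d %H:%M:%S")}, f)
    print("wrote marker to", MARKER)
```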
when writing to /runpod-volume i'm still seeing the container do full model downloads after i kill the worker
so i:
- created a new network volume (/modelstorage) and am reading/writing to this
- attached this volume to my endpoint (didn't deploy the volume)
but when i kill the worker it re-downloads from hf??
am i missing anything!? any code examples of ensuring it downloads from the network volume & NOT hf?
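One hedged guess at the cause, with a quick check: by default the HF libraries cache under ~/.cache/huggingface inside the container, not under the network volume, so weights persisted to /runpod-volume/.cache/huggingface/hub are invisible to a fresh worker unless the cache env vars point there. A small diagnostic sketch (assumes a recent huggingface_hub; the volume path is the one from the messages above):

```python
import os
from huggingface_hub import constants

# Where the HF libraries will actually look for cached models in this process.
print("cache in use:", constants.HF_HUB_CACHE)

# Where the weights were persisted in earlier runs (path from the thread).
volume_cache = "/runpod-volume/.cache/huggingface/hub"
print("on the volume:", os.listdir(volume_cache) if os.path.isdir(volume_cache) else "not found")

# If the two locations differ, every fresh worker re-downloads from the Hub.
```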
No, it acts as a drive. Nothing is re-downloaded from the network volume
You just use the model from /runpod-volume
Maybe your path/method is wrong, you need to cache your model there somehow
Snapshot the model there, or set the model path to it
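A sketch of the snapshot approach (the /runpod-volume/models/... folder is an arbitrary name): download the snapshot onto the network volume once, then load from that local path so the worker never has to touch the Hub again.

```python
import os
from huggingface_hub import snapshot_download

MODEL_ID = "LanguageBind/Video-LLaVA-7B"            # repo from this thread
MODEL_DIR = "/runpod-volume/models/Video-LLaVA-7B"  # any folder on the network volume

# Download the full snapshot once; later cold starts on workers that have the
# same network volume attached find the files already there and skip the Hub.
if not os.path.isdir(MODEL_DIR) or not os.listdir(MODEL_DIR):
    snapshot_download(repo_id=MODEL_ID, local_dir=MODEL_DIR)

# Then point your usual loading code at the local folder instead of the repo id,
# e.g. SomeModelClass.from_pretrained(MODEL_DIR, local_files_only=True).
# local_files_only makes a missing file fail loudly instead of re-downloading.
```

local_files_only=True (or setting HF_HUB_OFFLINE=1) is also a handy way to verify the cache is actually being hit, since it errors out instead of quietly going back to huggingface.co.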