RunPod3mo ago
Blake

How to cache model download from HuggingFace - Tips?

Using Serverless (48GB Pro) with Flashboot. I want to optimize for fast cold starts - is there a guide somewhere? It doesn't seem to be caching the download; it's always re-downloading the model entirely (and slowly). Should I SSH into some persistent storage and download the model there, then reference that local path in the HF model load?
9 Replies
nerdylive
nerdylive3mo ago
Flashboot isn't some free storage like an SSD - use network storage. It's mounted at /runpod-volume in serverless, or at /workspace in pods.
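As a minimal sketch of that (the /runpod-volume mount point is from the reply above; the `huggingface` subdirectory name is my assumption), you can point the Hugging Face cache at the network volume at worker startup, so downloads land on persistent storage instead of the container's ephemeral disk:

```python
import os

# Set these BEFORE importing transformers/huggingface_hub - both libraries
# read the cache-location variables once at import time.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"           # umbrella HF cache dir
os.environ["HF_HUB_CACHE"] = "/runpod-volume/huggingface/hub"  # hub snapshots

# from transformers import AutoModel  # import only after the env vars are set
```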
Blake
BlakeOP3mo ago
@nerdylive would you recommend doing this (pic)? (It seems all workers in my endpoint will pull from this same /runpod-volume.) Btw: perhaps /runpod-volume is only available/mounted when using a RunPod Docker base image? E.g. a plain Ubuntu image doesn't seem to have it mounted (pic).
Blake
BlakeOP3mo ago
also: it seems like when you change GPU type, the /runpod-volume is deleted/inaccessible - is this correct?
nerdylive
nerdylive3mo ago
No, it's mounted when you run the worker on RunPod's servers/system. And no - if you attach network storage it'll be persistent, as long as you keep it and keep it attached to the endpoint you use.
Blake
BlakeOP3mo ago
okay thanks. Do you recommend creating a new network volume and persisting the HF weights in that? Perhaps that's more stable/clearer for me to follow than using the default /runpod-volume (which I assume is attached by default?), but it seems to be giving me unexpected behaviour.

I seem to be triggering new HF downloads even when this image has already run, downloaded, and persisted the weights to /runpod-volume/.cache/huggingface/hub/.. in previous runs:

Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]
Request 48c80db3-d744-4f39-8af2-929133a77895: HEAD https://huggingface.co/LanguageBind/Video-LLaVA-7B

If you happen to know / have a code example that shows a reliable way to persist the HF cache in the most straightforward way, lmk!
nerdylive
nerdylive3mo ago
Just write and read to that path. You can imagine it like a folder that is always there.
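A sketch of that pattern (the volume path and helper names are my assumptions; `snapshot_download` is the real `huggingface_hub` call): download once into the volume, and every later call resolves to the cached copy on disk instead of pulling from huggingface.co:

```python
import os

VOLUME_CACHE = "/runpod-volume/huggingface/hub"  # assumed network-volume path

def local_snapshot_dir(repo_id: str) -> str:
    # huggingface_hub lays repos out as models--{org}--{name} under cache_dir
    return os.path.join(VOLUME_CACHE, "models--" + repo_id.replace("/", "--"))

def ensure_model(repo_id: str) -> str:
    """Download into the network volume on the first call; later calls just
    re-resolve the existing snapshot instead of re-downloading shards."""
    from huggingface_hub import snapshot_download  # lazy import
    return snapshot_download(repo_id, cache_dir=VOLUME_CACHE)
```

Passing the returned path (or the same `cache_dir`) to `from_pretrained` then loads purely from local disk.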
Blake
BlakeOP3mo ago
when writing to /runpod-volume I'm still seeing the container do full model downloads when I kill the worker, so I:
- created a new network storage (/modelstorage) and am reading/writing to this
- attached this volume to my endpoint (didn't deploy the volume)
but when I kill the worker it still re-downloads from HF??
Blake
BlakeOP3mo ago
anything I'm missing!? Any code examples of ensuring it loads from the network volume and NOT from HF?
nerdylive
nerdylive3mo ago
No - it acts as a drive; it doesn't re-download from the network volume. You just use the model from /runpod-volume. Maybe your path/method is wrong - you need to cache your model there somehow, e.g. snapshot the model or set the model path there.