How to cache model downloads from HuggingFace - tips?
Using Serverless (48GB Pro) with Flashboot. Want to optimize for fast cold starts
is there a guide somewhere?
it does not seem to be caching the download - it's always re-downloading the model entirely (and slowly)
should i ssh into some persistent storage & download the model there, then reference that local path in the HF model load?
Flashboot isn't free storage like an SSD. Use network storage; it's mounted at /runpod-volume in Serverless, or at /workspace in Pods
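A minimal sketch of that approach, assuming a Python worker and a recent transformers/huggingface_hub (the gpt2 id is just a small stand-in, and the /runpod-volume/huggingface paths are an arbitrary choice): point the HF cache at the network volume before importing any HF library, so downloads land on persistent storage and later cold starts reuse them.

```python
import os

# Point the Hugging Face cache at the network volume *before* importing
# transformers, so every download lands on persistent storage instead of
# the container's ephemeral disk.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"
os.environ["HF_HUB_CACHE"] = "/runpod-volume/huggingface/hub"

from transformers import AutoModelForCausalLM, AutoTokenizer  # import after env setup

MODEL_ID = "gpt2"  # small stand-in repo; swap in your own model id

# First cold start downloads into /runpod-volume/huggingface/hub; later
# workers attached to the same volume load straight from that cache.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
```

The same variables can instead be set as environment variables on the endpoint or in the Dockerfile; the key point is that they are in place before the first transformers/huggingface_hub import.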
@nerdylive would you recommend doing this (pic)? (it seems all workers in my endpoint will pull from this same /runpod-volume)
btw: perhaps /runpod-volume is only available/mounted when using a RunPod docker base image? e.g. a plain Ubuntu image doesn't seem to have it mounted (pic)
also: it seems like when you change GPU type, /runpod-volume is deleted/inaccessible - is this correct?
No, it's mounted when the worker runs on RunPod's systems, regardless of the base image
No, if you attach network storage it'll be persistent
As long as you keep it and keep it attached to the endpoint you use
okay thanks. do you recommend creating a new network volume & persisting the HF weights in that?
perhaps that's more stable/clearer for me than relying on the default /runpod-volume (which i assume is attached by default?), which seems to be giving me unexpected behaviour
------
i seem to be triggering new HF downloads even though this image has already run, downloaded, and persisted the weights to /runpod-volume/.cache/huggingface/hub/.. in previous runs
Downloading shards: 0%| | 0/3 [00:00<?, ?it/s]Request 48c80db3-d744-4f39-8af2-929133a77895: HEAD https://huggingface.co/LanguageBind/Video-LLaVA-7B
if you happen to know / have a code example that shows a reliable way to persist HF in the most straightforward way, lmk!
Just write and read to that path
You can imagine it like a folder that is always there
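A tiny sketch of the "just write and read to that path" idea, assuming the volume is mounted at /runpod-volume (the marker filename is made up): write a file once, then check for it on later cold starts to confirm you're really on persistent storage and not the container's own disk.

```python
import json
import os
import time

VOLUME = "/runpod-volume"                          # Serverless mount point from the thread
MARKER = os.path.join(VOLUME, "cache-check.json")  # hypothetical marker file

# First run: write a marker. Later cold starts (on any worker with the same
# network volume attached) should still see it; if it's gone, you weren't
# writing to the network volume after all.
if os.path.exists(MARKER):
    with open(MARKER) as f:
        print("volume already initialised at", json.load(f)["created"])
else:
    os.makedirs(VOLUME, exist_ok=True)
    with open(MARKER, "w") as f:
        json.dump({"created": time.strftime("%Y-%m-%d %H:%M:%S")}, f)
    print("wrote marker to", MARKER)
```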
when writing to /runpod-volume i'm still seeing the container do full model downloads after i kill the worker
so i:
- created a new network volume (/modelstorage) and am reading/writing to this
- attached this volume to my endpoint (didn't deploy the volume)
but when i kill the worker it re-downloads from hf??
am i missing anything!? any code examples of ensuring it downloads from the network volume & NOT hf?
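One hedged guess at the cause, with a quick check: by default the HF libraries cache under ~/.cache/huggingface inside the container, not under the network volume, so weights persisted to /runpod-volume/.cache/huggingface/hub are invisible to a fresh worker unless the cache env vars point there. A small diagnostic sketch (assumes a recent huggingface_hub; the volume path is the one from the messages above):

```python
import os
from huggingface_hub import constants

# Where the HF libraries will actually look for cached models in this process.
print("cache in use:", constants.HF_HUB_CACHE)

# Where the weights were persisted in earlier runs (path from the thread).
volume_cache = "/runpod-volume/.cache/huggingface/hub"
print("on the volume:", os.listdir(volume_cache) if os.path.isdir(volume_cache) else "not found")

# If the two locations differ, every fresh worker re-downloads from the Hub.
```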
No, it acts as a drive. Nothing is re-downloaded from the network volume
You just use the model from /runpod-volume
Maybe your path/method is wrong, you need to cache your model there somehow
Snapshot the model there, or set the model path to it
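A sketch of the snapshot approach (the /runpod-volume/models/... folder is an arbitrary name): download the snapshot onto the network volume once, then load from that local path so the worker never has to touch the Hub again.

```python
import os
from huggingface_hub import snapshot_download

MODEL_ID = "LanguageBind/Video-LLaVA-7B"            # repo from this thread
MODEL_DIR = "/runpod-volume/models/Video-LLaVA-7B"  # any folder on the network volume

# Download the full snapshot once; later cold starts on workers that have the
# same network volume attached find the files already there and skip the Hub.
if not os.path.isdir(MODEL_DIR) or not os.listdir(MODEL_DIR):
    snapshot_download(repo_id=MODEL_ID, local_dir=MODEL_DIR)

# Then point your usual loading code at the local folder instead of the repo id,
# e.g. SomeModelClass.from_pretrained(MODEL_DIR, local_files_only=True).
# local_files_only makes a missing file fail loudly instead of re-downloading.
```

local_files_only=True (or setting HF_HUB_OFFLINE=1) is also a handy way to verify the cache is actually being hit, since it errors out instead of quietly going back to huggingface.co.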