DeepSeek Coder on serverless
Hello, new serverless user here. I'd be using the vLLM worker, so whenever it gets spun up from a cold start, does it have to download the model every time? I'd be running it in fp16, which means it would be about 14 GB of data to download.
If your script says so, then yes.
So you can either bake the model into your Docker image or use network storage to persist your model between runs.
Network storage has some impact on speed since it's essentially an external drive, but it can still be decent.
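The network-storage route roughly looks like this from the worker's side (just a sketch, assuming the volume mounts at /runpod-volume and that you point the Hugging Face cache at it; the model name is only an example):

```python
import os

# Assumption: the network volume is mounted at /runpod-volume on serverless workers.
# Pointing the Hugging Face cache there means the weights persist across cold starts.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"  # must be set before importing HF/vLLM

from vllm import LLM

# Placeholder model name; in fp16 this is roughly the ~14 GB download mentioned above.
llm = LLM(model="deepseek-ai/deepseek-coder-6.7b-instruct", dtype="float16")
```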
What you can do to make it easy on yourself: if you have a Dockerfile, write a simple bash script that triggers a tiny Python script to do a vLLM job like "hello world", and it will "automatically" download the model and everything into the Docker image during build time 🙂 (rough sketch below)
or again, network volume
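If you try the bake-it-in route, the "hello world" warm-up could be as tiny as this (a sketch with a placeholder model name; see the caveat further down about vLLM possibly needing a GPU at build time):

```python
# warmup.py -- run once during `docker build` so the weights end up cached inside the image.
from vllm import LLM, SamplingParams

# Placeholder model; swap in whatever you actually deploy.
llm = LLM(model="deepseek-ai/deepseek-coder-6.7b-instruct", dtype="float16")
outputs = llm.generate(["# hello world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```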
oh really
neat
thanks
** I could be wrong on this for vLLM actually lol. I wonder if vLLM will crash because there's no GPU at build time; I remember it has done that.
There might be other ways to do it, like you could probably just download the model yourself to where vLLM expects it, but I don't know exactly how vLLM downloads/prepares models, whether it's an HF download or curl or whatever.
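For that, huggingface_hub's snapshot_download would probably do it without ever loading vLLM (so no GPU needed at build time); a rough sketch with a placeholder model name:

```python
# Pre-fetch the weights into the standard Hugging Face cache, which vLLM should pick up at runtime.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/deepseek-coder-6.7b-instruct",  # placeholder model
    # token="hf_...",  # only needed for gated or private repos
)
```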
I’ve been following the instructions for ‘option 2’ on this page: https://github.com/runpod-workers/worker-vllm
It's basically: open a folder, clone the repo (e.g. with Git Bash), open the command line, and put in that one line.
Windows doesn't need sudo. The model name is copied using the Hugging Face copy button. username/image:tag needs to be your username and chosen image name/tag (I'm sure you know this already), all lowercase, and RunPod requires a tag (I've mostly just been using 0.1 so far).
It’s been working.
edit: I put the name in for the DeepSeek Coder AWQ quant. I haven't tried that one personally. Note that GGUF quants won't work with vLLM, AFAIK.
If you attach a network volume to the endpoint, then the model will only be downloaded once, as long as you’re using our vLLM worker.