Serverless deepseek-ai/DeepSeek-R1 setup?
How can I configure a serverless endpoint for deepseek-ai/DeepSeek-R1?
does vLLM support that model?
if not, you can build a custom worker that runs inference for that model
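(For context: a custom worker on RunPod follows the serverless handler pattern. A minimal sketch, where the model name and generation settings are placeholders rather than recommendations:)

```python
import runpod
from transformers import pipeline

# Load the model once at cold start so it is reused across requests.
generator = pipeline("text-generation", model="some-org/some-model")  # placeholder

def handler(job):
    # RunPod passes the request body's "input" object to the handler.
    prompt = job["input"]["prompt"]
    result = generator(prompt, max_new_tokens=128)
    return result[0]["generated_text"]

# Register the handler with the RunPod serverless runtime.
runpod.serverless.start({"handler": handler})
```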
Basic config, 2 GPU count


Once it is running, I try the default "hello world" request and it just gets stuck IN_QUEUE for 8 minutes...
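(For reference, the request flow being tested here is RunPod's serverless REST API: POST a job to the endpoint's /run route, then poll /status. A sketch, with ENDPOINT_ID and API_KEY as placeholders:)

```python
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-api-key"          # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job asynchronously.
job = requests.post(
    f"{BASE}/run", headers=HEADERS,
    json={"input": {"prompt": "Hello World"}},
).json()

# Poll until the job leaves IN_QUEUE / IN_PROGRESS.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(2)
print(status)
```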
Can you check the logs? Maybe it's still downloading
or OOM
wait.. how big is the model?
seems like R1 is a really huge model, isn't it?
yes, but I even tried just following along with the YouTube tutorial here and got the same IN_QUEUE problem: https://youtu.be/0XXKK82LwWk?si=ZDCu_YV39Eb5Fn8A
Any logs?
in your workers or endpoint?
Oh, wait!! I just ran the 1.5B model and got this response:

When I tried running the larger model, I got errors about not enough memory
""Uncaught exception | <class 'torch.OutOfMemoryError'>; CUDA out of memory. Tried to allocate 3.50 GiB. GPU 0 has a total capacity of 44.45 GiB of which 1.42 GiB is free"
seems like you got an OOM, yeah...
So how do I configure it?
R1 is such a huge model, seems like you'd need 1 TB+ of VRAM
don't know the exact calculation, but my estimate is something in the range of 700 GB+ of VRAM
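(The usual rule of thumb is parameters times bytes per parameter, plus headroom for KV cache and activations. The full DeepSeek-R1 has 671B parameters, so as a back-of-envelope sketch:)

```python
# Back-of-envelope weight memory: 1B params at 1 byte/param is ~1 GB.
def weight_vram_gb(n_params_billion: float, bytes_per_param: float) -> float:
    return n_params_billion * bytes_per_param

print(weight_vram_gb(671, 1))  # full R1 in FP8:  ~671 GB for weights alone
print(weight_vram_gb(671, 2))  # full R1 in FP16: ~1342 GB
print(weight_vram_gb(7, 2))    # a 7B distill in FP16: ~14 GB
```

Weights alone at FP8 already land near that 700 GB figure, before any KV cache or activation overhead.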
wow
so it's not really an option to deploy?
not sure, depends on your use case haha
I mean, Deepseek offers their own API keys
I thought it could be more cost-effective to just run a serverless endpoint here, but...
only if you've got enough volume, especially for bigger models imo
hmm.. I see
Thanks for your help
you're welcome bro
Hey @nerdylive, I can still deploy the 7B DeepSeek-R1 model instead of the huge one, right?

I am facing this issue
I am not that good at resolving issues.
Did you find a solution ?
Not yet...
use trust_remote_code = true

where should I put this?
in environment
env variable
like this

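(On the vLLM worker this is typically exposed as a TRUST_REMOTE_CODE environment variable; under the hood it toggles vLLM's own flag. A sketch, where the 7B distill repo id is just an example choice:)

```python
from vllm import LLM

# Equivalent of setting TRUST_REMOTE_CODE=1 on the worker: vLLM will then
# execute custom modeling code shipped inside the model repo.
llm = LLM(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # example model
    trust_remote_code=True,
)
```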
Is the model you are trying to run a GGUF quant? You'll need a custom script for GGUF quants, or if there are multiple models in a single repo
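(A custom GGUF handler usually pairs the RunPod handler pattern with llama-cpp-python instead of vLLM. A rough sketch; the repo id and filename are placeholders you'd point at the specific quant file:)

```python
import runpod
from llama_cpp import Llama

# from_pretrained fetches a single .gguf file out of a (possibly multi-file)
# Hugging Face repo, which is why the stock vLLM worker can't be used as-is.
llm = Llama.from_pretrained(
    repo_id="some-org/some-model-GGUF",  # placeholder repo
    filename="model-Q4_K_M.gguf",        # placeholder: pick one quant file
    n_gpu_layers=-1,                     # offload all layers to the GPU
)

def handler(job):
    out = llm(job["input"]["prompt"], max_tokens=256)
    return out["choices"][0]["text"]

runpod.serverless.start({"handler": handler})
```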