RunPod2mo ago
Lattus

Serverless deepseek-ai/DeepSeek-R1 setup?

How can I configure a serverless endpoint for deepseek-ai/DeepSeek-R1?
29 Replies
nerdylive2mo ago
Does vLLM support that model? If not, you can build a custom worker that runs inference for it.
LattusOP2mo ago
Basic config, 2 GPU count
(screenshots of the endpoint config attached)
LattusOP2mo ago
Once it is running, I try the default hello-world request and it just gets stuck IN_QUEUE for 8 minutes.
nerdylive2mo ago
Can you check the logs? Maybe it's still downloading, or it hit OOM. Wait, how big is the model? R1 is a really huge model, isn't it?
LattusOP2mo ago
Yes, but I even tried just following along with the YouTube tutorial here and got the same IN_QUEUE problem: https://youtu.be/0XXKK82LwWk?si=ZDCu_YV39Eb5Fn8A
nerdylive2mo ago
Any logs in your workers or on the endpoint?
LattusOP2mo ago
Oh, wait!! I just ran the 1.5B model and got this response:
(screenshot of the response attached)
LattusOP2mo ago
When I tried running the larger model, I got out-of-memory errors: "Uncaught exception | <class 'torch.OutOfMemoryError'>; CUDA out of memory. Tried to allocate 3.50 GiB. GPU 0 has a total capacity of 44.45 GiB of which 1.42 GiB is free"
nerdylive2mo ago
Seems like you got OOM, yeah.
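For OOM on a model that should otherwise fit, the usual levers on a vLLM-based worker are context length, memory utilization, and tensor parallelism. A sketch of the relevant environment variables (names as used by RunPod's vLLM worker template; verify against your endpoint's docs before relying on them):

```shell
# Assumed env vars for the vLLM serverless worker -- double-check the exact
# names in your template. Shrinking context length frees KV-cache memory.
MAX_MODEL_LEN=4096            # shorter context -> smaller KV cache
GPU_MEMORY_UTILIZATION=0.95   # fraction of VRAM vLLM is allowed to claim
TENSOR_PARALLEL_SIZE=2        # shard weights across both GPUs in the worker
```

None of this helps if the weights themselves simply exceed total VRAM, as with the full R1 model below.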
LattusOP2mo ago
So how do I configure it?
nerdylive2mo ago
R1 is such a huge model, it seems like you'd need 1 TB+ of VRAM. I don't know how to calculate it exactly, but my estimate is something in the range of 700 GB+ of VRAM.
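That estimate can be sanity-checked with back-of-the-envelope math: DeepSeek-R1 has roughly 671B parameters, and serving memory is approximately parameter count times bytes per parameter, plus overhead for KV cache and activations (the 20% overhead factor here is an assumption, not a measured figure):

```python
def vram_gb(params_billion: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    """Rough serving-memory estimate: weights * dtype size, plus an
    assumed ~20% overhead for KV cache and activations."""
    return params_billion * bytes_per_param * overhead

# DeepSeek-R1 is ~671B parameters:
print(round(vram_gb(671, 1)))  # FP8:  ~805 GB
print(round(vram_gb(671, 2)))  # FP16: ~1610 GB
```

Either way, it lines up with the "700 GB+, maybe 1 TB+" ballpark above: far beyond a handful of 48 GB GPUs.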
LattusOP2mo ago
Wow, so it's not really an option to deploy?
nerdylive2mo ago
Not sure, depends on your use case haha
LattusOP2mo ago
I mean, DeepSeek offers their own API keys. I thought it could be more cost-effective to just run a serverless endpoint here, but...
nerdylive2mo ago
Only if you have enough volume, especially for the bigger models, imo.
LattusOP2mo ago
Hmm, I see. Thanks for your help!
nerdylive2mo ago
You're welcome, bro
lsdvaibhavvvv2mo ago
Hey @nerdylive, I can still deploy the 7B DeepSeek R1 model instead of the huge one, right?
lsdvaibhavvvv2mo ago
(screenshot of the error attached)
lsdvaibhavvvv2mo ago
I am facing this issue, and I am not that good at resolving issues.
<MarDev/>2mo ago
Did you find a solution ?
lsdvaibhavvvv2mo ago
Not yet...
nerdylive2mo ago
Use trust_remote_code = true
nerdylive2mo ago
(screenshot attached)
lsdvaibhavvvv2mo ago
Where should I put this? In the environment?
nerdylive2mo ago
env variable
nerdylive2mo ago
Like this:
(screenshot attached)
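For reference, a sketch of what that environment-variable section might contain (TRUST_REMOTE_CODE is the variable name used by RunPod's vLLM worker; the model name shown is DeepSeek's 7B distill repo, here as an example -- substitute your own):

```shell
# Assumed env vars for the vLLM worker template; TRUST_REMOTE_CODE lets
# transformers execute custom model code shipped inside the HF repo.
MODEL_NAME=deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
TRUST_REMOTE_CODE=1
```

Only enable trust-remote-code for model repos you actually trust, since it runs arbitrary Python from the repository.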
riverfog72w ago
Is the model you are trying to run a GGUF quant? You'll need a custom script for GGUF quants, or if there are multiple models in a single repo.
