Serverless deepseek-ai/DeepSeek-R1 setup?
How can I configure a serverless endpoint for deepseek-ai/DeepSeek-R1?
18 Replies
does vLLM support that model?
if not, you can build a custom worker that runs inference for it
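something like this as a rough sketch (the handler layout and the model/parameter choices here are just placeholders, not your exact setup):
```python
# Rough sketch of a custom RunPod serverless worker (illustrative only).
import runpod
from vllm import LLM, SamplingParams

# Load the model once at cold start; the model ID here is just an example.
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")

def handler(job):
    # RunPod passes the request payload under job["input"]
    prompt = job["input"]["prompt"]
    params = SamplingParams(max_tokens=256, temperature=0.7)
    outputs = llm.generate([prompt], params)
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```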
Basic config, with a GPU count of 2
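For reference, roughly what I configured (just an illustrative summary, the exact field names in the vLLM worker template may differ):
```python
# Illustrative summary of the endpoint settings I used (not an actual API payload).
endpoint_config = {
    "model": "deepseek-ai/DeepSeek-R1",  # Hugging Face model ID
    "gpu_count": 2,                      # workers with 2 GPUs each
    # everything else left at the template defaults
}
```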
Once it is running, I try the default hello world request and it just gets stuck IN_QUEUE for 8 minutes...
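This is more or less the request I'm sending (endpoint ID and API key are placeholders):
```python
# Minimal "hello world" request against the serverless endpoint.
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = "YOUR_RUNPOD_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job asynchronously...
run = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers=headers,
    json={"input": {"prompt": "Hello, world!"}},
).json()

# ...then poll its status; this is where it just sits at IN_QUEUE.
status = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{run['id']}",
    headers=headers,
).json()
print(status["status"])  # e.g. IN_QUEUE, IN_PROGRESS, COMPLETED
```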
Can you check the logs? Maybe it's still downloading
or OOM
wait... how big is the model?
seems like R1 is a really huge model, isn't it?
yes, but I even tried just following along with the YouTube tutorial here and got the same IN_QUEUE problem: https://youtu.be/0XXKK82LwWk?si=ZDCu_YV39Eb5Fn8A
Any logs?
in your workers or endpoint?
Oh, wait!! I just ran the 1.5B model and got this response:
When I tried running the larger model, I got errors about not enough memory
""Uncaught exception | <class 'torch.OutOfMemoryError'>; CUDA out of memory. Tried to allocate 3.50 GiB. GPU 0 has a total capacity of 44.45 GiB of which 1.42 GiB is free"
seems like you got an OOM, yeah...
So how do I configure it?
R1 is such a huge model, seems like you need 1TB+ of VRAM
I don't know how to calculate it exactly, but my estimate is maybe somewhere in the range of 700GB+ of VRAM
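rough back-of-the-envelope math, assuming ~671B total parameters and FP8 weights (~1 byte per parameter), ignoring KV cache and other runtime overhead:
```python
# Rough VRAM estimate for the full DeepSeek-R1 (assumptions: ~671B params, FP8 weights).
params = 671e9
bytes_per_param = 1                     # FP8; double this for FP16/BF16
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")   # ~671 GB before KV cache/overhead

# Compare with the endpoint above: 2 GPUs x 44.45 GiB each
available_gb = 2 * 44.45 * 1.074        # GiB -> GB
print(f"available: ~{available_gb:.0f} GB")     # ~95 GB, nowhere near enough
```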
wow
so it's not really an option to deploy, then?
not sure, depends on your use case haha
I mean, DeepSeek offers their own API keys
I thought it could be more cost-effective to just run a serverless endpoint here, but...
only if you have enough volume, especially for bigger models imo
hmm.. I see
Thanks for your help
you're welcome bro