RunPod•2w ago
frogsbody

SGLang DeepSeek-V3-0324

I have been trying to run DeepSeek-V3-0324 on Instant Clusters with 2 x (8 x H100) and have so far been unsuccessful. I am trying to get the model to run multi-node and multi-GPU. I downloaded the model from Hugging Face onto a persistent volume, and I attach that volume to my Instant Cluster before launching. After launching, I run the PyTorch demo script from https://docs.runpod.io/instant-clusters/pytorch to make sure that the network is working (it is). I then follow the instructions for running DeepSeek-V3-0324 at https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3. Instead of following the default instructions exactly and doing:
# node 1
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 0 --trust-remote-code

# node 2
python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 10.0.0.1:5000 --nnodes 2 --node-rank 1 --trust-remote-code
In its place, I run the following command on each node:
python3 -m sglang.launch_server --model-path DeepSeek-V3-0324 --tp 16 --dist-init-addr ${MASTER_ADDR}:${MASTER_PORT} --nnodes ${NUM_NODES} --node-rank ${NODE_RANK} --trust-remote-code
The issue is that this hangs. I check nvidia-smi to see the model loading and it only ever loads each GPU up to almost 1GB before it goes up no further. Any help would be greatly appreciated.
133 Replies
riverfog7
riverfog7•2w ago
Before discussing this problem: does a 600B+ parameter fp16 model fit in 16x H100s? With a reasonable context length? Or is it fp8? Idk. Anyway, why are you doing tensor parallel over the network?
Jason
Jason•2w ago
They're using the Instant Cluster, so it should work. Let me try and see how to run it on an Instant Cluster; if it works I'll update here.
launch_server.py: error: argument --nnodes: invalid int value: '${NUM_NODES}'
Hmm, I get this for every environment variable, including the world size one. I'm putting the command in the CMD directly, without going into the web terminal. I'm using lmsysorg/sglang:latest. Yeah, can you try the 4-bit first and see if it works? Or try more GPU VRAM.
riverfog7
riverfog7•2w ago
It's tensor parallel though. Over the network you should use pipeline parallel. Wow, u r rich lol. Apparently it's fp8 in the official repo, so it should work.
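A rough back-of-the-envelope check of the memory question above, as a sketch only. It assumes roughly 671B total parameters for DeepSeek-V3 and 80 GB per H100 (neither figure is stated in the thread), and it counts weights only, ignoring KV cache, activations and framework overhead.

```python
# Assumptions (not from the thread): ~671B total parameters, 80 GB per H100.
PARAMS_BILLION = 671
GPU_MEM_GB = 80
NUM_GPUS = 16  # 2 nodes x 8 GPUs, as in the original post


def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB: 1e9 params * bytes each / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def fits(bytes_per_param: float) -> bool:
    """True if the weights alone fit in the combined GPU memory."""
    return weights_gb(PARAMS_BILLION, bytes_per_param) <= GPU_MEM_GB * NUM_GPUS


if __name__ == "__main__":
    for name, width in (("fp16", 2), ("fp8", 1)):
        print(f"{name}: ~{weights_gb(PARAMS_BILLION, width):.0f} GB of weights, "
              f"{GPU_MEM_GB * NUM_GPUS} GB available, fits={fits(width)}")
```

Under these assumptions the fp16 weights alone exceed the combined memory, while fp8 leaves headroom, which is consistent with the "it's fp8, so it should work" conclusion.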
riverfog7
riverfog7•2w ago
Tensor Parallelism - NADDOD Blog
Tensor parallelism alleviates memory issues in large-scale training. RoCE enables efficient communication for GPU tensor parallelism, accelerating computations.
riverfog7
riverfog7•2w ago
Maybe because it's CMD. Try bash -c 'command'
Jason
Jason•2w ago
Yeah well not for an hour just testing
frogsbody
frogsbodyOP•2w ago
Yeah, I just want to run this thing. I'm happy to spend on GPUs for a period of time to get it running. But I can't even get the basics to work unfortunately... has anyone seen any example on any infrastructure setup of this working multi-node / pipeline parallelism? If not on RunPod than anywhere else? It seems that no one has got this running anywhere.
Jason
Jason•2w ago
i haven't tried running anything via network honestly, and im interested in this too 🙂
Jason
Jason•2w ago
Is this a normal expectation for the cluster's network speed? bgts5433fn5f2j d2d7wb5ale6zhl I feel like it's a bit too slow.
[image attached; no description]
Jason
Jason•2w ago
Don't know whats wrong here, nccl seems to be communicating
Jason
Jason•2w ago
Oh, and also riverfog, I've tried your recommendation: bash -c works! It reads the env correctly. As for the other recommendation, tp 8 with 8*2 GPUs (16 total) will use only 8 GPUs, I guess. It doesn't load, so I can't know for sure, but when I try it some GPUs' VRAM usage is stuck at 0%, some at 2%, some at 1%. When using tp 16, all GPUs are stuck between 1% and 2%.
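The '${NUM_NODES}' error earlier in the thread is what happens when a command is executed without a shell: nothing expands the variable, so argparse receives the literal text. Besides wrapping the command in bash -c, one alternative is a small Python launcher that reads the variables itself. The sketch below is illustrative: the function name is hypothetical, and the flags and variable names are the ones quoted in the thread.

```python
def sglang_launch_argv(env, model_path="DeepSeek-V3-0324", tp=16):
    """Build the SGLang launch command from cluster environment variables.

    `env` is any mapping (for example os.environ). Converting to int here
    fails early if a variable is unset or still holds a literal '${...}'.
    """
    nnodes = int(env["NUM_NODES"])
    node_rank = int(env["NODE_RANK"])
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"NODE_RANK {node_rank} out of range for {nnodes} nodes")
    return [
        "python3", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp),
        "--dist-init-addr", f"{env['MASTER_ADDR']}:{env['MASTER_PORT']}",
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),
        "--trust-remote-code",
    ]


if __name__ == "__main__":
    # Example values only; on a real node pass os.environ instead.
    example_env = {"MASTER_ADDR": "10.0.0.1", "MASTER_PORT": "5000",
                   "NUM_NODES": "2", "NODE_RANK": "0"}
    print(" ".join(sglang_launch_argv(example_env)))
```

On a node, the returned list could be handed to subprocess.run or os.execvp so that no shell expansion is needed at all.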
Poddy
Poddy•2w ago
@frogsbody
Escalated To Zendesk
The thread has been escalated to Zendesk!
Jason
Jason•2w ago
Maybe try opening this. I think SGLang doesn't support PP, only TP.
frogsbody
frogsbodyOP•2w ago
Yes, I have exactly this issue.
frogsbody
frogsbodyOP•2w ago
They claim to support pipeline parallelism:
[image attached; no description]
frogsbody
frogsbodyOP•2w ago
GitHub
sglang/benchmark/deepseek_v3 at main · sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
Jason
Jason•2w ago
I don't see pipeline parallelism here. Isn't it supposed to be a configuration argument?
frogsbody
frogsbodyOP•2w ago
nnodes = 2 It’s a torchrun argument that gets passed through to the equivalent in SGLang I believe.
Jason
Jason•2w ago
I see, yeah, then it might use both. Did you open a ticket? I will try vLLM in the future, maybe it's better. I'd recommend you try it too.
frogsbody
frogsbodyOP•2w ago
@Jason have you tried vLLM with instant clusters? I believe the communication mechanism under the hood doesn't work with the way that Runpod sets up inter-node communication. I couldn't get it to work (this was a few weeks ago when it was still in beta though). I wasn't sure where to open a ticket because I'm not sure where the error is really coming from... I think it's an SGLang issue but I wasn't clear.
riverfog7
riverfog7•2w ago
Are you still here? I've never used SGLang, but vLLM pipeline parallel works well multi-node, even with not-that-good network bandwidth.
frogsbody
frogsbodyOP•2w ago
I'm still trying @riverfog7. Have you tested vLLM with Instant Cluster or do you have another solution where I can test multi-gpu in the cloud to run this?
riverfog7
riverfog7•2w ago
@frogsbody but do you really need multigpu?
frogsbody
frogsbodyOP•2w ago
Yeah, I specifically need to test tensor parallelism and pipeline parallelism: 2 nodes of 8 x H100
riverfog7
riverfog7•2w ago
i mean you can host the same model in 1 node you want deepseek v3 at fp8 right?
frogsbody
frogsbodyOP•2w ago
My requirements are to run Deepseek-V3-0324 over two nodes by whatever means - I just have to see that pipeline and tensor parallelism can work for the model
riverfog7
riverfog7•2w ago
okay is sglang required too?
frogsbody
frogsbodyOP•2w ago
It's less about actually using it - more about showing it can work No, it can be anything vLLM would be fine too
riverfog7
riverfog7•2w ago
vllm should work ive done it in the past
frogsbody
frogsbodyOP•2w ago
Have you got that working in Runpod Instant Clusters?
riverfog7
riverfog7•2w ago
Not with 2x8, but that doesn't matter. No, in AWS. It should work with RunPod though.
frogsbody
frogsbodyOP•2w ago
I had trouble running the basic torchrun script: https://docs.runpod.io/instant-clusters/pytorch
frogsbody
frogsbodyOP•2w ago
It failed when I tried with vLLM. Yeah, I tried AWS but they wouldn't give me any GPUs, so now I'm just trying with RunPod. I will try vLLM again.
riverfog7
riverfog7•2w ago
can you ping the other pod
frogsbody
frogsbodyOP•2w ago
Yeah I can ping it
riverfog7
riverfog7•2w ago
with ip
frogsbody
frogsbodyOP•2w ago
It's something to do with Ray, which vLLM uses under the hood
riverfog7
riverfog7•2w ago
do u use vllm docker image or sth else?
frogsbody
frogsbodyOP•2w ago
I was using something else - but I can use that docker image
riverfog7
riverfog7•2w ago
I succeeded with the docker image, so...
frogsbody
frogsbodyOP•2w ago
Thanks for letting me know, I'll try that out and let you know how it goes
riverfog7
riverfog7•2w ago
The cluster-making script:
docker run \
    --entrypoint /bin/bash \
    --network host \
    --name node \
    --shm-size 10.24g \
    --gpus all \
    -v "${PATH_TO_HF_HOME}:/root/.cache/huggingface" \
    "${ADDITIONAL_ARGS[@]}" \
    "${DOCKER_IMAGE}" -c "${RAY_START_CMD}"
This is the docker run command, so you can modify it and run it.
frogsbody
frogsbodyOP•2w ago
Where do I run this? When I create the pod?
riverfog7
riverfog7•2w ago
No. So what the docs say is: you have two physical machines, then you create a Ray container on both physical machines and make a cluster. But in your case you have no access to the physical machines.
frogsbody
frogsbodyOP•2w ago
Yeah, the issue I had is that RunPod doesn't have that. I have to work within those bounds; I don't have AWS or anything to work with.
riverfog7
riverfog7•2w ago
So you should translate the docker run command to RunPod's template:
docker run \
    --entrypoint /bin/bash \
    --network host \
    --name node \
    --shm-size 10.24g \
    --gpus all \
    -v /path/to/the/huggingface/home/in/this/node:/root/.cache/huggingface \
    -e VLLM_HOST_IP=ip_of_this_node \
    vllm/vllm-openai -c "ray start --block --address=ip_of_head_node:6379"
frogsbody
frogsbodyOP•2w ago
Yeah, this was my next idea - I just have to figure out how to modify runpod to work with this since "docker" can't be run inside of a pod once I start it It has to be part of a template or something. I'm pretty new to this side of Runpod.
riverfog7
riverfog7•2w ago
Should be:
image name: vllm/vllm-openai
CMD: python3 -m vllm.entrypoints.openai.api_server -c ray start --block --address=ip_of_head_node:6379
mount the network volume to ~/.cache/
for the worker, and
image name: vllm/vllm-openai
CMD: python3 -m vllm.entrypoints.openai.api_server -c ray start --block --head --port=6379
env: VLLM_HOST_IP=ip_of_this_node
for the head.
frogsbody
frogsbodyOP•2w ago
Are you starting these as two separate pods or using Instant Cluster? In this case it looks like you're using two separate pods with global networking or something
riverfog7
riverfog7•2w ago
can you apply diff images
frogsbody
frogsbodyOP•2w ago
Not with instant cluster
riverfog7
riverfog7•2w ago
for the two pods in clusters? or diff setting at least
frogsbody
frogsbodyOP•2w ago
[image attached; no description]
frogsbody
frogsbodyOP•2w ago
Doesn't look like it
riverfog7
riverfog7•2w ago
lol should we write a script? its solvable
frogsbody
frogsbodyOP•2w ago
Yeah I'd love to lol, been trying to run DeepSeek V3 across two nodes for a while now.
vllm serve /path/to/the/model/in/the/container \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2
I was thinking this should just work if I spin up a cluster, go into each node and run this, and make sure to pass in the right information for the host... But then it fails because of Ray.
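The tensor-parallel 8 / pipeline-parallel 2 split above can be sanity-checked against the cluster shape with a few lines. This is a sketch under two assumptions that are not from the thread: the product of the two sizes must equal the total GPU count, and ranks are placed contiguously so that a tensor-parallel group stays on one node whenever a node's GPU count is a multiple of the group size. The function name is hypothetical.

```python
def check_layout(tp: int, pp: int, nnodes: int, gpus_per_node: int) -> dict:
    """Validate a tensor/pipeline parallel split against the cluster shape."""
    world_size = nnodes * gpus_per_node
    if tp * pp != world_size:
        raise ValueError(
            f"tp * pp = {tp * pp}, but the cluster has {world_size} GPUs"
        )
    return {
        "world_size": world_size,
        # With contiguous rank placement, each tensor-parallel group fits
        # inside one node only if a node's GPU count is a multiple of tp.
        "tp_within_node": gpus_per_node % tp == 0,
    }


if __name__ == "__main__":
    print("tp=8,  pp=2:", check_layout(8, 2, nnodes=2, gpus_per_node=8))
    print("tp=16, pp=1:", check_layout(16, 1, nnodes=2, gpus_per_node=8))
```

With tp=8 and pp=2 the tensor-parallel traffic stays inside a node and only pipeline stage boundaries cross the network; with tp=16 every tensor-parallel group spans both nodes, which is the concern raised earlier in the thread.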
riverfog7
riverfog7•2w ago
yeah but we need a script to build the ray cluster first it should run INSIDE a ray cluster
frogsbody
frogsbodyOP•2w ago
Yeah, I feel like that's outside the default scope of instant clusters. Are you suggesting we set up a ray cluster inside of our non-ray cluster?
riverfog7
riverfog7•2w ago
that's what i meant 😄 the writing a script part was for that
frogsbody
frogsbodyOP•2w ago
That would be cool... solve a lot of problems lol
riverfog7
riverfog7•2w ago
ill try with global networking first just to see if it works
frogsbody
frogsbodyOP•2w ago
I'll try to form a Ray cluster in Instant Cluster again
riverfog7
riverfog7•2w ago
Try
python3 -m vllm.entrypoints.openai.api_server -c ray start --block --address=ip_of_head_node:6379
inside a vLLM container.
frogsbody
frogsbodyOP•2w ago
Sure will try now
riverfog7
riverfog7•2w ago
so you have seperate ssh access to the two nodes right?
frogsbody
frogsbodyOP•2w ago
Yes
riverfog7
riverfog7•2w ago
good
frogsbody
frogsbodyOP•2w ago
Will send a screenshot in a sec.
api_server.py: error: argument --block-size: expected one argument
riverfog7
riverfog7•2w ago
isnt it --block?
frogsbody
frogsbodyOP•2w ago
I copied what you sent and that's what it gave me. In my case:
python3 -m vllm.entrypoints.openai.api_server -c ray start --block --address=10.65.0.2:6379
riverfog7
riverfog7•2w ago
python3 -m vllm.entrypoints.openai.api_server -c ray start --block --address=ip_of_head_node:6379 this?
frogsbody
frogsbodyOP•2w ago
Yeah, that's what I did ^
riverfog7
riverfog7•2w ago
hnm vllm/vllm-openai the image is this
frogsbody
frogsbodyOP•2w ago
Yep
riverfog7
riverfog7•2w ago
I think I'll test in mine first, wait a sec. Oh, it was just
ray start --block --address=10.65.0.2:6379
or
bash -c "ray start ...."
frogsbody
frogsbodyOP•2w ago
Yeah, I'm doing that right now actually
riverfog7
riverfog7•2w ago
didnt see the --entrypoint /bin/bash
frogsbody
frogsbodyOP•2w ago
But can't get the worker to connect
riverfog7
riverfog7•2w ago
xD
Jason
Jason•2w ago
No, I haven't
frogsbody
frogsbodyOP•2w ago
[image attached; no description]
riverfog7
riverfog7•2w ago
wdym?
Jason
Jason•2w ago
any errors or logs?
frogsbody
frogsbodyOP•2w ago
Tried 6379 and couldn't get that to work
Jason
Jason•2w ago
thats the port from env?
frogsbody
frogsbodyOP•2w ago
So I tried a port I knew was exposed (29400), since I can ping between the nodes with that
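A successful ping only shows that ICMP gets through; it says nothing about whether a particular TCP port such as 6379 or 29400 is reachable. A minimal sketch of a port check using only the standard library follows (the function name is hypothetical). It needs something listening on the far side, for example the Ray head already started.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use from the worker node, with the head address quoted in the thread:
#   tcp_port_open("10.65.0.2", 6379)
#   tcp_port_open("10.65.0.2", 29400)
```

If the check returns False while the head process is running, a firewall or a bind to a different interface becomes the more likely explanation than Ray itself.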
riverfog7
riverfog7•2w ago
ray start --block --address=10.65.0.2:6379 this?
frogsbody
frogsbodyOP•2w ago
This works
riverfog7
riverfog7•2w ago
Does adding --block make a diff?
frogsbody
frogsbodyOP•2w ago
But connecting from worker doesn't Let me try with block
riverfog7
riverfog7•2w ago
ray start --block --head --port=6379
for the head,
ray start --block --address=10.65.0.2:6379
for the worker.
frogsbody
frogsbodyOP•2w ago
[image attached; no description]
riverfog7
riverfog7•2w ago
maybe it doesnt work bc u already started a vllm process in the start command
frogsbody
frogsbodyOP•2w ago
I actually haven't started anything in this one I am not using the vLLM image this time. I started a new cluster that doesn't have vLLM. I pip installed it. Regardless, Ray should work independently
riverfog7
riverfog7•2w ago
Yeah, same thought, but I had nothing else to blame. Check ufw just in case.
frogsbody
frogsbodyOP•2w ago
What is UFW?
riverfog7
riverfog7•2w ago
And other firewalls too. UFW is Ubuntu's firewall (Uncomplicated Firewall).
frogsbody
frogsbodyOP•2w ago
Ah okay let me check
riverfog7
riverfog7•2w ago
that was the problem in my last attempt @frogsbody i have one question
frogsbody
frogsbodyOP•2w ago
Trying to check but have to install packages @riverfog7 yeah what's up?
riverfog7
riverfog7•2w ago
Why does the first pic say 172.xx but the second pic says 10.60.sth?
frogsbody
frogsbodyOP•2w ago
That's the "master address"
riverfog7
riverfog7•2w ago
master addr?
frogsbody
frogsbodyOP•2w ago
[image attached; no description]
frogsbody
frogsbodyOP•2w ago
Overview | RunPod Documentation
Instant Clusters enable high-performance computing across multiple GPUs with high-speed networking capabilities.
frogsbody
frogsbodyOP•2w ago
NODE_ADDR is the address of the individual node. That's the one that Ray uses. vLLM uses Ray under the hood and it isn't playing nicely. That's why I was hoping SGLang would work, since it uses PyTorch. But then we have that weird bug where model loading hangs at 1% lol. I suspect that it's actually only loading the PyTorch stuff and never actually loads any of the weights.
riverfog7
riverfog7•2w ago
maybe it binds to the wrong nic?
frogsbody
frogsbodyOP•2w ago
We use eth1 I think here
riverfog7
riverfog7•2w ago
and receives from the public IP but not from the private IP
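One way to test the wrong-NIC theory for the hanging launch is to pin NCCL and Gloo to a single interface and turn on NCCL logging before starting the server. The sketch below only builds the environment; NCCL_SOCKET_IFNAME, GLOO_SOCKET_IFNAME and NCCL_DEBUG are standard variables, while the interface name "eth1" is taken from the guess in the thread and should be confirmed on the node (for example with ip addr). The function name is hypothetical.

```python
import os


def nccl_debug_env(ifname: str, base=None) -> dict:
    """Return a copy of the environment with NCCL/Gloo pinned to one interface."""
    env = dict(os.environ if base is None else base)
    env["NCCL_SOCKET_IFNAME"] = ifname  # interface NCCL uses for socket traffic
    env["GLOO_SOCKET_IFNAME"] = ifname  # same restriction for the Gloo backend
    env["NCCL_DEBUG"] = "INFO"          # log the interface and transport NCCL selects
    return env


# Example use (argv being the launch command for this node):
#   subprocess.run(argv, env=nccl_debug_env("eth1"), check=True)
```

The NCCL INFO log lines printed at startup show which interface was actually chosen, which helps distinguish a network binding problem from a weight-loading problem.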
frogsbody
frogsbodyOP•2w ago
Issue is that I'm not sure if that's something we can even fix under the hood with vLLM... I just don't know enough about how vLLM works
riverfog7
riverfog7•2w ago
if ray works vllm works
frogsbody
frogsbodyOP•2w ago
And vLLM uses that same ray cluster?
riverfog7
riverfog7•2w ago
yeah
frogsbody
frogsbodyOP•2w ago
Hmm
riverfog7
riverfog7•2w ago
you can just use the ray cluster as one computer vllm does the finicky things by itself
frogsbody
frogsbodyOP•2w ago
I actually can't even ping between each machine now
riverfog7
riverfog7•2w ago
maybe ray start --block --head --port 6379 --node-ip-address 10.65.0.2 in the head? wut
frogsbody
frogsbodyOP•2w ago
[image attached; no description]
riverfog7
riverfog7•2w ago
can u just ping it ping 10.sth
frogsbody
frogsbodyOP•2w ago
[image attached; no description]
riverfog7
riverfog7•2w ago
ufw status?
frogsbody
frogsbodyOP•2w ago
Interesting. My environment is messed up now; I can't run the default torch script here.
frogsbody
frogsbodyOP•2w ago
Deploy with PyTorch | RunPod Documentation
Learn how to deploy an Instant Cluster and run a multi-node process using PyTorch.
riverfog7
riverfog7•2w ago
lol
frogsbody
frogsbodyOP•2w ago
So I messed something up with whatever we tried
riverfog7
riverfog7•2w ago
hmm maybe start with a fresh pytorch image
frogsbody
frogsbodyOP•2w ago
I'
riverfog7
riverfog7•2w ago
and install everything (ray and vllm)
frogsbody
frogsbodyOP•2w ago
I'm going to have to re-fund this account lol, it won't let me start another pod. Not enough money in the account lol. I may sleep for a bit and get back to this, interesting problem to solve.
riverfog7
riverfog7•2w ago
Great. The community says providing --node-ip-address should make it bind to the proper address, so maybe try that next time on the head node.
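A sketch of how the --node-ip-address suggestion might be wired up. The helper names are hypothetical. The address lookup uses a connected UDP socket, which sends no packets and only asks the kernel which local address it would route from; on an Instant Cluster the NODE_ADDR variable mentioned earlier in the thread may already hold the same value. The ray start flags used are --block, --head, --port, --address and --node-ip-address.

```python
import socket


def local_ip_towards(peer_ip: str, port: int = 6379) -> str:
    """Return the local address the kernel would use to reach peer_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))  # UDP connect only selects a route
        return s.getsockname()[0]


def ray_head_argv(node_ip: str, port: int = 6379) -> list:
    """Command for the head node, bound explicitly to node_ip."""
    return ["ray", "start", "--block", "--head",
            f"--port={port}", f"--node-ip-address={node_ip}"]


def ray_worker_argv(head_ip: str, node_ip: str, port: int = 6379) -> list:
    """Command for a worker node joining the head at head_ip:port."""
    return ["ray", "start", "--block",
            f"--address={head_ip}:{port}", f"--node-ip-address={node_ip}"]


if __name__ == "__main__":
    # 10.65.0.2 is the head address quoted in the thread.
    print(" ".join(ray_head_argv("10.65.0.2")))
```

On the worker, local_ip_towards("10.65.0.2") would give the address to pass as node_ip, so both sides advertise addresses on the private network rather than the 172.x one.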
frogsbody
frogsbodyOP•2w ago
Yeah will do, I'll post any findings here
