RunPod · 4mo ago
Thibaud

vLLM seems not to use the GPU

I'm using vLLM, and on the graph, when I send some requests, only CPU usage increases. If I open a terminal and run nvidia-smi, I don't see any process either. Settings line: --model NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 8192 --port 8000 --dtype half --enable-chunked-prefill true --max-num-batched-tokens 6144 --gpu-memory-utilization 0.97
[attached screenshot: pod usage graphs]
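For reference, a minimal sketch of how that settings line assembles into a full launch command, assuming vLLM's OpenAI-compatible server entrypoint (the entrypoint itself isn't shown in the thread):

```bash
# Sketch only: assumes vLLM's OpenAI-compatible API server entrypoint,
# with the flags copied from the settings line quoted above.
python -m vllm.entrypoints.openai.api_server \
  --model NousResearch/Meta-Llama-3-8B-Instruct \
  --max-model-len 8192 \
  --port 8000 \
  --dtype half \
  --enable-chunked-prefill true \
  --max-num-batched-tokens 6144 \
  --gpu-memory-utilization 0.97
```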
nerdylive (4mo ago)
Try another pod? Set the CUDA version to the right one.
Thibaud (OP, 4mo ago)
I tried on 4 different pods. For the CUDA version, I don't know where I can set it.
nerdylive (4mo ago)
Btw, any logs? It's not something you set; it's more like a filter when you create a pod.
Thibaud (OP, 4mo ago)
I'm using a pod, not serverless. I don't see where in the pod setup I can filter by CUDA.
nerdylive (4mo ago)
[attached screenshot: CUDA version filter on the pod creation page]
nerdylive (4mo ago)
Easy, just try 12.5.
Thibaud (OP, 4mo ago)
Thanks!
nerdylive (4mo ago)
Try 12.4 if not; just checked.
Thibaud (OP, 4mo ago)
I used an A40, so 12.4. I'll try with an RTX 6000 on 12.5 to check if I see a difference.
Thibaud (OP, 4mo ago)
I don't understand why I don't see any processes here.
[attached screenshot: nvidia-smi output with an empty process list]
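Worth noting: inside a container, nvidia-smi often cannot list processes because the container doesn't share the host's PID namespace, even when the GPU is genuinely busy. A quick sketch for checking from inside the pod:

```bash
# Utilization and memory can be non-zero even while the process list is
# empty (the container's PID namespace hides host-side process IDs).
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv

# Confirm the Python environment actually sees the GPU.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```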
nerdylive (4mo ago)
Huh, does it mean it's in maintenance?
TanegashimaGunsmith
Hi, was this issue solved? I have the same problem with the latest PyTorch and CUDA as well. I also reset my pod, etc., but CPU is at 100%, GPU utilisation is low, and I have no processes showing up in nvidia-smi.
nerdylive (3w ago)
I think the official vLLM works well with the GPU. It uses most of the GPU, like a normal setup, in RunPod serverless.
TanegashimaGunsmith
What do you mean by the official vLLM? I'm installing it via pip. I have a pod.
nerdylive (3w ago)
Ohh, I thought it was serverless. What image do you use? It might be CUDA, or the package that connects to CUDA. If you try another pod and it works, it's because the old pod is bad.
TanegashimaGunsmith
I already reset the pod; it doesn't seem to be that. runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
nerdylive (3w ago)
Did you try on another pod? How did you install vLLM, btw?
TanegashimaGunsmith
pyenv virtualenv, then pip install vllm. And yes, I deleted the pod and started a new one. Or do you mean a pod with a different PyTorch version?
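A sketch of that install flow, assuming pyenv-virtualenv is available on the pod (version numbers are illustrative):

```bash
# Versions are illustrative; any recent Python 3.10/3.11 should work.
pyenv install 3.11
pyenv virtualenv 3.11 vllm-env
pyenv activate vllm-env
pip install vllm
```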
nerdylive (3w ago)
Oh, is it still the same? By another pod I mean create another one.
TanegashimaGunsmith
Yes, after deleting I still have this problem
nerdylive (3w ago)
Hmm, okay, that's not a pod problem then. Let me try and see. Did you load a model yet, btw? Let me try using Jupyter and install vLLM; I'll share how it goes. It seems like it is using the GPU but not showing in nvidia-smi.
TanegashimaGunsmith
Yes, the same issue with a new pod. 100% CPU and ~50% GPU, or 100% GPU? I don't see a process there either, nor for LMDeploy, but it does show up in nvtop.
nerdylive (3w ago)
Mostly normal use, maximizing GPU use according to the vLLM config. The performance is normal for the model and GPU, right? Like token throughput. I'm editing a short video showing the performance and GPU usage; I can send it tomorrow.
TanegashimaGunsmith
No, tokens per second is very low for me (10-12 tps). Thanks for the video!
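A rough way to eyeball throughput against vLLM's OpenAI-compatible endpoint (a sketch; the port and exact model name here are assumptions based on the thread):

```bash
# Port 8000 and the model name are assumptions, not confirmed in the
# thread. Divide the generated token count by wall-clock time for rough tps.
time curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-32B-Instruct", "prompt": "Hello", "max_tokens": 256}'
```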
nerdylive (3w ago)
What LLM model?
TanegashimaGunsmith
An A100 GPU and Qwen2.5 32B Instruct. I'm starting to think it has something to do with the JSON output I'm generating, maybe.
nerdylive (3w ago)
Hmm, have you tested in other pods? Maybe the performance will be different. Or other cards.
TanegashimaGunsmith
I have deleted and recreated it, and it's the same.
nerdylive (3w ago)
Maybe; I never tested performance with or without that. Ah, it's still slow?
TanegashimaGunsmith
I tried it on an Ada 6000 or so, and it had regular performance, which is why I'm surprised. On a different hoster.
nerdylive (3w ago)
I used an A40 and only a ~1.5B model, pretty fast. Oh, what about on RunPod? Is it slower?
TanegashimaGunsmith
I didn't use an A6000 on RunPod, but on Hetzner. In the meantime I have set up LMDeploy, and it seems to fully utilise the A100 on RunPod. So, it might just be vLLM that's broken.
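For comparison, a sketch of serving the same model with LMDeploy's OpenAI-compatible server (the model path and flags are assumptions; the thread doesn't show the exact command used):

```bash
# Model path is an assumption (the thread only says "Qwen2.5 32B Instruct").
pip install lmdeploy
lmdeploy serve api_server Qwen/Qwen2.5-32B-Instruct --server-port 23333
```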
nerdylive (3w ago)
Hmm, maybe, or the GPU hardware or SSD is slower on RunPod, depending on how the app works.
TanegashimaGunsmith
Huh, okay. There are issues on GitHub as well, where people are stuck at 100% CPU but low GPU utilisation. No solutions there either.
nerdylive (3w ago)
Oh, on vLLM? Or vllm-worker? Interesting, never checked that. Try SGLang if you'd like; I heard it's good too.
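For anyone who wants to try the SGLang suggestion, a minimal sketch (the model path and port are assumptions, not from the thread):

```bash
# SGLang also exposes an OpenAI-compatible API once launched.
pip install "sglang[all]"
python -m sglang.launch_server --model-path Qwen/Qwen2.5-32B-Instruct --port 30000
```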
TanegashimaGunsmith
In the vLLM GitHub. At least LMDeploy seems to work for now. Thank you!
nerdylive (3w ago)
You're welcome, bro.