vLLM doesn't seem to use the GPU
I'm using vLLM, and on the graph, when I launch some requests, only CPU usage increases.
If I open a terminal and run nvidia-smi, I don't see any process either.
Settings line:
--model NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 8192 --port 8000 --dtype half --enable-chunked-prefill true --max-num-batched-tokens 6144 --gpu-memory-utilization 0.97
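For reference, a minimal launch sketch with those flags, assuming the standard vLLM OpenAI-compatible server entrypoint is what's being run:

```bash
# Sketch: vLLM OpenAI-compatible server launched with the flags above
# (entrypoint assumed; "vllm serve" takes the same flags in newer versions)
python -m vllm.entrypoints.openai.api_server \
    --model NousResearch/Meta-Llama-3-8B-Instruct \
    --max-model-len 8192 \
    --port 8000 \
    --dtype half \
    --enable-chunked-prefill true \
    --max-num-batched-tokens 6144 \
    --gpu-memory-utilization 0.97
```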
Try another pod?
Select the right CUDA version
I tried on 4 different pods.
For the CUDA version, I don't know where I can set it
Btw, any logs?
It's not something you set, more like a filter when you create a pod
I'm trying a pod, not serverless.
I don't see where I can filter by CUDA when creating a pod
ez
Just try 12.5
Thanks!
Try 12.4 if not
Just checked
I used an A40, so 12.4
I'll try with an RTX 6000 (CUDA 12.5) to check if I see a difference
I don't understand why I don't see any processes here
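One thing worth noting: inside a container, nvidia-smi usually can't see the host's PIDs, so its process list stays empty even when the GPU is busy. A sketch of a check that doesn't rely on the process list:

```bash
# Sketch: poll GPU utilization and memory use directly (1-second interval);
# this works even when the container can't see the processes behind them
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```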
Huh
Does it mean it's under maintenance?
Hi, was this issue solved? I have the same problem with the latest PyTorch and CUDA as well. I also reset my pod, etc., but CPU is at 100%, GPU utilisation is low, and I have no processes showing up in nvidia-smi
I think the official vLLM works well with the GPU
It uses most of the GPU, like a normal setup
In RunPod serverless
What do you mean by the official vLLM? I'm installing it via pip
I have a pod
Ohh, I thought it was serverless
What image do you use?
It might be CUDA, or the package that connects to CUDA
If you try another pod and it works, it's because the old pod is bad
I already reset the pod; it doesn't seem to be that
runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
Did you try on another pod?
How did you setup vLLM btw?
Install*
pyenv virtualenv, then pip install vllm
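Roughly like this, with the environment name and Python version as placeholders, plus a quick check that the installed PyTorch actually sees the GPU:

```bash
# Sketch of the setup described above (names and versions are placeholders)
pyenv virtualenv 3.11 vllm-env
pyenv activate vllm-env
pip install vllm

# Sanity check: should print "True" and the GPU name if CUDA is usable
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```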
And yes, I deleted the pod and started a new one
Or do you mean a pod with a different pytorch version?
Oh is it still the same?
Another pod, create another one
Yes, after deleting I still have this problem
Hmm okay, that's not a pod problem then
Let me try and see
Did you load a model yet btw?
Let me try using Jupyter and installing vLLM, I'll share how it goes
It seems like it is using the GPU
but not showing in nvidia-smi
Yes, the same issue with a new pod
100% CPU and ~50% GPU or 100% GPU?
I don't see a process there either, not even for LMDeploy
but it does show up in nvtop
For me it's mostly normal use, maximizing GPU usage according to the vLLM config
The performance is normal for the model and GPU, right?
Like token throughput
I'm editing a short video showing the performance and GPU usage; I can send it tomorrow
No, tokens per second is very low for me (10-12 tps)
Thanks for the video!
What LLM model?
An A100 GPU and Qwen2.5 32B Instruct
I'm starting to think it has something to do with the JSON output I'm generating, maybe
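One way to test that (a rough A/B sketch; the model name, schema, and the guided_json extra parameter are assumptions about the setup and vLLM version):

```bash
# Free-form request against the OpenAI-compatible endpoint
time curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-32B-Instruct",
       "messages": [{"role": "user", "content": "List three colors."}],
       "max_tokens": 256}' > /dev/null

# Same prompt constrained to JSON via vLLM's guided_json extra parameter;
# a large slowdown here would point at structured output as the bottleneck
time curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-32B-Instruct",
       "messages": [{"role": "user", "content": "List three colors."}],
       "max_tokens": 256,
       "guided_json": {"type": "object", "properties": {"colors": {"type": "array", "items": {"type": "string"}}}}}' > /dev/null
```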
Hmm have you tested in other pods?
Maybe the performance will be different
Or other cards
I have deleted and recreated it, and it's the same
Maybe I never tested performance with or without that
Ah it's still slow?
I tried it on an Ada 6000 or so, and it had regular performance, which is why I'm surprised
On a different hosting provider
I used an A40 and only like a 1.5B model, pretty fast
Oh, what about on RunPod?
Is it slower?
I didn't use an A6000 on RunPod, but on Hetzner
In the meantime I have set up LMDeploy and it seems to fully utilise the A100 on RunPod
So, it might just be vLLM that's broken
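A minimal LMDeploy launch along those lines would look something like this (a sketch; port and tensor-parallel size are assumptions):

```bash
# Sketch: LMDeploy OpenAI-compatible server for the same model
lmdeploy serve api_server Qwen/Qwen2.5-32B-Instruct --server-port 23333 --tp 1
```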
Hmm maybe, or the GPU hardware; the SSD is slower on RunPod, depending on how the app works
Huh, okay
There are issues on GitHub as well, where people are stuck at 100% CPU but low GPU utilisation
No solutions there either
Oh, on vLLM? Or vllm-worker?
Interesting, never checked that
Try SGLang if you'd like to
I heard it's good too
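For comparison, a minimal SGLang launch would look something like this (a sketch; model name and port are assumptions):

```bash
# Sketch: SGLang OpenAI-compatible server
python -m sglang.launch_server --model-path Qwen/Qwen2.5-32B-Instruct --port 30000
```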
In the vLLM GitHub
At least LMDeploy seems to work for now
Thank you!
You're welcome, bro