vLLM doesn't seem to use the GPU
I'm using vLLM, and on the graph, when I launch some requests, only CPU usage increases.
If I open a terminal and run nvidia-smi, I don't see any process either.
Settings line:
--model NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 8192 --port 8000 --dtype half --enable-chunked-prefill true --max-num-batched-tokens 6144 --gpu-memory-utilization 0.97
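For reference, a minimal launch sketch with those flags, assuming the standard vLLM OpenAI-compatible server entrypoint is what's being run:

```bash
# Sketch: vLLM OpenAI-compatible server launched with the flags above
# (entrypoint assumed; "vllm serve" takes the same flags in newer versions)
python -m vllm.entrypoints.openai.api_server \
    --model NousResearch/Meta-Llama-3-8B-Instruct \
    --max-model-len 8192 \
    --port 8000 \
    --dtype half \
    --enable-chunked-prefill true \
    --max-num-batched-tokens 6144 \
    --gpu-memory-utilization 0.97
```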
Try another pod?
Select the right CUDA version
I tried on 4 different pods.
For the CUDA version, I don't know where I can set it
Btw, any logs?
It's not something you set, more like a filter when you create a pod
I'm trying a pod, not serverless.
I don't see where I can filter by CUDA when creating a pod
ez
Just try 12.5
Thanks!
Try 12.4 if not
Just checked
I used an A40, so 12.4
I'll try with an RTX 6000 (CUDA 12.5) to check if I see a difference
I don't understand why I don't see any processes here
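One thing worth noting: inside a container, nvidia-smi usually can't see the host's PIDs, so its process list stays empty even when the GPU is busy. A sketch of a check that doesn't rely on the process list:

```bash
# Sketch: poll GPU utilization and memory use directly (1-second interval);
# this works even when the container can't see the processes behind them
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```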
Huh
Does it mean it's under maintenance?
Hi, was this issue solved? I have the same problem with the latest PyTorch and CUDA as well. I also reset my pod, etc., but CPU is at 100%, GPU utilisation is low, and I have no processes showing up in nvidia-smi
I think the official vLLM works well with the GPU
It uses most of the GPU, like a normal setup
In RunPod serverless
What do you mean by the official vLLM? I'm installing it via pip
I have a pod
Ohh, I thought it was serverless
What image do you use?
It might be CUDA, or the package that connects to CUDA
If you try another pod and it works, it's because the old pod is bad
I already reset the pod; it doesn't seem to be that
runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
Did you try on another pod?
How did you setup vLLM btw?
Install*
pyenv virtualenv, then pip install vllm
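Roughly like this, with the environment name and Python version as placeholders, plus a quick check that the installed PyTorch actually sees the GPU:

```bash
# Sketch of the setup described above (names and versions are placeholders)
pyenv virtualenv 3.11 vllm-env
pyenv activate vllm-env
pip install vllm

# Sanity check: should print "True" and the GPU name if CUDA is usable
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
```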
And yes, I deleted the pod and started a new one
Or do you mean a pod with a different pytorch version?
Oh is it still the same?
Another pod, create another one
Yes, after deleting I still have this problem
Hmm okay, that's not a pod problem then
Let me try and see
Did you load a model yet btw?
Let me try using Jupyter and installing vLLM, I'll share how it goes
It seems like it is using the GPU
but not showing in nvidia-smi
Yes, the same issue with a new pod
100% CPU and ~50% GPU or 100% GPU?
I don't see a process there either, not even for LMDeploy
but it does show up in nvtop
For me it's mostly normal use, maximizing GPU usage according to the vLLM config
The performance is normal for the model and GPU, right?
Like token throughput
I'm editing a short video showing the performance and GPU usage; I can send it tomorrow
No, tokens per second is very low for me (10-12 tps)
Thanks for the video!
What LLM model?
An A100 GPU and Qwen2.5 32B Instruct
I'm starting to think it has something to do with the JSON output I'm generating, maybe
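One way to test that (a rough A/B sketch; the model name, schema, and the guided_json extra parameter are assumptions about the setup and vLLM version):

```bash
# Free-form request against the OpenAI-compatible endpoint
time curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-32B-Instruct",
       "messages": [{"role": "user", "content": "List three colors."}],
       "max_tokens": 256}' > /dev/null

# Same prompt constrained to JSON via vLLM's guided_json extra parameter;
# a large slowdown here would point at structured output as the bottleneck
time curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen2.5-32B-Instruct",
       "messages": [{"role": "user", "content": "List three colors."}],
       "max_tokens": 256,
       "guided_json": {"type": "object", "properties": {"colors": {"type": "array", "items": {"type": "string"}}}}}' > /dev/null
```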
Hmm have you tested in other pods?
Maybe the performance will be different
Or other cards
I have deleted and recreated it, and it's the same
Maybe I never tested performance with or without that
Ah it's still slow?
I tried it on an Ada 6000 or so, and it had regular performance, which is why I'm surprised
On a different hosting provider
I used an A40 and only like a 1.5B model, pretty fast
Oh, what about on RunPod?
Is it slower?
I didn't use an A6000 on RunPod, but on Hetzner
In the meantime I have set up LMDeploy and it seems to fully utilise the A100 on RunPod
So, it might just be vLLM that's broken
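A minimal LMDeploy launch along those lines would look something like this (a sketch; port and tensor-parallel size are assumptions):

```bash
# Sketch: LMDeploy OpenAI-compatible server for the same model
lmdeploy serve api_server Qwen/Qwen2.5-32B-Instruct --server-port 23333 --tp 1
```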
Hmm maybe, or the GPU hardware; the SSD is slower on RunPod, depending on how the app works
Huh, okay
There are issues on GitHub as well, where people are stuck at 100% CPU but low GPU utilisation
No solutions there either
Oh, on vLLM? Or vllm-worker?
Interesting, never checked that
Try SGLang if you'd like to
I heard it's good too
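For comparison, a minimal SGLang launch would look something like this (a sketch; model name and port are assumptions):

```bash
# Sketch: SGLang OpenAI-compatible server
python -m sglang.launch_server --model-path Qwen/Qwen2.5-32B-Instruct --port 30000
```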
In the vLLM GitHub
At least LMDeploy seems to work for now
Thank you!
You're welcome, bro