vLLM doesn't seem to use the GPU
I'm using vLLM, and on the graph, when I send some requests, only CPU usage increases.
If I open a terminal and run nvidia-smi, I don't see any process either.
Settings line:
--model NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 8192 --port 8000 --dtype half --enable-chunked-prefill true --max-num-batched-tokens 6144 --gpu-memory-utilization 0.97
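As a sanity check on those settings, here is a minimal sketch of the same configuration through vLLM's offline Python API (assuming the vllm package is installed in the pod); if CUDA initialization fails here, the problem is the environment rather than the server flags:

    from vllm import LLM, SamplingParams

    # Same settings as the server flags above; if the GPU is visible,
    # vLLM should allocate ~97% of its memory at startup.
    llm = LLM(
        model="NousResearch/Meta-Llama-3-8B-Instruct",
        max_model_len=8192,
        dtype="half",
        enable_chunked_prefill=True,
        max_num_batched_tokens=6144,
        gpu_memory_utilization=0.97,
    )
    outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
    print(outputs[0].outputs[0].text)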
Try another pod?
Select the right CUDA version.
I tried on 4 different pods.
For the CUDA version, I don't know where I can set it.
Btw, any logs?
It's not a setting, more like a filter when you create a pod.
I'm trying a pod, not serverless.
I don't see where I can filter by CUDA version when creating a pod.
ez
Just try 12.5.
Thanks!
Try 12.4 if not.
Just checked.
I used an A40, so 12.4.
I'll try with an RTX 6000 on 12.5 to check if I see a difference.
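One quick way to tell whether the CUDA filter is the issue: check what the PyTorch build inside the pod reports (a small sketch, assuming torch is installed next to vLLM). If it prints False, vLLM can't see the GPU at all, which would match the CPU-only symptom:

    import torch

    # If this prints False, no process will ever show up in nvidia-smi.
    print("CUDA available:", torch.cuda.is_available())
    # CUDA version the torch wheel was built against, e.g. "12.1";
    # it must be compatible with the pod's driver shown by nvidia-smi.
    print("Torch built for CUDA:", torch.version.cuda)
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))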
I don't understand why I don't see any processes here.
Huh
Does it mean it's in maintenance?