VLLM WORKER ERROR

On fp8 quantization:
12 Replies
nerdylive (OP) · 5mo ago
Seems like the vLLM version isn't updated? I tried using vLLM's OpenAI docker image and it works perfectly. I hope you can check this @Alpay Ariyak
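(For context, this is roughly the kind of fp8 run being compared here, as a minimal sketch using vLLM's offline Python API rather than the RunPod worker. The model name is a placeholder, not the one from this thread.)

```python
# Minimal sketch: load a model with fp8 quantization through vLLM's
# offline Python API. The model name is a placeholder, not the model
# actually being tested in this thread.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    quantization="fp8",                           # on-the-fly fp8 weight quantization
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```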
Madiator2011 · 5mo ago
What GPU do you use? I think only H100s support fp8.
nerdylive (OP) · 5mo ago
No, that's not a GPU minimum or compatibility issue, it's something different I think. I used an RTX 6000 or something like that, and an RTX 4090 works. Let me try a 4090 in serverless. Okay, yeah, the same error. So this is from vLLM, right?
digigoblin · 5mo ago
Did you log an issue on GitHub for the vLLM worker?
nerdylive (OP) · 5mo ago
Not yet
digigoblin · 5mo ago
The RunPod vLLM worker is a bit behind the official vLLM engine.
nerdylive (OP) · 5mo ago
My PR hasn't even been reviewed yet, I see.
digigoblin · 5mo ago
And the vLLM engine has added support for Gemma 2 on the main branch but hasn't created a release tag for it yet, for example.
nerdylive (OP) · 5mo ago
In the vLLM worker?
digigoblin · 5mo ago
No, the vLLM project, not the worker.
Alpay Ariyak · 5mo ago
Only H100s and L40s support fp8.
nerdylive (OP) · 5mo ago
Wait, what? How am I able to use an RTX 4090 then? Okay, still getting the same error on an L40S, so L40 and H100 probably will too. I think mine uses a new type of layer in vLLM.
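(A quick way to sanity-check the GPU question discussed above is to look at the card's CUDA compute capability. This is a hedged sketch; the 8.9/9.0 thresholds reflect the general hardware requirement for fp8 tensor cores on Ada Lovelace (L40/L40S/RTX 4090) and Hopper (H100), not anything stated in this thread.)

```python
# Sketch: check whether the current GPU has native fp8 (E4M3/E5M2) tensor cores.
# Ada Lovelace (compute capability 8.9, e.g. L40/L40S/RTX 4090) and
# Hopper (9.0, e.g. H100) do; older architectures do not.
import torch

major, minor = torch.cuda.get_device_capability()
cc = major + minor / 10
print(f"GPU: {torch.cuda.get_device_name()}, compute capability {major}.{minor}")
print("Native fp8 support:", cc >= 8.9)
```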