RunPod • 10mo ago
Jason

VLLM WORKER ERROR

On fp8 quantization:
12 Replies
Jason (OP) • 10mo ago
Seems like the vLLM isn't updated? I tried using vLLM's official OpenAI Docker image and it works perfectly. I hope you can check this @Alpay Ariyak
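For reference, the official image mentioned here is typically launched like the sketch below. The model name and tag are placeholders of my own, not from the thread; `--quantization fp8` is the vLLM flag that selects the fp8 path being discussed.

```shell
# Sketch of running vLLM's official OpenAI-compatible server image with fp8.
# Model and tag are illustrative placeholders, not from the thread.
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model neuralmagic/Meta-Llama-3-8B-Instruct-FP8 \
    --quantization fp8
```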
Madiator2011 • 10mo ago
What GPU do you use? I think only H100s support fp8.
Jason (OP) • 10mo ago
No, that's not the GPU minimum or compatibility support, it's something different I think. I used an RTX 6000 or something like that. RTX 4090 works, let me try a 4090 in serverless. Okay, yeah, the same error. So this is from vLLM, right?
digigoblin • 10mo ago
Did you log an issue on GitHub for the vLLM worker?
Jason (OP) • 10mo ago
Not yet
digigoblin • 10mo ago
The RunPod vLLM worker is a bit behind the official vLLM engine.
Jason (OP) • 10mo ago
My PR hasn't even been reviewed yet, I see.
digigoblin • 10mo ago
And the vLLM engine has added support for Gemma 2 on the main branch but hasn't created a release tag for it yet, for example.
Jason (OP) • 10mo ago
In the vLLM worker?
digigoblin • 10mo ago
No, the vLLM project, not the worker.
Alpay Ariyak • 10mo ago
Only H100s and L40s support fp8.
Jason (OP) • 10mo ago
Wait, what? How am I able to use an RTX 4090 then? Okay, still getting the same error on an L40S, so L40 and H100 will too. I think mine uses a new type of layer in vLLM.
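For context on the hardware question above (my own note, not from the thread): vLLM's fp8 (W8A8) kernels generally require NVIDIA compute capability 8.9 or higher, which covers Ada cards like the RTX 4090 and L40S as well as Hopper H100s, which would explain why a 4090 can run fp8 at all. A minimal sketch of that gate, where the `supports_fp8` helper and the capability table are my own illustration:

```python
# Hypothetical helper illustrating the usual fp8 hardware gate:
# fp8 W8A8 kernels generally need compute capability >= 8.9 (Ada/Hopper).
def supports_fp8(major: int, minor: int) -> bool:
    return (major, minor) >= (8, 9)

# Known compute capabilities for GPUs mentioned in the thread (plus A100).
CAPABILITY = {
    "H100": (9, 0),
    "L40S": (8, 9),
    "RTX 4090": (8, 9),
    "A100": (8, 0),
}

for gpu, cc in CAPABILITY.items():
    print(f"{gpu}: fp8 {'supported' if supports_fp8(*cc) else 'unsupported'}")
```

On a live pod the capability of the attached GPU can be read with `torch.cuda.get_device_capability()` and fed to the same check.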
