VLLM WORKER ERROR

On fp8 quantization:
12 Replies
nerdylive (OP) · 5mo ago
Seems like the vLLM version isn't updated? I tried using vLLM's OpenAI docker image and it works perfectly. I hope you can check this @Alpay Ariyak
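(For context, this is roughly the kind of fp8 run being compared here, as a minimal sketch using vLLM's offline Python API rather than the RunPod worker. The model name is a placeholder, not the one from this thread.)

```python
# Minimal sketch: load a model with fp8 quantization through vLLM's
# offline Python API. The model name is a placeholder, not the model
# actually being tested in this thread.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    quantization="fp8",                           # on-the-fly fp8 weight quantization
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```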
Madiator2011 · 5mo ago
What GPU do you use? I think only H100s support fp8.
nerdylive (OP) · 5mo ago
No, that's not a GPU minimum or compatibility issue, it's something different I think. I used an RTX 6000 or something like that, and an RTX 4090 works. Let me try a 4090 in serverless. Okay, yeah, the same error. So this is from vLLM, right?
digigoblin · 5mo ago
Did you log an issue on GitHub for the vLLM worker?
nerdylive (OP) · 5mo ago
Not yet
digigoblin · 5mo ago
The RunPod vLLM worker is a bit behind the official vLLM engine.
nerdylive (OP) · 5mo ago
My PR hasn't even been reviewed yet, I see.
digigoblin · 5mo ago
And the vLLM engine has added support for Gemma 2 on the main branch but hasn't created a release tag for it yet, for example.
nerdylive (OP) · 5mo ago
In the vLLM worker?
digigoblin · 5mo ago
No, the vLLM project, not the worker.
Alpay Ariyak · 5mo ago
Only H100s and L40s support fp8.
nerdylive (OP) · 5mo ago
Wait, what? How am I able to use an RTX 4090 then? Okay, still getting the same error on an L40S, so L40 and H100 probably will too. I think mine uses a new type of layer in vLLM.
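(A quick way to sanity-check the GPU question discussed above is to look at the card's CUDA compute capability. This is a hedged sketch; the 8.9/9.0 thresholds reflect the general hardware requirement for fp8 tensor cores on Ada Lovelace (L40/L40S/RTX 4090) and Hopper (H100), not anything stated in this thread.)

```python
# Sketch: check whether the current GPU has native fp8 (E4M3/E5M2) tensor cores.
# Ada Lovelace (compute capability 8.9, e.g. L40/L40S/RTX 4090) and
# Hopper (9.0, e.g. H100) do; older architectures do not.
import torch

major, minor = torch.cuda.get_device_capability()
cc = major + minor / 10
print(f"GPU: {torch.cuda.get_device_name()}, compute capability {major}.{minor}")
print("Native fp8 support:", cc >= 8.9)
```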