Seems like vLLM isn't updated? I tried using vLLM's OpenAI docker image and it works perfectly.
I hope you can check this, @Alpay Ariyak
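For context, a minimal sketch of roughly what "works" looks like on the upstream engine, expressed through vLLM's offline Python API rather than the Docker entrypoint; the model id below is a placeholder, not the actual checkpoint from this thread:

```python
from vllm import LLM, SamplingParams

# Minimal sketch: load an FP8-quantized checkpoint with the upstream vLLM engine.
# "some-org/some-model-fp8" is a placeholder id, not the model discussed here.
llm = LLM(model="some-org/some-model-fp8", quantization="fp8")
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```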
What GPU do you use?
I think only H100s support FP8
No, that's not a GPU minimum or compatibility support issue
It's different
I think I used an RTX 6k or something like that
RTX 4090 works
Let me try a 4090 in serverless
Okay, yeah, the same error
So this is from vLLM, right?
Did you log an issue on GitHub for the vLLM worker?
Not yet
The RunPod vLLM worker is a bit behind the official vLLM engine
My PR hasn't even been reviewed yet
ic
And the vLLM engine has added support for Gemma 2 on the main branch but hasn't created a release tag for it yet, for example
In the vLLM worker?
No, the vLLM project, not the worker
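A quick way to see that gap is to check whether the installed vLLM build actually registers the Gemma 2 architecture. A sketch, assuming ModelRegistry is still exported from the top-level package (the import path may differ across vLLM versions):

```python
import vllm
from vllm import ModelRegistry

# Prints the installed engine version and whether it knows the Gemma 2 architecture.
# If the worker pins an older release, the second line will likely be False.
print("vLLM version:", vllm.__version__)
print("Gemma2 registered:", "Gemma2ForCausalLM" in ModelRegistry.get_supported_archs())
```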
Only H100s and L40S support FP8
Wait, what?
How am I able to use an RTX 4090 then?
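To untangle the hardware side: native FP8 kernels generally need compute capability 8.9 or newer, which covers Ada Lovelace cards (RTX 4090, L40/L40S) as well as Hopper (H100). A small sketch to check a given pod's GPU, assuming PyTorch and a visible CUDA device:

```python
import torch

# FP8 (e4m3/e5m2) tensor-core kernels generally need compute capability >= 8.9,
# i.e. Ada Lovelace (RTX 4090, L40/L40S) or Hopper (H100).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
print("FP8-capable:", (major, minor) >= (8, 9))
```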
Okay, still getting the same error on L40S, so L40 and H100 probably will too
I think mine uses a new type of layer in vLLM