RunPod · 5mo ago
artbred

GGUF vllm

It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work in RunPod Serverless? It looks like you need to set some custom ENV vars. Alternatively, does anyone know a way to convert GGUF back to safetensors?
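For reference, outside Serverless, recent vLLM can serve a GGUF checkpoint directly if you point it at the concrete `.gguf` file and pass the original tokenizer. A hedged sketch following the pattern in the vLLM docs; the repo and file names are only an example:

```shell
# Download one concrete .gguf file (example repo/file names).
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./models

# vLLM needs the path to the .gguf file itself; passing the original
# HF tokenizer is recommended, since converting the GGUF-embedded
# tokenizer is slow and incomplete.
vllm serve ./models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --tokenizer TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

This is the part that is awkward on Serverless: the engine wants the path to a specific file, not just a repo name.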
11 Replies
nerdylive · 5mo ago
Have you resolved this yet? Try looking in the quick deploy settings or the vLLM documentation, then match it against the quick deploy settings.
Misterion · 2mo ago
Hi, is there any solution to this?
nerdylive · 2mo ago
Let me check again. Have you tried loading GGUF models normally, with the default values? I think it works just like that.
nerdylive · 2mo ago
but..
(image attachment, no description)
Misterion · 2mo ago
The problem is that you have to specify the GGUF file name, and I believe there is no such env var for the vLLM worker. We could download the model and pack it into the container, but I was looking for an out-of-the-box solution.
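To make the missing piece concrete: there is indeed no documented env var for this, but a custom worker image could read a hypothetical one. `GGUF_FILENAME` below is an invented name for illustration, not something the official vLLM worker supports:

```python
import os

# Hedged sketch: a custom worker image could resolve the concrete .gguf
# file from an env var. GGUF_FILENAME is a hypothetical variable name,
# not one the official vLLM worker reads.
def resolve_gguf_path(model_dir: str) -> str:
    filename = os.environ.get("GGUF_FILENAME")
    if not filename:
        raise ValueError("GGUF_FILENAME must be set for GGUF checkpoints")
    if not filename.endswith(".gguf"):
        raise ValueError(f"expected a .gguf file, got {filename!r}")
    return os.path.join(model_dir, filename)

os.environ["GGUF_FILENAME"] = "model.Q4_K_M.gguf"
print(resolve_gguf_path("/models"))  # → /models/model.Q4_K_M.gguf
```

The resolved path is what would ultimately be handed to the engine as the model argument.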
nerdylive · 2mo ago
Oh wait, let me check the code.
nerdylive · 2mo ago
Oh yeah, I think you may need to build your own container.
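Building your own container, as suggested above, could look roughly like this. The base image tag, file names, and the `MODEL_NAME` override are placeholders/assumptions, not verified values:

```dockerfile
# Hedged sketch of a custom image that bakes the GGUF file in.
# Base image tag and paths below are assumptions, not verified names.
FROM runpod/worker-v1-vllm:stable-cuda12.1.0

# Copy a locally downloaded GGUF checkpoint into the image.
COPY model.Q4_K_M.gguf /models/model.Q4_K_M.gguf

# Point the engine at the concrete file (assumed override; check the
# worker's README for the exact variable it reads).
ENV MODEL_NAME=/models/model.Q4_K_M.gguf
```

Baking the file in trades image size for cold-start time, which is usually the right trade on Serverless.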
Misterion · 2mo ago
Will do that as a workaround, but it would be nice to have this supported natively.
nerdylive · 2mo ago
yup
wiki · 2mo ago
I will add support for this natively.
nerdylive · 2mo ago
Nice, thanks!
