RunPod7mo ago
artbred

GGUF vllm

It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work in RunPod serverless? It seems you need to set some custom ENV vars. Or maybe someone knows a way to convert a GGUF back to safetensors?
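For reference, a minimal sketch of how vLLM loads a GGUF file directly (outside RunPod): the model path and Hugging Face repo below are placeholders, not from this thread. Per the vLLM docs, a GGUF file still needs the tokenizer from the original, unquantized model repo.

```shell
# Sketch only -- file name and tokenizer repo are placeholders.
# vLLM can serve a local GGUF file; pass the original repo's tokenizer.
vllm serve ./tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
    --tokenizer TinyLlama/TinyLlama-1.1B-Chat-v1.0
```

The open question in this thread is how to express the local-file path through the serverless worker's environment variables.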
11 Replies
Jason
Jason7mo ago
Have you resolved this yet? Try looking in the quick deploy settings or the vLLM documentation, then match the documented options against the quick deploy settings.
Misterion
Misterion4mo ago
Hi, is there any solution to this?
Jason
Jason4mo ago
Let me check again. Have you tried loading GGUF models normally with default values? I think it works just like that.
Jason
Jason4mo ago
but..
Misterion
Misterion4mo ago
The problem is that you have to specify the GGUF file name, and I believe there is no such env var for the vLLM worker. We could download the model and pack it into the container, but I was looking for an out-of-the-box solution.
Jason
Jason4mo ago
Oh wait, let me check the code.
Jason
Jason4mo ago
Oh yeah, I think you may need to build your own container.
Misterion
Misterion4mo ago
Will do that as a workaround, but it would be nice to support that natively.
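The custom-container workaround discussed above could look roughly like this. The base image tag, model URL, and env var names are assumptions for illustration; check the runpod/worker-vllm README for the actual image tags and supported variables before using.

```dockerfile
# Sketch of the workaround: bake the GGUF file into the image so the
# worker does not need a file-name env var at runtime.
# Base image tag is hypothetical -- check the runpod/worker-vllm releases.
FROM runpod/worker-vllm:stable

# Placeholder URL for your quantized model file.
ADD https://huggingface.co/your-org/your-model-gguf/resolve/main/model.Q4_K_M.gguf /models/model.gguf

# Point the worker at the local file instead of a Hugging Face repo id.
# MODEL_NAME / TOKENIZER_NAME are assumptions based on the worker's docs;
# GGUF files need the tokenizer from the original unquantized repo.
ENV MODEL_NAME=/models/model.gguf
ENV TOKENIZER_NAME=your-org/your-model
```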
Jason
Jason4mo ago
yup
wiki
wiki4mo ago
I will add support for this natively.
Jason
Jason4mo ago
Nice thanks!