GGUF vllm
It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work on RunPod serverless? It seems you need to set some custom env vars, or maybe anyone knows a way to convert GGUF back to safetensors?
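For the safetensors route, one option (a sketch, not a tested recipe) is Transformers' GGUF loading support, which dequantizes the weights when loading and can then save them in the standard safetensors layout. The repo ID and file name below are placeholders, substitute your own:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo and quant file -- replace with your own model.
repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# Transformers dequantizes the GGUF weights to full precision on load...
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)

# ...and save_pretrained writes regular safetensors shards,
# which the vLLM worker can load without any GGUF-specific config.
model.save_pretrained("converted-model")
tokenizer.save_pretrained("converted-model")
```

Note the output is dequantized, so it will be much larger on disk than the original GGUF file.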
have you resolved this yet?
try looking through the vLLM documentation and matching it against the quick deploy settings
hi, is there any solution to this?
let me check again
have you tried loading GGUF models normally with the default values
i think it works just like that
but..
the problem is that you have to specify the GGUF file name, and I believe there is no such env var in the vLLM worker
we could download the model and pack it into the container, but I was looking for an out-of-the-box solution
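For reference, the mapping that native support would need is small: take the GGUF file name from an env var and point vLLM's `model` argument at the file itself. A hypothetical sketch (`GGUF_FILENAME` and `TOKENIZER_NAME` are made-up names here, not vars the stock worker actually reads):

```python
import os


def build_engine_args(env=os.environ):
    """Assemble vLLM engine kwargs from worker-style env vars.

    GGUF_FILENAME / TOKENIZER_NAME are hypothetical variable names --
    the point of this sketch is the mapping, not the exact var names.
    """
    model = env["MODEL_NAME"]
    gguf = env.get("GGUF_FILENAME")
    args = {"model": model}
    if gguf:
        # vLLM loads GGUF when `model` points at the .gguf file itself,
        # so join the model path with the chosen quant file.
        args["model"] = f"{model.rstrip('/')}/{gguf}"
        # A GGUF file carries no HF tokenizer, so fall back to a base repo.
        args["tokenizer"] = env.get("TOKENIZER_NAME", model)
    return args
```

With `MODEL_NAME=org/model-GGUF` and `GGUF_FILENAME=model.Q4_K_M.gguf` this yields `{"model": "org/model-GGUF/model.Q4_K_M.gguf", "tokenizer": "org/model-GGUF"}`; without the extra var it falls through to the current behavior.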
oh wait let me check the code
Oh yeah, I think you may need to build your own container,
will do that as a workaround, but it would be nice to have this supported natively
yup
I will add support for this natively
Nice thanks!