GGUF vllm
It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work on RunPod serverless? It seems you need to set some custom env vars, or maybe anyone knows a way to convert GGUF back to safetensors?
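For the safetensors route, one option (a sketch, not a tested recipe) is Transformers' GGUF loading support, which dequantizes the weights when loading and can then save them in the standard safetensors layout. The repo ID and file name below are placeholders, substitute your own:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo and quant file -- replace with your own model.
repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# Transformers dequantizes the GGUF weights to full precision on load...
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)

# ...and save_pretrained writes regular safetensors shards,
# which the vLLM worker can load without any GGUF-specific config.
model.save_pretrained("converted-model")
tokenizer.save_pretrained("converted-model")
```

Note the output is dequantized, so it will be much larger on disk than the original GGUF file.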
have you resolved this yet?
try looking through the vLLM documentation and matching it against the quick deploy settings
hi, is there any solution to this?
let me check again
have you tried loading GGUF models normally with the default values
i think it works just like that
but..
the problem is that you have to specify the GGUF file name, and I believe there is no such env var in the vLLM worker
we could download the model and pack it into the container, but I was looking for an out-of-the-box solution
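For reference, the mapping that native support would need is small: take the GGUF file name from an env var and point vLLM's `model` argument at the file itself. A hypothetical sketch (`GGUF_FILENAME` and `TOKENIZER_NAME` are made-up names here, not vars the stock worker actually reads):

```python
import os


def build_engine_args(env=os.environ):
    """Assemble vLLM engine kwargs from worker-style env vars.

    GGUF_FILENAME / TOKENIZER_NAME are hypothetical variable names --
    the point of this sketch is the mapping, not the exact var names.
    """
    model = env["MODEL_NAME"]
    gguf = env.get("GGUF_FILENAME")
    args = {"model": model}
    if gguf:
        # vLLM loads GGUF when `model` points at the .gguf file itself,
        # so join the model path with the chosen quant file.
        args["model"] = f"{model.rstrip('/')}/{gguf}"
        # A GGUF file carries no HF tokenizer, so fall back to a base repo.
        args["tokenizer"] = env.get("TOKENIZER_NAME", model)
    return args
```

With `MODEL_NAME=org/model-GGUF` and `GGUF_FILENAME=model.Q4_K_M.gguf` this yields `{"model": "org/model-GGUF/model.Q4_K_M.gguf", "tokenizer": "org/model-GGUF"}`; without the extra var it falls through to the current behavior.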
oh wait let me check the code
Oh yeah, I think you may need to build your own container,
will do that as a workaround, but it would be nice to have this supported natively
yup
I will add support for this natively
Nice thanks!