GGUF vLLM
It seems that the newest version of vLLM supports GGUF models. Has anyone figured out how to make this work on RunPod serverless? It looks like you need to set some custom ENV vars. Alternatively, does anyone know a way to convert a GGUF model back to safetensors?
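For the conversion route, here is a minimal sketch using transformers' GGUF loading support, which dequantizes the weights on load and can then save them as safetensors. This assumes a recent transformers version with GGUF support for the model's architecture; the repo and filename below are placeholders for your own model:

```python
# Sketch: dequantize a GGUF checkpoint back to safetensors with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"   # placeholder repo
gguf_file = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"   # placeholder filename

# transformers dequantizes the GGUF weights when loading
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file)
tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)

# save_pretrained writes standard safetensors shards that vLLM can load directly
model.save_pretrained("./converted-safetensors")
tokenizer.save_pretrained("./converted-safetensors")
```

Not tested on serverless, and the dequantized weights will be much larger than the GGUF file, so check your volume size.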
Have you resolved this yet?
Try looking through the vLLM documentation for the GGUF-related options, then match them against the Quick Deploy settings.
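For reference, this is how vLLM itself loads a GGUF checkpoint in its offline API: the model argument points at a local .gguf file and the tokenizer at the original (unquantized) HF repo. Whatever ENV vars or Quick Deploy fields the worker exposes would need to map onto these two arguments. A minimal sketch, with placeholder paths and repo names:

```python
# Sketch: loading a GGUF model with vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # local GGUF file (placeholder path)
    tokenizer="TinyLlama/TinyLlama-1.1B-Chat-v1.0",        # original HF repo for the tokenizer
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain GGUF in one sentence."], params)
print(outputs[0].outputs[0].text)
```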