Does VLLM support quantized models?
I'm trying to figure out how to deploy this, but I don't see an option for selecting which quantization I want to run. https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF Thanks!
I tried pointing the download directory at the quant model, but the model card says "Using llama.cpp release b3496 for quantization," and I don't see that as an option on RunPod for the quantization method.
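For what it's worth, here's a rough sketch of how this might work if vLLM's experimental GGUF loading applies here. You download a single .gguf file (the quant level you want) and point vLLM at it directly, rather than selecting a quantization method in the RunPod UI. The quant filename and the tokenizer repo below are assumptions on my part, not confirmed values:

```python
# Sketch only: load one GGUF quant file directly with vLLM.
# Assumes vLLM's experimental GGUF support covers this model.
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

# Download a single quant file from the repo's file list.
# The filename here is an assumed example (e.g. the Q4_K_M quant);
# check the repo for the exact name of the quant you want.
gguf_path = hf_hub_download(
    repo_id="bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF",
    filename="DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-Q4_K_M.gguf",
)

llm = LLM(
    model=gguf_path,
    # GGUF files don't always carry a full tokenizer config, so point at
    # the base model's tokenizer (assumed repo id, adjust as needed).
    tokenizer="meta-llama/Meta-Llama-3.1-8B-Instruct",
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The key point is that the quantization is baked into whichever .gguf file you download, so choosing the quant is a matter of picking the right file from the repo rather than setting a quantization flag at deploy time.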