Does VLLM support quantized models?
I'm trying to figure out how to deploy this, but I don't see an option for selecting which quantization I want to run. https://huggingface.co/bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF Thanks!
I tried pointing the download directory at the quant model, but the model card says "Using llama.cpp release b3496 for quantization," and I don't see that as an option on RunPod for the quantization method.
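For what it's worth, here's a rough sketch of how this might work if vLLM's experimental GGUF loading applies here. You download a single .gguf file (the quant level you want) and point vLLM at it directly, rather than selecting a quantization method in the RunPod UI. The quant filename and the tokenizer repo below are assumptions on my part, not confirmed values:

```python
# Sketch only: load one GGUF quant file directly with vLLM.
# Assumes vLLM's experimental GGUF support covers this model.
from huggingface_hub import hf_hub_download
from vllm import LLM, SamplingParams

# Download a single quant file from the repo's file list.
# The filename here is an assumed example (e.g. the Q4_K_M quant);
# check the repo for the exact name of the quant you want.
gguf_path = hf_hub_download(
    repo_id="bartowski/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF",
    filename="DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-Q4_K_M.gguf",
)

llm = LLM(
    model=gguf_path,
    # GGUF files don't always carry a full tokenizer config, so point at
    # the base model's tokenizer (assumed repo id, adjust as needed).
    tokenizer="meta-llama/Meta-Llama-3.1-8B-Instruct",
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The key point is that the quantization is baked into whichever .gguf file you download, so choosing the quant is a matter of picking the right file from the repo rather than setting a quantization flag at deploy time.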