Created by StandingFuture on 10/25/2024 in #⚡|serverless
Does VLLM support quantized models?
I tried pointing the download directory at the quantized model, but the model page says "Using llama.cpp release b3496 for quantization," and I don't see llama.cpp (GGUF) listed as a quantization method option on RunPod.
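For context, this is roughly what I'm trying to do, sketched with vLLM's offline Python API (assuming a vLLM version that accepts GGUF files; the model path and file name below are just placeholders):

```python
# Minimal sketch (not verified on RunPod): loading a GGUF-quantized model
# with vLLM's offline Python API. The file path below is hypothetical.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/runpod-volume/models/model-q4_k_m.gguf",  # hypothetical local GGUF file
    quantization="gguf",  # ask vLLM to use its GGUF loader instead of AWQ/GPTQ/etc.
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Does vLLM support quantized models?"], params)
print(outputs[0].outputs[0].text)
```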
2 replies