Quantization method
Hello, I am trying to quantize a model and I see several libraries. Do you have any advice on which library is best? Or are they all fine and I can choose any of them?
This does not quantize a model. It lets you use a model that is already quantized, and you specify the quantization format.
thank you so much, now i get it
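If you want to quantize a model yourself (rather than serve one that is already quantized), one common route is Hugging Face transformers with bitsandbytes. A minimal sketch, assuming a causal LM; the model ID below is just a placeholder:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize weights to 4-bit NF4 at load time via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```
Other options like AutoGPTQ or AutoAWQ produce checkpoints in their respective formats, which inference engines can then load directly.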
2024-06-27T10:50:05.563358317Z ValueError: Quantization method specified in the model config (bitsandbytes) does not match the quantization method specified in the quantization argument (gptq).
now I have this error
You can't select GPTQ quantization when the model is quantized with bitsandbytes.
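The quantization argument has to match how the checkpoint was actually quantized, or be omitted so the engine reads it from the model config. A minimal sketch with vLLM's offline API, assuming a GPTQ checkpoint; the model ID is a placeholder:
```python
from vllm import LLM

# The quantization argument must match the model's actual format.
# For a GPTQ checkpoint, pass "gptq"; you can also omit the argument
# and let vLLM detect it from the model config.
llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",  # placeholder GPTQ model ID
    quantization="gptq",
)
print(llm.generate("Hello")[0].outputs[0].text)
```
On the RunPod worker side, the same idea presumably applies to whatever quantization setting the endpoint configuration exposes: it must agree with the checkpoint's format.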
vllm does not support the bitsandbytes quantization method.
Seems like the vllm engine supports it, maybe it's just a RunPod vllm worker limitation:
https://github.com/vllm-project/vllm/issues/5569
GitHub: [Bug]: BitsandBytes quantization is not working as expected · Issue #5569 · vllm-project/vllm
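For reference, recent vLLM releases do document in-flight bitsandbytes loading. A minimal sketch, assuming a vLLM version with bitsandbytes support; the model ID is a placeholder and the exact flags may vary by version:
```python
from vllm import LLM

# In-flight bitsandbytes quantization: vLLM loads full-precision
# weights and quantizes them on the fly. Flag names may differ
# across vLLM versions.
llm = LLM(
    model="huggyllama/llama-7b",  # placeholder model ID
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```
Whether the RunPod vllm worker passes these options through is a separate question.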
Maybe @Alpay Ariyak can advise on this.