Quantization method

Hello, I am trying to quantize a model and I see several libraries. Do you have any advice on which library is best? Or are they all fine, so I can choose any of them?
No description
5 Replies
digigoblin
digigoblin6d ago
This does not quantize a model. It lets you use a model that is already quantized, and you specify the quantization format.
annasuhstuff
annasuhstuff6d ago
thank you so much, now I get it. Now I have this error:

2024-06-27T10:50:05.563358317Z ValueError: Quantization method specified in the model config (bitsandbytes) does not match the quantization method specified in the quantization argument (gptq)
digigoblin
digigoblin6d ago
You can't select GPTQ quantization when the model is quantized with bitsandbytes. vllm does not support the bitsandbytes quantization method.
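For context, the quantization method passed to vLLM has to match the method recorded in the checkpoint's own quantization config, or vLLM raises exactly this ValueError. A minimal sketch of serving an already-GPTQ-quantized model (the model name here is just an illustrative placeholder; substitute your own checkpoint):

```shell
# Serve a GPTQ-quantized checkpoint with vLLM's OpenAI-compatible server.
# --quantization must match how the checkpoint was actually quantized;
# pointing it at a bitsandbytes checkpoint would reproduce the error above.
python -m vllm.entrypoints.openai.api_server \
  --model TheBloke/Llama-2-7B-GPTQ \
  --quantization gptq
```

Leaving `--quantization` unset usually also works, since vLLM can read the method from the model's quantization config on its own.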
digigoblin
digigoblin6d ago
Seems like the vllm engine supports it; maybe it's just a RunPod vllm worker limitation: https://github.com/vllm-project/vllm/issues/5569
GitHub: [Bug]: BitsandBytes quantization is not working as expected
digigoblin
digigoblin6d ago
Maybe @Alpay Ariyak can advise on this.