Quantization method
Hello, I am trying to quantize a model and I see several libraries. Do you have any advice on which library is best? Or are they all fine and I can choose any of them?
This does not quantize a model. It lets you use a model that is already quantized, and you specify the quantization format.
thank you so much, now i get it
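If you want to quantize a model yourself (rather than serve one that is already quantized), one common route is Hugging Face transformers with bitsandbytes. A minimal sketch, assuming a causal LM; the model ID below is just a placeholder:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantize weights to 4-bit NF4 at load time via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
```
Other options like AutoGPTQ or AutoAWQ produce checkpoints in their respective formats, which inference engines can then load directly.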
2024-06-27T10:50:05.563358317Z ValueError: Quantization method specified in the model config (bitsandbytes) does not match the quantization method specified in the quantization argument (gptq).
now I have this error
You can't select GPTQ quantization when the model is quantized with bitsandbytes.
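The quantization argument has to match how the checkpoint was actually quantized, or be omitted so the engine reads it from the model config. A minimal sketch with vLLM's offline API, assuming a GPTQ checkpoint; the model ID is a placeholder:
```python
from vllm import LLM

# The quantization argument must match the model's actual format.
# For a GPTQ checkpoint, pass "gptq"; you can also omit the argument
# and let vLLM detect it from the model config.
llm = LLM(
    model="TheBloke/Llama-2-7B-Chat-GPTQ",  # placeholder GPTQ model ID
    quantization="gptq",
)
print(llm.generate("Hello")[0].outputs[0].text)
```
On the RunPod worker side, the same idea presumably applies to whatever quantization setting the endpoint configuration exposes: it must agree with the checkpoint's format.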
vllm does not support the bitsandbytes quantization method.
Seems like the vllm engine supports it, maybe it's just a RunPod vllm worker limitation:
https://github.com/vllm-project/vllm/issues/5569
GitHub: [Bug]: BitsandBytes quantization is not working as expected · Issue #5569 · vllm-project/vllm
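For reference, recent vLLM releases do document in-flight bitsandbytes loading. A minimal sketch, assuming a vLLM version with bitsandbytes support; the model ID is a placeholder and the exact flags may vary by version:
```python
from vllm import LLM

# In-flight bitsandbytes quantization: vLLM loads full-precision
# weights and quantizes them on the fly. Flag names may differ
# across vLLM versions.
llm = LLM(
    model="huggyllama/llama-7b",  # placeholder model ID
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```
Whether the RunPod vllm worker passes these options through is a separate question.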
Maybe @Alpay Ariyak can advise on this.