What quantization for Cmdr+ using vLLM worker?

I'm trying to set up these Cmdr+ models on serverless using the vLLM worker, but the only quantization options I see are SqueezeLLM, AWQ, and GPTQ. Which quantization should I set when starting these models?: https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit and https://huggingface.co/turboderp/command-r-plus-103B-exl2
10 Replies
octopus · 4w ago
@aikitoria
digigoblin · 4w ago
There is no support for exl2 in vllm currently
digigoblin · 4w ago
GitHub
ExLlamaV2: exl2 support · Issue #3203 · vllm-project/vllm
If possible: ExLlamaV2 is a very fast and good library to run LLMs. ExLlamaV2 Repo
digigoblin · 4w ago
I don't know what quantization method is used for that 4bit one. Seems to be bitsandbytes according to the config.json
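(A quick way to check this yourself: look for a `quantization_config` block in the repo's config.json. The sketch below uses an inline example config modeled on a bitsandbytes-quantized checkpoint; the actual file for that repo may differ.)

```python
import json

# Hypothetical config.json excerpt, modeled on a bitsandbytes-quantized
# checkpoint; the real CohereForAI/c4ai-command-r-plus-4bit file may differ.
config_text = """
{
  "model_type": "cohere",
  "quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_4bit": true
  }
}
"""

config = json.loads(config_text)
# Unquantized checkpoints usually have no quantization_config key at all.
method = config.get("quantization_config", {}).get("quant_method", "none")
print(method)
```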
digigoblin · 4w ago
Looks like bitsandbytes is also not supported: https://github.com/vllm-project/vllm/issues/4033
GitHub
[Feature]: bitsandbytes support · Issue #4033 · vllm-project/vllm
🚀 The feature, motivation and pitch: Bitsandbytes 4bit quantization support. I know many want that, and it has also been discussed before and marked as unplanned, but after I looked at how TGI implemented tha...
digigoblin · 4w ago
Yes, the model itself is supported, but only unquantized or with a quant method vLLM supports. The exl2 and bitsandbytes versions won't work because vLLM doesn't implement those quantization methods.
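(For reference, starting a checkpoint that uses a supported quant method would look roughly like the sketch below. This assumes a vLLM version recent enough to support CohereForCausalLM; the repo name is the GPTQ one mentioned later in this thread, and the exact flags may vary by vLLM version.)

```shell
# Hypothetical sketch: serving a GPTQ-quantized checkpoint with vLLM's
# OpenAI-compatible server. Requires vLLM >= 0.4.0 for Cohere models.
python -m vllm.entrypoints.openai.api_server \
    --model alpindale/c4ai-command-r-plus-GPTQ \
    --quantization gptq \
    --dtype float16
```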
octopus · 4w ago
@digigoblin Can I use the original CohereForAI/c4ai-command-r-plus then? What parameter values should I input, and how much GPU vRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it gives an error saying 'CohereForCausalLM is not supported'.
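(On the vRAM question, which goes unanswered below: a rough weights-only estimate can be sketched as follows, assuming Command R+ has ~104B parameters and is loaded in 16-bit precision.)

```python
# Rough weights-only VRAM estimate for the unquantized model; KV cache,
# activations, and CUDA overhead all need extra headroom on top of this.
params_billion = 104  # Command R+ is ~104B parameters (assumption)
bytes_per_param = 2   # float16 / bfloat16

weights_gb = params_billion * bytes_per_param
print(weights_gb)  # 208 (GB of weights alone, i.e. multiple 80GB GPUs)
```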
digigoblin · 4w ago
Strange, CohereForCausalLM is supported by vllm: https://docs.vllm.ai/en/latest/models/supported_models.html Maybe @Alpay Ariyak needs to update the worker or something.
digigoblin · 4w ago
Should be supported since v0.4.0 https://github.com/vllm-project/vllm/issues/3330
GitHub
support for CohereForAI/c4ai-command-r-v01 · Issue #3330 · vllm-pro...
CohereForAI/c4ai-command-r-v01 is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering. Command-R has the capabil...