What quantization for Cmdr+ using vLLM worker?
I'm trying to set up these Cmdr+ models on serverless using the vllm worker, but the only quantization options I see are SqueezeLLM, AWQ and GPTQ. Which quantization should I set when starting these models?:
https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit
and
https://huggingface.co/turboderp/command-r-plus-103B-exl2
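For context, the worker's quantization setting presumably maps straight through to vLLM's own quantization engine argument, so the value has to match how the checkpoint was actually produced. A minimal sketch of that argument in vLLM's Python API, with an illustrative model name and GPU count rather than a tested config:
```python
# Minimal sketch, not a tested serverless config: shows where a quantization
# value like "gptq" ends up inside vLLM. The model name and tensor_parallel_size
# below are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/c4ai-command-r-plus-GPTQ",  # hypothetical GPTQ repack of Cmdr+
    quantization="gptq",                        # must match the checkpoint: "awq", "gptq" or "squeezellm"
    tensor_parallel_size=2,                     # assumption: shard across 2 GPUs
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```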
10 Replies
@aikitoria
There is no support for exl2 in vllm currently
ExLlamaV2: exl2 support · Issue #3203 · vllm-project/vllm
I don't know what quantization method is used for that 4bit one.
Seems to be bitsandbytes according to the config.json
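For anyone checking this themselves, the declared method is in the repo's config.json; a small sketch (the quantization_config / quant_method field names follow the Transformers convention and are an assumption here, not taken from this thread):
```python
# Sketch: download config.json from the Hub and read the declared quantization
# method. Treat this as a quick heuristic check; the repo may be gated, so a
# Hub token may be required.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("CohereForAI/c4ai-command-r-plus-4bit", "config.json")
with open(path) as f:
    cfg = json.load(f)
print(cfg.get("quantization_config", {}).get("quant_method", "none declared"))
# Expected here: "bitsandbytes"
```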
Looks like bitsandbytes is also not supported:
https://github.com/vllm-project/vllm/issues/4033
[Feature]: bitsandbytes support · Issue #4033 · vllm-project/vllm
@aikitoria said here that vllm supports Cmdr+ https://discord.com/channels/912829806415085598/948767517332107274/1230643876763537478
Yes, it is supported, but only unquantized or with a quantization method vllm supports. The exl2 and bitsandbytes versions aren't usable because vllm doesn't support those quantization methods
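One quick way to see what counts as a "supported quant method" is to ask the installed vllm directly; a sketch, assuming the 0.4.x import path (the registry has moved between releases, hence the guard):
```python
# Sketch: list the quantization methods the installed vLLM recognizes.
try:
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
    print(sorted(QUANTIZATION_METHODS))  # expect entries like 'awq', 'gptq', 'squeezellm'
except ImportError:
    print("Registry not at this path in this vLLM version; check its docs instead")
# exl2 and bitsandbytes won't appear, which is why those Cmdr+ uploads can't be loaded.
```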
@digigoblin can I use the original CohereForAI/c4ai-command-r-plus then? What parameter values should I set, and how much GPU VRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it gives an error saying 'CohereForCausalLM is not supported'
Strange, CohereForCausalLM is supported by vllm:
https://docs.vllm.ai/en/latest/models/supported_models.html
Maybe @Alpay Ariyak needs to update the worker or something.
Should be supported since v0.4.0
https://github.com/vllm-project/vllm/issues/3330
support for CohereForAI/c4ai-command-r-v01 · Issue #3330 · vllm-project/vllm
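A quick way to check what the worker image actually ships, given the v0.4.0 claim above:
```python
# Sketch: check the installed vLLM version against the v0.4.0 threshold cited
# above. Requires only the standard library plus `packaging`.
from importlib.metadata import version
from packaging.version import Version

installed = Version(version("vllm"))
print(installed, "OK" if installed >= Version("0.4.0") else "older than 0.4.0")
```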