RunPod · 9mo ago
octopus

What quantization for Cmdr+ using vLLM worker?

I'm trying to set up these Cmdr+ models on serverless using the vLLM worker, but the only quantization options I see are SqueezeLLM, AWQ, and GPTQ. Which quantization method should I set when starting these models? https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit and https://huggingface.co/turboderp/command-r-plus-103B-exl2
10 Replies
octopus (OP) · 9mo ago
@aikitoria
digigoblin · 9mo ago
There is currently no support for exl2 in vLLM.
digigoblin · 9mo ago
GitHub
ExLlamaV2: exl2 support · Issue #3203 · vllm-project/vllm
If is possible ExLlamaV2 is a very fast and good library to Run LLM ExLlamaV2 Repo
digigoblin · 9mo ago
I don't know what quantization method that 4-bit one uses. It seems to be bitsandbytes, according to the config.json.
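As a side note, you can check this yourself: a checkpoint's `config.json` usually declares its quantization method under `quantization_config`. The helper below is an illustrative sketch (the function name and parsing are mine, not a vLLM API); the supported set reflects the options mentioned in this thread.

```python
import json

# Quantization methods the vLLM worker exposes per this thread.
# exl2 and bitsandbytes are deliberately absent.
VLLM_SUPPORTED_QUANT = {"awq", "gptq", "squeezellm"}

def vllm_can_load(config_json: str) -> bool:
    """Return True if a checkpoint's declared quant method is one vLLM supports.

    Unquantized checkpoints (no quantization_config) are fine.
    """
    cfg = json.loads(config_json)
    quant = cfg.get("quantization_config", {}).get("quant_method")
    if quant is None:
        return True  # no quantization declared -> plain weights
    return quant.lower() in VLLM_SUPPORTED_QUANT

# The 4-bit Command R+ checkpoint declares bitsandbytes, so:
print(vllm_can_load('{"quantization_config": {"quant_method": "bitsandbytes"}}'))
```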
digigoblin · 9mo ago
Looks like bitsandbytes is also not supported: https://github.com/vllm-project/vllm/issues/4033
GitHub
[Feature]: bitsandbytes support · Issue #4033 · vllm-project/vllm
🚀 The feature, motivation and pitch Bitsandbytes 4bit quantization support. I know many want that, and also it is discussed before and marked as unplanned, but after I looked how TGI implemented tha...
octopus (OP) · 9mo ago
digigoblin · 9mo ago
Yes, the model is supported, but only unquantized or with one of the supported quant methods. The exl2 and bitsandbytes quantized versions won't work because vLLM doesn't support those quantization methods.
octopus (OP) · 9mo ago
@digigoblin Can I use the original CohereForAI/c4ai-command-r-plus then? What parameter values should I input, and how much GPU vRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it gives an error saying 'CohereForCausalLM is not supported'.
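For the vRAM question, a back-of-envelope estimate helps: weight memory is roughly parameter count times bytes per parameter. Command R+ has ~104B parameters; the figures below are rough assumptions for the weights only (KV cache and activations need extra headroom), not official requirements.

```python
def weight_vram_gb(n_params_b: float, bytes_per_param: float) -> float:
    """GB needed just to hold the weights.

    n_params_b: parameter count in billions.
    bytes_per_param: 2 for fp16/bf16, 0.5 for 4-bit.
    (1e9 params * bytes) / 1e9 bytes-per-GB cancels out.
    """
    return n_params_b * bytes_per_param

print(weight_vram_gb(104, 2))    # fp16/bf16: ~208 GB, i.e. multi-GPU
print(weight_vram_gb(104, 0.5))  # 4-bit: ~52 GB, fits on one 80 GB card
```

So serving the unquantized model needs a multi-GPU setup (e.g. tensor parallelism across several 80 GB cards), which is why the 4-bit variants are attractive.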
digigoblin · 9mo ago
Strange, CohereForCausalLM is supported by vLLM: https://docs.vllm.ai/en/latest/models/supported_models.html Maybe @Alpay Ariyak needs to update the worker or something.
digigoblin · 9mo ago
Should be supported since v0.4.0 https://github.com/vllm-project/vllm/issues/3330
GitHub
support for CohereForAI/c4ai-command-r-v01 · Issue #3330 · vllm-pro...
CohereForAI/c4ai-command-r-v01 is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering. Command-R has the capabil...
