What quantization for Cmdr+ using vLLM worker?

I'm trying to set up these Cmdr+ models on serverless using the vLLM worker, but the only quantization options I see are SqueezeLLM, AWQ, and GPTQ. Which quantization should I set when starting these models?: https://huggingface.co/CohereForAI/c4ai-command-r-plus-4bit and https://huggingface.co/turboderp/command-r-plus-103B-exl2
10 Replies
octopus · 4w ago
@aikitoria
digigoblin · 4w ago
There is no support for exl2 in vllm currently
digigoblin · 4w ago
GitHub
ExLlamaV2: exl2 support · Issue #3203 · vllm-project/vllm
If possible: ExLlamaV2 is a very fast and good library to run LLMs. ExLlamaV2 Repo
digigoblin · 4w ago
I don't know what quantization method is used for that 4bit one. Seems to be bitsandbytes according to the config.json
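(A quick way to check this yourself: look for a `quantization_config` block in the repo's config.json. The sketch below uses an inline example config modeled on a bitsandbytes-quantized checkpoint; the actual file for that repo may differ.)

```python
import json

# Hypothetical config.json excerpt, modeled on a bitsandbytes-quantized
# checkpoint; the real CohereForAI/c4ai-command-r-plus-4bit file may differ.
config_text = """
{
  "model_type": "cohere",
  "quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_4bit": true
  }
}
"""

config = json.loads(config_text)
# Unquantized checkpoints usually have no quantization_config key at all.
method = config.get("quantization_config", {}).get("quant_method", "none")
print(method)
```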
digigoblin · 4w ago
Looks like bitsandbytes is also not supported: https://github.com/vllm-project/vllm/issues/4033
GitHub
[Feature]: bitsandbytes support · Issue #4033 · vllm-project/vllm
🚀 The feature, motivation and pitch: Bitsandbytes 4bit quantization support. I know many want that, and it has also been discussed before and marked as unplanned, but after I looked at how TGI implemented tha...
digigoblin · 4w ago
Yes, the model itself is supported, but only unquantized or with a quant method vLLM supports. The exl2 and bitsandbytes versions won't work because vLLM doesn't implement those quantization methods.
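(For reference, starting a checkpoint that uses a supported quant method would look roughly like the sketch below. This assumes a vLLM version recent enough to support CohereForCausalLM; the repo name is the GPTQ one mentioned later in this thread, and the exact flags may vary by vLLM version.)

```shell
# Hypothetical sketch: serving a GPTQ-quantized checkpoint with vLLM's
# OpenAI-compatible server. Requires vLLM >= 0.4.0 for Cohere models.
python -m vllm.entrypoints.openai.api_server \
    --model alpindale/c4ai-command-r-plus-GPTQ \
    --quantization gptq \
    --dtype float16
```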
octopus · 4w ago
@digigoblin Can I use the original CohereForAI/c4ai-command-r-plus then? What parameter values should I input, and how much GPU vRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it gives an error saying 'CohereForCausalLM is not supported'.
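(On the vRAM question, which goes unanswered below: a rough weights-only estimate can be sketched as follows, assuming Command R+ has ~104B parameters and is loaded in 16-bit precision.)

```python
# Rough weights-only VRAM estimate for the unquantized model; KV cache,
# activations, and CUDA overhead all need extra headroom on top of this.
params_billion = 104  # Command R+ is ~104B parameters (assumption)
bytes_per_param = 2   # float16 / bfloat16

weights_gb = params_billion * bytes_per_param
print(weights_gb)  # 208 (GB of weights alone, i.e. multiple 80GB GPUs)
```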
digigoblin · 4w ago
Strange, CohereForCausalLM is supported by vllm: https://docs.vllm.ai/en/latest/models/supported_models.html Maybe @Alpay Ariyak needs to update the worker or something.
digigoblin · 4w ago
Should be supported since v0.4.0 https://github.com/vllm-project/vllm/issues/3330
GitHub
support for CohereForAI/c4ai-command-r-v01 · Issue #3330 · vllm-pro...
CohereForAI/c4ai-command-r-v01 is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question answering. Command-R has the capabil...