Terrible performance - vLLM serverless for Mistral 7B

Hello! When I serve an AWQ-quantized Mistral-7B checkpoint such as "TheBloke/Mistral-7B-v0.1-AWQ" on a RunPod vLLM serverless instance, I get terrible performance (accuracy) compared to running Mistral 7B on my CPU with ollama (which uses GGUF quantization, Q4_0). Could this be due to a misconfiguration of the parameters on my side, even though I kept the defaults? Or is AWQ quantization known to degrade quality that much? Thank you
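
For reference, this is a minimal sketch of how I understand the equivalent setup using vLLM's Python API directly; the sampling values below are just illustrative guesses on my part, not the confirmed RunPod serverless defaults:

```python
from vllm import LLM, SamplingParams

# Load the AWQ checkpoint with quantization set explicitly; AWQ kernels
# run in half precision.
llm = LLM(
    model="TheBloke/Mistral-7B-v0.1-AWQ",
    quantization="awq",
    dtype="half",
)

# Explicit sampling parameters (values are assumptions for illustration).
# A high default temperature alone can make outputs look much worse than
# what ollama produces with its defaults.
params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)

outputs = llm.generate(["Explain AWQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```

If the serverless endpoint applies different defaults (e.g. a higher temperature), that alone might explain part of the gap, independent of AWQ vs. GGUF.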