Can we add the minimum GPU configs required for running popular models like Mistral and Mixtral?
I'm trying to find what serverless GPU configs are required to run Mixtral 8x7B-Instruct, either quantized (https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ) or the main model from Mistral. It would be good to have this info in the README of the vLLM worker repo.
I run into OutOfMemory issues when trying it on a 48GB GPU.
Configs where? I assume this is for the vLLM worker?
For the main, non-quantized Mixtral you need at least 2 x A100.
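As a rough sanity check, here is the back-of-envelope weight math (a sketch only; the ~46.7B figure is Mixtral's published total parameter count, and real usage adds KV cache and activations on top):

```python
# Back-of-envelope VRAM for Mixtral 8x7B weights (~46.7B total parameters).
# Real usage adds KV cache and activations, so treat these as lower bounds.
params = 46.7e9

for name, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit GPTQ/AWQ", 0.5)]:
    gb = params * bytes_per_param / 1024**3
    print(f"{name:>14}: ~{gb:.0f} GB for weights alone")

# fp16  -> ~87 GB: doesn't fit on one 80 GB A100, hence 2 x A100
# 4-bit -> ~22 GB: fits a 48 GB card on paper, but vLLM also preallocates
#          a KV cache for Mixtral's 32k default context, which can push it over
```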
Right, using the vLLM worker. And do you know about GPTQ-quantized Mixtral? Thanks!
No, don't know, but here is how much you need for the main model:
https://www.youtube.com/watch?v=WjiX3lCnwUI
(YouTube: Matthew Berman – "Mixtral 8x7B DESTROYS Other Models (MoE = AGI?)")
@Alpay Ariyak may be able to advise since he maintains the vLLM worker for RunPod.
@Alpay Ariyak I tried using an A100 80GB GPU on serverless for quantized Mixtral but still get out-of-memory errors:
(GitHub issue: Cannot run Mixtral 8x7B Instruct AWQ · Issue #49 · runpod-workers/w...)
Yes, I tried both the GPTQ and the AWQ, even the one mentioned by the poster there, but neither seems to work on serverless.
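For context, what the worker is attempting boils down to something like this in plain vLLM (a sketch, not the worker's actual code; capping max_model_len below Mixtral's 32k default is one common way to avoid KV-cache OOMs, though untested here):

```python
# Minimal vLLM load of the AWQ build (values are assumptions, not a
# verified serverless config). Capping max_model_len shrinks the KV cache
# that vLLM preallocates for Mixtral's default 32k context window.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ",
    quantization="awq",
    dtype="half",                 # AWQ kernels run with fp16 activations
    max_model_len=4096,           # default 32768 preallocates a far larger KV cache
    gpu_memory_utilization=0.90,
)

out = llm.generate(["Hello, Mixtral!"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```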
I would recommend using ExLlamaV2 for loading Mixtral.
In serverless?
Yes, vLLM is still super buggy with quantization, and there's no cost-effective way of running the full Mixtral model.
The 5-bit variant uses 33 GB of VRAM.
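For anyone trying this, a minimal ExLlamaV2 loading sketch based on the library's example scripts (the model path and bit rate here are assumptions):

```python
# Sketch of loading an EXL2-quantized Mixtral with ExLlamaV2.
# The model directory is a hypothetical local path to a 5.0bpw EXL2 quant.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/Mixtral-8x7B-Instruct-5.0bpw-exl2"
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)   # lazy cache lets autosplit size itself
model.load_autosplit(cache)                # splits layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

print(generator.generate_simple("Why is the sky blue?", settings, num_tokens=128))
```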
I have this, but haven't managed to get streaming working:
https://github.com/ashleykleynhans/runpod-worker-exllamav2
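On the streaming side, the RunPod Python SDK does accept generator handlers, so a streaming worker would look roughly like this (a sketch, not the repo's actual code; generate_stream is a hypothetical stand-in for the real token generator):

```python
# Sketch of a streaming RunPod serverless handler. Each yield from the
# handler becomes a chunk on the job's /stream endpoint.
import runpod

def generate_stream(prompt):
    # Hypothetical placeholder for an ExLlamaV2 streaming generator.
    for word in ["hello", " ", "world"]:
        yield word

def handler(job):
    prompt = job["input"]["prompt"]
    for token in generate_stream(prompt):
        yield {"text": token}

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # also expose the full concatenated output
})
```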