RunPod
Created by md on 5/12/2024 in #⚡|serverless
Run Mixtral 8x22B Instruct on vLLM worker
126 replies

Nice, this will be useful, thanks a lot
I will revisit this in the future
I see, yeah, that makes sense
Even with using dtype half? Do we need 4x 80 GB?
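(For a rough sanity check, assuming the commonly cited ~141B total parameters for Mixtral 8x22B; treat the exact count as approximate:)

```python
# Back-of-the-envelope VRAM for the Mixtral 8x22B weights alone,
# ignoring KV cache and activations (assumed ~141B total parameters).
params = 141e9
bytes_per_param = 2  # "dtype half" = fp16/bf16

weights_gb = params * bytes_per_param / 1e9
print(f"weights: ~{weights_gb:.0f} GB")             # ~282 GB
print(f"80 GB GPUs needed: {weights_gb / 80:.2f}")  # ~3.53 -> round up to 4
```

So even in half precision the weights alone are roughly 282 GB: they cannot fit in 2x 80 GB = 160 GB, and 4x 80 GB is the minimum, with the KV cache still needing headroom on top.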
My bad, guys
Yeah, you're actually right, I confused it with 80 GB
I just selected the option of 2 GPUs per worker and an 80 GB H100
Yeah, I will try soon
Hey, yes, I used 2x 80 GB GPUs per worker with 3 workers, but I got a torch.cuda out-of-memory error while trying to allocate
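(A note on the worker math, as I understand serverless scaling, so worth double-checking: workers are independent replicas, meaning GPUs do not pool across the 3 workers; every replica has to hold the full model by itself. A minimal sketch, reusing the ~282 GB estimate from above:)

```python
# Workers are independent replicas: VRAM does not pool across workers,
# so each worker's own GPUs must hold the entire model.
WEIGHTS_GB = 282  # fp16 estimate from the calculation above


def replica_fits(gpus_per_worker: int, gb_per_gpu: int = 80) -> bool:
    """True if one worker's GPUs can hold the weights (ignoring KV cache)."""
    return gpus_per_worker * gb_per_gpu > WEIGHTS_GB


print(replica_fits(2))  # False -> torch.cuda OOM, however many workers you add
print(replica_fits(4))  # True  -> 320 GB leaves ~38 GB of headroom for KV cache
```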
Thanks a lot
Even with the above config
No, I ran out of memory
I used 2 GPUs per worker as well, actually 80 GB
I set it to 3 but still ran out of memory
[image attachment, no description]
Let me check
In this guide (https://docs.mistral.ai/deployment/self-deployment/vllm/) they set the tensor parallel size to 4; I wonder if RunPod does that as well
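(The guide's setting maps to vLLM's tensor_parallel_size engine argument; a minimal sketch with the offline Python API, where everything except the model name is an illustrative default:)

```python
from vllm import LLM, SamplingParams

# Shard the model across 4 GPUs, one tensor-parallel rank per GPU;
# tensor_parallel_size must match the number of GPUs the worker can see.
llm = LLM(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",
    tensor_parallel_size=4,
    dtype="half",  # fp16 weights, ~282 GB spread across the 4 GPUs
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

(As far as I know, the RunPod vLLM worker does not set this automatically; if I remember the worker's README correctly, it reads a TENSOR_PARALLEL_SIZE environment variable on the endpoint, which should match the GPUs-per-worker selection.)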
[image attachment, no description]
Yeah
I haven't looked into it yet, but they suggested it and TGI