RunPod
Created by md on 5/12/2024 in #⚡|serverless
Run Mixtral 8x22B Instruct on vLLM worker
126 replies

Nice, this will be useful, thanks a lot
I will revisit this in the future
I see, yeah, that makes sense
Even with using dtype half? Do we need 4x 80 GB?
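(For a rough sanity check, assuming the commonly cited ~141B total parameters for Mixtral 8x22B; treat the exact count as approximate:)

```python
# Back-of-the-envelope VRAM for the Mixtral 8x22B weights alone,
# ignoring KV cache and activations (assumed ~141B total parameters).
params = 141e9
bytes_per_param = 2  # "dtype half" = fp16/bf16

weights_gb = params * bytes_per_param / 1e9
print(f"weights: ~{weights_gb:.0f} GB")             # ~282 GB
print(f"80 GB GPUs needed: {weights_gb / 80:.2f}")  # ~3.53 -> round up to 4
```

So even in half precision the weights alone are roughly 282 GB: they cannot fit in 2x 80 GB = 160 GB, and 4x 80 GB is the minimum, with the KV cache still needing headroom on top.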
My bad, guys
Yeah, you're actually right, I confused it with 80 GB
I just selected the option of 2 GPUs per worker and an 80 GB H100
Yeah, I will try soon
Hey, yes, I used 2x 80 GB GPUs per worker with 3 workers, but I got a torch.cuda out-of-memory error while trying to allocate
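(A note on the worker math, as I understand serverless scaling, so worth double-checking: workers are independent replicas, meaning GPUs do not pool across the 3 workers; every replica has to hold the full model by itself. A minimal sketch, reusing the ~282 GB estimate from above:)

```python
# Workers are independent replicas: VRAM does not pool across workers,
# so each worker's own GPUs must hold the entire model.
WEIGHTS_GB = 282  # fp16 estimate from the calculation above


def replica_fits(gpus_per_worker: int, gb_per_gpu: int = 80) -> bool:
    """True if one worker's GPUs can hold the weights (ignoring KV cache)."""
    return gpus_per_worker * gb_per_gpu > WEIGHTS_GB


print(replica_fits(2))  # False -> torch.cuda OOM, however many workers you add
print(replica_fits(4))  # True  -> 320 GB leaves ~38 GB of headroom for KV cache
```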
Thanks a lot
Even with the above config
No, I ran out of memory
I used 2 GPUs per worker as well, actually 80 GB
I set it to 3 but still ran out of memory
[image attachment, no description]
Let me check
In this guide (https://docs.mistral.ai/deployment/self-deployment/vllm/) they set the tensor parallel size to 4; I wonder if RunPod does that as well
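(The guide's setting maps to vLLM's tensor_parallel_size engine argument; a minimal sketch with the offline Python API, where everything except the model name is an illustrative default:)

```python
from vllm import LLM, SamplingParams

# Shard the model across 4 GPUs, one tensor-parallel rank per GPU;
# tensor_parallel_size must match the number of GPUs the worker can see.
llm = LLM(
    model="mistralai/Mixtral-8x22B-Instruct-v0.1",
    tensor_parallel_size=4,
    dtype="half",  # fp16 weights, ~282 GB spread across the 4 GPUs
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

(As far as I know, the RunPod vLLM worker does not set this automatically; if I remember the worker's README correctly, it reads a TENSOR_PARALLEL_SIZE environment variable on the endpoint, which should match the GPUs-per-worker selection.)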
[image attachment, no description]
Yeah
I haven't looked into it yet, but they suggested it and TGI