Concept
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
It's from a competitor, so I'm going to hold off from posting.
I'll dm you the tutorial
Took vllm completely out of the equation.
I feel like the devs should put out a tutorial for common Mixtral loading use cases. Lots of people seem to be having trouble with it.
48 replies
RunPod
•Created by octopus on 2/26/2024 in #⚡|serverless
Can we add minimum GPU configs required for running the popular models like Mistral, Mixtral?
The 5-bit variant uses about 33 GB of VRAM.
Yes, vLLM is still super buggy with quantizations, and there's no cost-effective way of running the full Mixtral model.
I would recommend using exllama2 for loading up mixtral
15 replies
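To illustrate the exllama2 recommendation above, here is a minimal sketch of loading an EXL2-quantized Mixtral with the ExLlamaV2 Python API. The model directory is a placeholder, and the 5.0 bpw quant is only an assumption chosen to line up with the ~33 GB figure mentioned above; check the library's own examples for the current API.

    from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
    from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

    config = ExLlamaV2Config()
    config.model_dir = "/models/Mixtral-8x7B-exl2-5.0bpw"  # placeholder path to an EXL2 quant
    config.prepare()

    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache(model, lazy=True)   # allocate the cache lazily so autosplit can plan memory
    model.load_autosplit(cache)                # split layers across the available GPUs

    tokenizer = ExLlamaV2Tokenizer(config)
    generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

    settings = ExLlamaV2Sampler.Settings()
    settings.temperature = 0.7

    print(generator.generate_simple("Mixtral is", settings, 64))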
RunPod
•Created by Concept on 2/1/2024 in #⚡|serverless
VLLM Worker Error that doesn't time out.
Using the RunPod vLLM worker.
Existing worker on the newest SDK. I believe it was a JSON serialization error, which would be an error on my side, but it shouldn't keep running like that after erroring.
Is there a way to kill workers when they error?
10 replies
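For the question above about killing workers when they error: a minimal sketch of a handler using the runpod Python SDK, assuming the SDK's documented behavior that returning an "error" key marks the job as failed and that "refresh_worker": True asks the platform to recycle the worker after the job; worth verifying against the current RunPod docs.

    import json
    import runpod

    def handler(job):
        try:
            # json.dumps raises on non-serializable output, the kind of error described above
            return {"output": json.dumps(job["input"])}
        except Exception as e:
            # Fail the job cleanly instead of letting the worker spin,
            # and ask RunPod to recycle this worker once the job finishes.
            return {"error": str(e), "refresh_worker": True}

    runpod.serverless.start({"handler": handler})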
RunPod
•Created by Concept on 1/15/2024 in #⚡|serverless
Request Format Runpod VLLM Worker
I’m not too sure if there’s a difference
const requestBody = {
  input: {
    prompt: chatHistory,
    sampling_params: {
      max_tokens: 2000,
    },
    apply_chat_template: true,
    stream: true,
  },
};
This worked for me
11 replies
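The same input shape can also be sent from Python; a sketch assuming the standard serverless endpoint URL, with the endpoint ID and API key as placeholders (the /runsync route returns the whole result in one response, so stream is left off here):

    import requests

    ENDPOINT_ID = "your-endpoint-id"   # placeholder
    API_KEY = "your-runpod-api-key"    # placeholder

    payload = {
        "input": {
            "prompt": "Hello!",
            "sampling_params": {"max_tokens": 2000},
            "apply_chat_template": True,
        }
    }

    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=payload,
        timeout=600,
    )
    print(resp.json())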
RunPod
•Created by Concept on 1/20/2024 in #⚡|serverless
Empty Tokens Using Mixtral AWQ
@Alpay Ariyak
6 replies
RunPod
•Created by wizardjoe on 1/4/2024 in #⚡|serverless
Error building worker-vllm docker image for mixtral 8x7b
Will look into it, thank you.
The reason I'm trying to use Mixtral is its mixture-of-experts architecture and its context window.
I'm open to using OpenChat; would it be possible to increase the context size beyond 8k, or is that fixed?
@Justin
2024-01-19T20:28:00.200082421Z INFO 01-19 20:28:00 llm_engine.py:70] Initializing an LLM engine with config: model='TheBloke/mixtral-8x7b-v0.1-AWQ', tokenizer='TheBloke/mixtral-8x7b-v0.1-AWQ', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=32768, download_dir='/models', load_format=auto, tensor_parallel_size=1, quantization=awq, enforce_eager=False, seed=0)
This is the step that takes the most time; I'm stuck here for about 2-3 minutes.
69 replies
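A rough way to check how much of that wait is just vLLM engine startup (weight loading, plus CUDA graph capture since enforce_eager=False in the log) is to time the same configuration outside the worker; a sketch assuming vLLM is installed and the AWQ weights are already cached under /models:

    import time
    from vllm import LLM

    start = time.time()
    # Mirrors the engine config shown in the log above
    llm = LLM(
        model="TheBloke/mixtral-8x7b-v0.1-AWQ",
        quantization="awq",
        dtype="float16",
        download_dir="/models",
        max_model_len=32768,
    )
    print(f"Engine init took {time.time() - start:.1f}s")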