RunPod
• Created by octopus on 6/11/2024 in #⚡|serverless
Cannot run Cmdr+ on serverless, CohereForCausalLM not supported
![No description](https://answer-overflow-discord-attachments.s3.us-east-1.amazonaws.com/1250879834297466880/Screen_Shot_2024-06-13_at_2.29.23_PM.png)
8 replies
This is the model we tried:
https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
Tried that; this is the error we get:
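For context, "CohereForCausalLM is not supported" means the vLLM build inside the worker image does not register that architecture (Command R+ support only landed in later vLLM releases). A minimal sketch of the check vLLM effectively performs against a model's declared architecture; the supported set below is illustrative, not an actual vLLM registry:

```python
# Sketch: check a model's declared architecture (from its config.json
# "architectures" field) against what a vLLM worker supports.
# OLD_VLLM_ARCHS is an illustrative, non-exhaustive stand-in for the
# registry of an older vLLM build that predates Cohere support.

def is_supported(architectures, supported_archs):
    """Return the first declared architecture the worker supports, or None."""
    for arch in architectures:
        if arch in supported_archs:
            return arch
    return None

OLD_VLLM_ARCHS = {"LlamaForCausalLM", "MixtralForCausalLM", "GPTNeoXForCausalLM"}

# Command R+ declares CohereForCausalLM, so an old build rejects it.
print(is_supported(["CohereForCausalLM"], OLD_VLLM_ARCHS))  # None -> the reported error
print(is_supported(["MixtralForCausalLM"], OLD_VLLM_ARCHS))  # supported
```

The practical fix implied here is updating the worker image to a vLLM version whose registry includes `CohereForCausalLM`.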
RunPod
• Created by octopus on 6/10/2024 in #⚡|serverless
What quantization for Cmdr+ using vLLM worker?
@digigoblin can I use the original
CohereForAI/c4ai-command-r-plus
then? What parameter values should I input, and how much GPU vRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it gives an error saying 'CohereForCausalLM is not supported'.
12 replies
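On the vRAM question: Command R+ has roughly 104B parameters, so weight memory alone scales with the bits per parameter. A back-of-envelope sketch (weights only; KV cache and activation overhead come on top):

```python
# Rough vRAM estimate for Command R+ (~104B parameters), weights only.
# This is a sketch for sizing intuition, not an exact measurement.

def weight_gb(n_params_b, bits_per_param):
    """Approximate weight memory in GB for n_params_b billion parameters."""
    return n_params_b * 1e9 * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(104, 16)   # 208 GB: needs multiple 80 GB GPUs
gptq4_gb = weight_gb(104, 4)   # 52 GB: fits one 80 GB GPU with room for KV cache
print(round(fp16_gb), round(gptq4_gb))  # 208 52
```

This is why the thread reaches for a 4-bit GPTQ checkpoint rather than the original fp16 weights.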
@aikitoria said here that vllm was supporting cmdr+ https://discord.com/channels/912829806415085598/948767517332107274/1230643876763537478
@aikitoria
RunPod
• Created by octopus on 2/29/2024 in #⚡|serverless
Serverless calculating capacity & ideal request count vs. queue delay values
@flash-singh any idea?
4 replies
RunPod
• Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Gotta give @ashleyk a job at this point, he helps everyone
46 replies
RunPod
• Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
awesome! thanks!
48 replies
Cool! Yeah, the
casperhansen/mixtral-instruct-awq
worked with your settings.
It’s the loader; I’m not sure about the quantization.
Exllamav2_HF is not supported?
@Alpay Ariyak can you please try with this model:
LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-3.75bpw-h6-exl2
Still getting an OOM error for it.
Ohh cool! I’ll try
I’m using a quantized version, though. I also tried a non-Mixtral model and it still gave the same error. Is the template working for you with any large models?
HF is up now, but btw I’m seeing this error for all models, not just Mixtral.
It seems like the vLLM worker isn’t working with any of the models. It keeps giving the same OOM error.
@Alpay Ariyak any updates about this?
At least you got it working, though! What value did you put? By context, you mean adding MAX_SEQUENCE_LENGTH in the env vars, right?
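On the MAX_SEQUENCE_LENGTH point: vLLM reserves KV-cache memory proportional to the maximum context length, which is why capping it can clear the OOM. A rough sketch using Mixtral-8x7B's published config (32 layers, 8 KV heads, head dim 128, fp16 cache); treat the formula as an approximation, since vLLM's actual paged allocation adds its own bookkeeping:

```python
# Why lowering MAX_SEQUENCE_LENGTH helps: the KV cache scales linearly
# with the maximum sequence length the worker must support.
# Defaults below are Mixtral-8x7B's config values.

def kv_cache_gb(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    """Approximate per-sequence KV cache size in GB (keys + values, fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

print(round(kv_cache_gb(32768), 2))  # 4.29 -- full 32k context, per sequence
print(round(kv_cache_gb(4096), 2))   # 0.54 -- with context capped at 4k
```

With many concurrent sequences, that per-sequence difference multiplies, so a lower cap leaves far more headroom next to the model weights.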
plz thank you!