octopus
RunPod
Created by octopus on 6/11/2024 in #⚡|serverless
Cannot run Cmdr+ on serverless, CohereForCausalLM not supported
No description
8 replies
This is the model we tried: https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
Tried that; this is the error we get (worker log, 2024-06-12T04:16:52):

    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 370, in _load_chat_template
    with open(chat_template, "r") as f:
TypeError: expected str, bytes or os.PathLike object, not dict
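For context, the TypeError above means open() was handed a dict where it expects a path. A minimal sketch of the failure mode (a simplified stand-in for _load_chat_template, not vLLM's actual code):

```python
# Simplified stand-in for vLLM's _load_chat_template: open() only accepts
# str, bytes, or os.PathLike, so a dict (e.g. a chat template that arrived
# as parsed JSON) raises exactly the TypeError seen in the log.
def load_chat_template(chat_template):
    with open(chat_template, "r") as f:
        return f.read()

try:
    load_chat_template({"chat_template": "..."})  # dict instead of a path string
except TypeError as exc:
    print(exc)  # expected str, bytes or os.PathLike object, not dict
```

This points at the chat-template setting reaching vLLM as a JSON object rather than as a template string or file path.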
RunPod
Created by octopus on 6/10/2024 in #⚡|serverless
What quantization for Cmdr+ using vLLM worker?
@digigoblin Can I use the original CohereForAI/c4ai-command-r-plus then? What parameter values should I input, and how much GPU vRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it seems to give an error saying 'CohereForCausalLM is not supported'.
12 replies
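On the vRAM question: a weight-only, back-of-the-envelope estimate (Command R+ is roughly 104B parameters per its model card; real usage adds KV cache and activation overhead on top of this):

```python
def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Rough weight-only memory footprint in GiB (ignores KV cache and activations)."""
    return n_params * bits_per_param / 8 / 2**30

N = 104e9  # ~104B parameters, per the Command R+ model card
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{weight_gib(N, bits):.0f} GiB")
```

So fp16 weights alone are roughly 194 GiB (multi-GPU territory), while a 4-bit GPTQ build is roughly 48 GiB before cache overhead, which is why the GPTQ checkpoint is attractive for a single large card.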
@aikitoria
RunPod
Created by octopus on 2/29/2024 in #⚡|serverless
Serverless calculating capacity & ideal request count vs. queue delay values
@flash-singh any idea?
4 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Gotta give @ashleyk a job at this point, he helps everyone
46 replies
RunPod
Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
awesome! thanks!
48 replies
Cool! yeah the casperhansen/mixtral-instruct-awq worked with your settings.
48 replies
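For anyone landing here later, the working combination was an AWQ-quantized checkpoint with the worker's quantization mode set to awq. A sketch of the endpoint configuration (variable names assumed from the worker-vllm README; verify against your template):

```shell
# Assumed worker-vllm environment variables -- check your endpoint template.
MODEL_NAME=casperhansen/mixtral-instruct-awq
QUANTIZATION=awq
MAX_MODEL_LEN=4096   # keep context modest to avoid KV-cache OOM
```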
It’s the loader; I’m not sure about the quantization.
Exllamav2_HF is not supported?
@Alpay Ariyak can you please try with this model: LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-3.75bpw-h6-exl2 Still getting OOM error for it
Ohh cool! I’ll try
I’m using the quantized version, though. I also tried with a non-Mixtral model and it still gave the same error. Is the template working for you with any large models?
HF is up now, but by the way, I’m seeing this error for all models, not just Mixtral.
It seems like the vLLM worker isn’t working with any of the models. It keeps giving the same OOM error.
@Alpay Ariyak any updates about this?
At least you got it working, though! What value did you put? By context, you mean adding MAX_SEQUENCE_LENGTH in the env vars, right?
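Lowering the max context helps because vLLM reserves KV cache proportional to the maximum sequence length. A rough per-sequence sketch, assuming Mixtral-8x7B config values (32 layers, 8 KV heads, head dim 128, fp16 cache):

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 seq_len: int, dtype_bytes: int = 2) -> float:
    """KV-cache size in GiB for one sequence: K and V tensors at every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes / 2**30

# Mixtral-8x7B-style config (assumed): 32 layers, 8 KV heads, head_dim 128.
print(kv_cache_gib(32, 8, 128, 32768))  # full 32k context -> 4.0 GiB per sequence
print(kv_cache_gib(32, 8, 128, 4096))   # 4k context -> 0.5 GiB per sequence
```

Since vLLM pre-allocates cache for many concurrent sequences, the total reservation multiplies with batch size, which is why leaving the context at the model's full maximum can OOM even when the weights themselves fit.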
plz thank you!