RunPod
• Created by octopus on 6/11/2024 in #⚡|serverless
Cannot run Cmdr+ on serverless, CohereForCausalLM not supported
![No description](https://answer-overflow-discord-attachments.s3.us-east-1.amazonaws.com/1250879834297466880/Screen_Shot_2024-06-13_at_2.29.23_PM.png)
8 replies
This is the model we tried:
https://huggingface.co/alpindale/c4ai-command-r-plus-GPTQ
Tried that; this is the error we get:
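For context, "CohereForCausalLM is not supported" means the vLLM build inside the worker image does not register that architecture (Command R+ support only landed in later vLLM releases). A minimal sketch of the check vLLM effectively performs against a model's declared architecture; the supported set below is illustrative, not an actual vLLM registry:

```python
# Sketch: check a model's declared architecture (from its config.json
# "architectures" field) against what a vLLM worker supports.
# OLD_VLLM_ARCHS is an illustrative, non-exhaustive stand-in for the
# registry of an older vLLM build that predates Cohere support.

def is_supported(architectures, supported_archs):
    """Return the first declared architecture the worker supports, or None."""
    for arch in architectures:
        if arch in supported_archs:
            return arch
    return None

OLD_VLLM_ARCHS = {"LlamaForCausalLM", "MixtralForCausalLM", "GPTNeoXForCausalLM"}

# Command R+ declares CohereForCausalLM, so an old build rejects it.
print(is_supported(["CohereForCausalLM"], OLD_VLLM_ARCHS))  # None -> the reported error
print(is_supported(["MixtralForCausalLM"], OLD_VLLM_ARCHS))  # supported
```

The practical fix implied here is updating the worker image to a vLLM version whose registry includes `CohereForCausalLM`.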
RunPod
• Created by octopus on 6/10/2024 in #⚡|serverless
What quantization for Cmdr+ using vLLM worker?
@digigoblin can I use the original
CohereForAI/c4ai-command-r-plus
then? What parameter values should I input, and how much GPU vRAM is needed to run it? Alternatively, I tried alpindale/c4ai-command-r-plus-GPTQ, but it gives an error saying 'CohereForCausalLM is not supported'.
12 replies
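On the vRAM question: Command R+ has roughly 104B parameters, so weight memory alone scales with the bits per parameter. A back-of-envelope sketch (weights only; KV cache and activation overhead come on top):

```python
# Rough vRAM estimate for Command R+ (~104B parameters), weights only.
# This is a sketch for sizing intuition, not an exact measurement.

def weight_gb(n_params_b, bits_per_param):
    """Approximate weight memory in GB for n_params_b billion parameters."""
    return n_params_b * 1e9 * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(104, 16)   # 208 GB: needs multiple 80 GB GPUs
gptq4_gb = weight_gb(104, 4)   # 52 GB: fits one 80 GB GPU with room for KV cache
print(round(fp16_gb), round(gptq4_gb))  # 208 52
```

This is why the thread reaches for a 4-bit GPTQ checkpoint rather than the original fp16 weights.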
@aikitoria said here that vllm was supporting cmdr+ https://discord.com/channels/912829806415085598/948767517332107274/1230643876763537478
@aikitoria
RunPod
• Created by octopus on 2/29/2024 in #⚡|serverless
Serverless calculating capacity & ideal request count vs. queue delay values
@flash-singh any idea?
4 replies
RunPod
• Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Gotta give @ashleyk a job at this point, he helps everyone
46 replies
RunPod
• Created by octopus on 2/26/2024 in #⚡|serverless
Help: Serverless Mixtral OutOfMemory Error
awesome! thanks!
48 replies
Cool! Yeah, the
casperhansen/mixtral-instruct-awq
worked with your settings.
It’s the loader; I’m not sure about the quantization.
Exllamav2_HF is not supported?
@Alpay Ariyak can you please try with this model:
LoneStriker/Air-Striker-Mixtral-8x7B-Instruct-ZLoss-3.75bpw-h6-exl2
Still getting an OOM error for it.
Ohh cool! I’ll try
I’m using a quantized version, though. I also tried a non-Mixtral model and it still gave the same error. Is the template working for you with any large models?
HF is up now, but btw I’m seeing this error for all models, not just Mixtral.
It seems like the vLLM worker isn’t working with any of the models. It keeps giving the same OOM error.
@Alpay Ariyak any updates about this?
At least you got it working, though! What value did you put? By context, you mean adding MAX_SEQUENCE_LENGTH in the env vars, right?
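On the MAX_SEQUENCE_LENGTH point: vLLM reserves KV-cache memory proportional to the maximum context length, which is why capping it can clear the OOM. A rough sketch using Mixtral-8x7B's published config (32 layers, 8 KV heads, head dim 128, fp16 cache); treat the formula as an approximation, since vLLM's actual paged allocation adds its own bookkeeping:

```python
# Why lowering MAX_SEQUENCE_LENGTH helps: the KV cache scales linearly
# with the maximum sequence length the worker must support.
# Defaults below are Mixtral-8x7B's config values.

def kv_cache_gb(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per=2):
    """Approximate per-sequence KV cache size in GB (keys + values, fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per / 1e9

print(round(kv_cache_gb(32768), 2))  # 4.29 -- full 32k context, per sequence
print(round(kv_cache_gb(4096), 2))   # 0.54 -- with context capped at 4k
```

With many concurrent sequences, that per-sequence difference multiplies, so a lower cap leaves far more headroom next to the model weights.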
plz thank you!