Kostya | Matrix One
RunPod
Created by Kostya | Matrix One on 4/25/2024 in #⚡|serverless
Can't set up the serverless vLLM for the model.
Thank you very much, it worked. I have another question. We use two fields in requests to the /openai/v1/chat/completions route: messages and prompt. According to the documentation (https://github.com/runpod-workers/worker-vllm?tab=readme-ov-file#chat-completions) and the API response, these two fields cannot be used simultaneously. Is it really impossible to use both fields in one request, or am I doing something wrong?
21 replies
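For reference, the two fields belong to two different OpenAI-style routes: `messages` goes to /chat/completions, while `prompt` goes to /completions, so they are mutually exclusive within a single request. A minimal sketch using the openai Python client against a worker-vllm endpoint (the endpoint ID, API key, and model name are placeholders, not values from this thread):

```python
from openai import OpenAI

# Placeholders: substitute your own RunPod endpoint ID, API key, and model.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

# /chat/completions accepts `messages` (no `prompt` field).
chat = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# /completions is the separate route that accepts `prompt` instead.
completion = client.completions.create(
    model="<MODEL_NAME>",
    prompt="Hello!",
)
print(completion.choices[0].text)
```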
Could you please tell me how to increase VRAM?
@nerdylive There are no errors in the logs; only informational messages are shown.
This is very strange, because this model (https://huggingface.co/solidrust/Meta-Llama-3-8B-Instruct-hf-AWQ) works. What is the difference between them, and how can I get MythoMax-L2-13B-GPTQ to work?
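One plausible cause, as a hedged guess: vLLM needs to know the quantization method, and for GPTQ checkpoints it may need to be set explicitly (the worker-vllm README documents a QUANTIZATION environment variable for this), whereas the AWQ model happens to load without it. A minimal sketch of the equivalent setting in plain vLLM, assuming a vLLM version with GPTQ support:

```python
from vllm import LLM, SamplingParams

# Sketch: load the GPTQ checkpoint with the quantization method set
# explicitly. In worker-vllm this corresponds to the QUANTIZATION
# environment variable rather than a constructor argument.
llm = LLM(
    model="TheBloke/MythoMax-L2-13B-GPTQ",
    quantization="gptq",
    dtype="float16",  # GPTQ kernels generally run in fp16
)

params = SamplingParams(max_tokens=64)
print(llm.generate(["Hello!"], params)[0].outputs[0].text)
```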
@nerdylive @haris Could you please tell me if this model (https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) is compatible?
@Alpay Ariyak @haris A 24 GB GPU.