Kostya | Matrix One
RunPod
Created by Kostya | Matrix One on 4/25/2024 in #⚡|serverless
Can't set up the serverless vLLM for the model.
Thank you very much, it worked. I have another question. We use two fields in requests to the /openai/v1/chat/completions route: messages and prompt. According to the documentation (https://github.com/runpod-workers/worker-vllm?tab=readme-ov-file#chat-completions) and the API response, these two fields cannot be used simultaneously. Is it really impossible to use both fields in one request, or am I doing something wrong?
21 replies
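For reference, the two fields belong to two different OpenAI-style routes: `messages` goes to /chat/completions, while `prompt` goes to /completions, so they are mutually exclusive within a single request. A minimal sketch using the openai Python client against a worker-vllm endpoint (the endpoint ID, API key, and model name are placeholders, not values from this thread):

```python
from openai import OpenAI

# Placeholders: substitute your own RunPod endpoint ID, API key, and model.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

# /chat/completions accepts `messages` (no `prompt` field).
chat = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(chat.choices[0].message.content)

# /completions is the separate route that accepts `prompt` instead.
completion = client.completions.create(
    model="<MODEL_NAME>",
    prompt="Hello!",
)
print(completion.choices[0].text)
```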
Could you please tell me how to increase VRAM?
@nerdylive There are no errors in the logs; only informational messages are shown.
This is very strange, because this model (https://huggingface.co/solidrust/Meta-Llama-3-8B-Instruct-hf-AWQ) works. What is the difference between them, and how can I get MythoMax-L2-13B-GPTQ to work?
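One plausible cause, as a hedged guess: vLLM needs to know the quantization method, and for GPTQ checkpoints it may need to be set explicitly (the worker-vllm README documents a QUANTIZATION environment variable for this), whereas the AWQ model happens to load without it. A minimal sketch of the equivalent setting in plain vLLM, assuming a vLLM version with GPTQ support:

```python
from vllm import LLM, SamplingParams

# Sketch: load the GPTQ checkpoint with the quantization method set
# explicitly. In worker-vllm this corresponds to the QUANTIZATION
# environment variable rather than a constructor argument.
llm = LLM(
    model="TheBloke/MythoMax-L2-13B-GPTQ",
    quantization="gptq",
    dtype="float16",  # GPTQ kernels generally run in fp16
)

params = SamplingParams(max_tokens=64)
print(llm.generate(["Hello!"], params)[0].outputs[0].text)
```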
@nerdylive @haris Could you please tell me if this model (https://huggingface.co/TheBloke/MythoMax-L2-13B-GPTQ) is compatible?
@Alpay Ariyak @haris A 24 GB GPU.