Created by xxxyyy on 11/11/2024 in #⚡|serverless
Chat completion (template) not working with vLLM 0.6.3 + Serverless
I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deployed it using the latest version of vLLM (0.6.3) provided by RunPod.
I ran into the following errors:

Client-side:
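(The client-side trace itself isn't reproduced above.) For context, this is the shape of the request being made: a minimal sketch assuming the OpenAI-compatible route that RunPod's vLLM serverless worker exposes, with the endpoint ID and API key as placeholders rather than values from the post:

```python
# Minimal sketch of the client-side chat completion call, assuming
# RunPod's OpenAI-compatible route for the vLLM serverless worker.
# <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders, not real values.
from openai import OpenAI

client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

response = client.chat.completions.create(
    # Model name as deployed through the Serverless UI.
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    messages=[{"role": "user", "content": "Write a hello-world in Python."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```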