Chat completion (template) not working with vLLM 0.6.3 + Serverless
I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq, using the latest version of vLLM (0.6.3) provided by RunPod.
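For reference, those UI settings map onto vLLM's engine arguments roughly like this (a local repro sketch only, assuming vllm 0.6.3 is installed and there is enough GPU memory for the AWQ weights; this is not the RunPod worker itself):

```python
# Minimal local repro of the same settings via vLLM's Python API.
from vllm import LLM

llm = LLM(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    quantization="awq",       # matches the "quantization: awq" UI setting
    max_model_len=129024,     # matches the max model context window setting
)
```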
I ran into the following errors.
Client-side:
This request runs fine without error:
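(Illustration only, since the original request body isn't reproduced here: a plain completions call through the OpenAI-compatible route looks something like this. The base URL format, ENDPOINT_ID, and API key are placeholders/assumptions.)

```python
# Hedged sketch of a working request: a plain completions call,
# which does NOT go through the model's chat template.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder endpoint
    api_key="RUNPOD_API_KEY",                                    # placeholder key
)

resp = client.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(resp.choices[0].text)
```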
But this request gives me an error:
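(Again an illustration with the same placeholder endpoint, assuming the failing request is a chat completions call, which is what triggers the chat template on the server side.)

```python
# Hedged sketch of the failing case: a chat completions call,
# rendered through the model's chat template by the vLLM server.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder endpoint
    api_key="RUNPOD_API_KEY",                                    # placeholder key
)

resp = client.chat.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```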
Here's a partial error from the server side:
There isn't any reported issue on the Qwen GitHub regarding the chat template (it uses the SAME template as a model released months ago), so I suspect this is a RunPod-specific error?
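One way to rule the template itself in or out is to render it locally with transformers (a sketch, assuming the tokenizer's bundled chat_template is what the vLLM worker picks up):

```python
# If this renders cleanly, the chat template itself is likely fine and the
# problem is more plausibly on the serving side.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k")
prompt = tok.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```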