Chat completion (template) not working with vLLM 0.6.3 + Serverless
I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq, using the latest version of vLLM (0.6.3) provided by RunPod.
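For reference, those UI settings map onto vLLM's engine arguments roughly like this (a local repro sketch only, assuming vllm 0.6.3 is installed and there is enough GPU memory for the AWQ weights; this is not the RunPod worker itself):

```python
# Minimal local repro of the same settings via vLLM's Python API.
from vllm import LLM

llm = LLM(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    quantization="awq",       # matches the "quantization: awq" UI setting
    max_model_len=129024,     # matches the max model context window setting
)
```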
I ran into the following errors.
Client-side:
This request runs fine without error:
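(Illustration only, since the original request body isn't reproduced here: a plain completions call through the OpenAI-compatible route looks something like this. The base URL format, ENDPOINT_ID, and API key are placeholders/assumptions.)

```python
# Hedged sketch of a working request: a plain completions call,
# which does NOT go through the model's chat template.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder endpoint
    api_key="RUNPOD_API_KEY",                                    # placeholder key
)

resp = client.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(resp.choices[0].text)
```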
But this request gives me an error:
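(Again an illustration with the same placeholder endpoint, assuming the failing request is a chat completions call, which is what triggers the chat template on the server side.)

```python
# Hedged sketch of the failing case: a chat completions call,
# rendered through the model's chat template by the vLLM server.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder endpoint
    api_key="RUNPOD_API_KEY",                                    # placeholder key
)

resp = client.chat.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```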
Here's a partial error from the server side:
There isn't any reported issue on the Qwen GitHub regarding the chat template (it uses the SAME template as a model released months ago), so I suspect this is a RunPod-specific error?
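One way to rule the template itself in or out is to render it locally with transformers (a sketch, assuming the tokenizer's bundled chat_template is what the vLLM worker picks up):

```python
# If this renders cleanly, the chat template itself is likely fine and the
# problem is more plausibly on the serving side.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k")
prompt = tok.apply_chat_template(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```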