RunPod · 2mo ago
xxxyyy

Chat completion (template) not working with vLLM 0.6.3 + Serverless

I deployed the https://huggingface.co/xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k model through the Serverless UI, setting the max model context window to 129024 and quantization to awq. I deployed it using the latest version of vLLM (0.6.3) provided by RunPod. I ran into the following error client-side:
ChatCompletion(id=None, choices=None, created=None, model=None, object='error', service_tier=None, system_fingerprint=None, usage=None, code=400, message="expected token 'end of print statement', got 'name'", param=None, type='BadRequestError')
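One way to sanity-check whether the template itself is the problem (a rough sketch, assuming the chat template ships on the tokenizer, as it does for Qwen2.5 models) is to parse it locally with jinja2; a malformed template reproduces the same TemplateSyntaxError with no RunPod involvement:
# Rough check: parse the model's chat template with jinja2 locally.
# Assumes the template is stored on the tokenizer (true for Qwen2.5 models).
from jinja2 import Environment
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k")
Environment().parse(tok.chat_template)  # raises TemplateSyntaxError if the template is malformed
print("chat template parses cleanly")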
1 Reply
xxxyyy (OP) · 2mo ago
This request runs fine without error:
response = client.completions.create(
    model="xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k",
    prompt="Runpod is the best platform because",
    temperature=0,
    max_tokens=100,
)
But this request gives me an error:
response = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Who are you?"}],
    temperature=0,
    max_tokens=100,
)
Here's a partial error from the server end:
2024-11-11 16:14:55.477 [q3ubsnv48i2ucs] [error]
ERROR 11-11 21:14:55 serving_chat.py:158] jinja2.exceptions.TemplateSyntaxError: expected token 'end of print statement', got 'name'
ERROR 11-11 21:14:55 serving_chat.py:158]   File "<unknown>", line 27, in template
ERROR 11-11 21:14:55 serving_chat.py:158]     raise rewrite_traceback_stack(source=source)
ERROR 11-11 21:14:55 serving_chat.py:158]   File "/usr/local/lib/python3.10/dist-packages/jinja2/environment.py", line 939, in handle_exception
ERROR 11-11 21:14:55 serving_chat.py:158]     self.handle_exception(source=source_hint)
ERROR 11-11 21:14:55 serving_chat.py:158]   File "/usr/local/lib/python3.10/dist-packages/jinja2/environment.py", line 768, in compile
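The traceback points at line 27 of the template, so dumping the template with line numbers may show what the parser is choking on (again a sketch; it assumes the worker is using the template that ships with the tokenizer rather than an override):
# Sketch: print the chat template with line numbers to inspect line 27,
# the line the server traceback complains about.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xingyaoww/Qwen2.5-Coder-32B-Instruct-AWQ-128k")
for lineno, line in enumerate(tok.chat_template.splitlines(), start=1):
    print(f"{lineno:3d}: {line}")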
There isn't any reported issue on the Qwen GitHub regarding the chat template (it uses the SAME template as a model that was released months ago), so I suspect this is a RunPod-specific error?