Can anyone help me deploy the Qwen/QwQ-32B-Preview model from Hugging Face with the vLLM serverless worker?
I'm having issues with the configuration.
I'm using a single 80 GB GPU with the container image runpod/worker-v1-vllm:stable-cuda12.1.0 and have set the dtype to bfloat16.
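For context, my understanding is that with these settings the worker does something roughly equivalent to the vLLM setup below. This is just a sketch of the local sanity check I'd expect to work, not the worker's actual code; max_model_len=8192 is my guess for leaving KV-cache room after ~65 GB of bf16 weights on an 80 GB card, and the sampling params are arbitrary:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/QwQ-32B-Preview"

# Apply Qwen's chat template; raw prompts without the template
# often produce gibberish on instruct-tuned models
tokenizer = AutoTokenizer.from_pretrained(MODEL)
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# max_model_len=8192 is an assumption to fit the KV cache on one 80 GB GPU
llm = LLM(model=MODEL, dtype="bfloat16", max_model_len=8192)
outputs = llm.generate(
    [prompt], SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
)
print(outputs[0].outputs[0].text)
```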
With that setup, though, the model is giving rubbish outputs.
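In case it matters, here's roughly how I'm calling the endpoint, via the worker's OpenAI-compatible route (the endpoint ID and API key are placeholders for my real values, and I'm assuming this is the right base URL format for serverless vLLM):

```python
from openai import OpenAI

# <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

resp = client.chat.completions.create(
    model="Qwen/QwQ-32B-Preview",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    temperature=0.7,
    max_tokens=256,
)
print(resp.choices[0].message.content)
```

Any idea what I might be misconfiguring?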