Can anyone help me deploy the Qwen/QwQ-32B-Preview model from Hugging Face with the vLLM serverless worker?
I'm having issues with the configuration.
I'm using a single 80 GB GPU with the container image runpod/worker-v1-vllm:stable-cuda12.1.0 and have set the dtype to bfloat16.
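For context, my understanding is that with these settings the worker does something roughly equivalent to the vLLM setup below. This is just a sketch of the local sanity check I'd expect to work, not the worker's actual code; max_model_len=8192 is my guess for leaving KV-cache room after ~65 GB of bf16 weights on an 80 GB card, and the sampling params are arbitrary:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "Qwen/QwQ-32B-Preview"

# Apply Qwen's chat template; raw prompts without the template
# often produce gibberish on instruct-tuned models
tokenizer = AutoTokenizer.from_pretrained(MODEL)
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# max_model_len=8192 is an assumption to fit the KV cache on one 80 GB GPU
llm = LLM(model=MODEL, dtype="bfloat16", max_model_len=8192)
outputs = llm.generate(
    [prompt], SamplingParams(temperature=0.7, top_p=0.8, max_tokens=512)
)
print(outputs[0].outputs[0].text)
```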
With that setup, though, the model is giving rubbish outputs.
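In case it matters, here's roughly how I'm calling the endpoint, via the worker's OpenAI-compatible route (the endpoint ID and API key are placeholders for my real values, and I'm assuming this is the right base URL format for serverless vLLM):

```python
from openai import OpenAI

# <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

resp = client.chat.completions.create(
    model="Qwen/QwQ-32B-Preview",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
    temperature=0.7,
    max_tokens=256,
)
print(resp.choices[0].message.content)
```

Any idea what I might be misconfiguring?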