Settings to reduce delay time when using sglang with 4-bit quantized models?

I'm deploying a 4-bit AWQ quantized model: casperhansen/llama-3.3-70b-instruct-awq. The delay time for parallel requests increases exponentially when using the sglang template. What settings do I need to use to keep the delay time manageable?
1 Reply
flash-singh, 4w ago
Which git repo or template are you using? Can you share a link?
