RunPod 12mo ago
Volko

Is AWQ faster than GGUF?

How do AWQ, GGUF, GPTQ, QAT, and EXL2 rank by inference speed, fastest first?
5 Replies
digigoblin 12mo ago
EXL2
Volko (OP) 12mo ago
Okay thanks
digigoblin 12mo ago
That's the fastest. I don't know about the others; I've never actually used QAT or GGUF. @aikitoria will probably know.
aikitoria 12mo ago
I've not used AWQ or GPTQ directly; those are older formats.
You use GGUF if you want to run on a very small GPU and have to keep some of the model on the CPU. It's for hybrid CPU/GPU inference.
You use EXL2 for maximum speed on a single GPU.
You use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs.
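That rule of thumb could be sketched roughly like this in Python. The function name `suggest_backend` and its two parameters are made up for illustration, and it only encodes the advice in this thread, not any measured speeds:

```python
def suggest_backend(num_gpus: int, fits_in_vram: bool) -> str:
    """Hypothetical helper: maps a hardware setup to the format/engine
    suggested in this thread. It is a rule of thumb, not a benchmark."""
    if num_gpus > 1:
        # Multiple GPUs: the thread suggests a multi-GPU serving engine.
        return "aphrodite-engine or TensorRT-LLM"
    if num_gpus == 1 and fits_in_vram:
        # Whole model fits on one GPU: EXL2 for maximum speed.
        return "EXL2"
    # Small GPU (or none): part of the model has to stay on the CPU,
    # which is the hybrid CPU/GPU case GGUF is meant for.
    return "GGUF"


print(suggest_backend(num_gpus=1, fits_in_vram=True))   # EXL2
print(suggest_backend(num_gpus=1, fits_in_vram=False))  # GGUF
print(suggest_backend(num_gpus=4, fits_in_vram=True))   # aphrodite-engine or TensorRT-LLM
```

AWQ, GPTQ, and QAT are left out on purpose, since nobody here has compared them directly.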
Geri 10mo ago
Hi, do you use TensorRT-LLM?