RunPod · 9mo ago
Volko

Is AWQ faster than GGUF?

How do AWQ, GGUF, GPTQ, QAT, and EXL2 rank by inference speed, fastest first?
5 Replies
digigoblin · 9mo ago
EXL2
Volko (OP) · 9mo ago
Okay thanks
digigoblin · 9mo ago
That's the fastest. I don't know about the others; I've never actually used QAT or GGUF. @aikitoria will probably know.
aikitoria · 9mo ago
I've not used AWQ or GPTQ directly; those are older formats.
You use GGUF if you want to run on a very small GPU and have to keep some of the model on the CPU. It's for hybrid CPU/GPU inference.
You use EXL2 for maximum speed on a single GPU.
You use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs.
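For reference, the rule of thumb in that reply can be written down as a tiny decision helper. This is only a sketch of the advice as stated in the thread: `pick_backend` and its parameters `num_gpus` and `fits_in_vram` are hypothetical names made up for illustration, not part of any library, and the result is a recommendation string rather than a benchmark.

```python
def pick_backend(num_gpus: int, fits_in_vram: bool) -> str:
    """Hypothetical helper that encodes the thread's rule of thumb.

    num_gpus     -- number of GPUs available (the thread only covers >= 1)
    fits_in_vram -- True if the whole quantized model fits in total GPU memory
    """
    if num_gpus < 1:
        # The advice in the thread assumes at least one GPU is present.
        raise ValueError("num_gpus must be at least 1")
    if not fits_in_vram:
        # Part of the model has to stay on the CPU: hybrid CPU/GPU inference.
        return "GGUF"
    if num_gpus == 1:
        # Whole model on one GPU: the thread's pick for maximum speed.
        return "EXL2"
    # Whole model spread across several GPUs.
    return "aphrodite-engine or TensorRT-LLM"


if __name__ == "__main__":
    print(pick_backend(num_gpus=1, fits_in_vram=False))
    print(pick_backend(num_gpus=1, fits_in_vram=True))
    print(pick_backend(num_gpus=4, fits_in_vram=True))
```

Real throughput still depends on the model, the quantization level, and the hardware, so treat the output as a starting point for your own measurements.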
Geri · 6mo ago
Hi, do you use TensorRT-LLM?