RunPod•12mo ago

Is AWQ faster than GGUF?

How do AWQ, GGUF, GPTQ, QAT, and EXL2 rank by inference speed, from fastest to slowest?


digigoblin•12mo ago

EXL2

Volko (OP)•12mo ago

Okay, thanks.

digigoblin•12mo ago

That's the fastest. I don't know about the others; I've never actually used QAT or GGUF. @aikitoria will probably know.

aikitoria•12mo ago

I've not used AWQ or GPTQ directly; those are older formats.
You use GGUF if you want to run on a very small GPU and have to keep some of the model on the CPU. It's for hybrid CPU/GPU inference.
You use EXL2 for maximum speed on a single GPU.
You use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs.
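To make the hybrid CPU/GPU idea concrete: with a GGUF runner such as llama.cpp you choose how many transformer layers go on the GPU (its `--n-gpu-layers` option, or `n_gpu_layers` in llama-cpp-python), and the remaining layers run on the CPU. The sketch below is a rough back-of-the-envelope estimate of that split, not part of any library. The helpers `weight_bytes` and `layers_on_gpu` are hypothetical, and the sketch assumes equal-sized layers and counts only the quantized weights, ignoring the KV cache, activations, and per-format overhead such as scales.

```python
import math


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in bytes.

    Ignores KV cache, activations, and format overhead (scales, zero points),
    so real memory use will be somewhat higher.
    """
    return n_params * bits_per_weight / 8


def layers_on_gpu(n_layers: int, n_params: float, bits_per_weight: float,
                  vram_bytes: float, reserve_bytes: float = 0.0) -> int:
    """Hypothetical helper: estimate how many layers fit in VRAM.

    Assumes all parameters are spread evenly over n_layers equal-sized layers
    (embeddings and output head are not treated separately). Layers that do
    not fit would stay on the CPU.
    """
    per_layer = weight_bytes(n_params, bits_per_weight) / n_layers
    # Memory held back for the KV cache, runtime buffers, etc.
    budget = max(vram_bytes - reserve_bytes, 0.0)
    return min(n_layers, math.floor(budget / per_layer))


if __name__ == "__main__":
    gpu = layers_on_gpu(n_layers=80, n_params=70e9, bits_per_weight=4.5,
                        vram_bytes=24e9, reserve_bytes=2e9)
    print(f"{gpu} layers on GPU, {80 - gpu} on CPU")
```

With these illustrative numbers (a 70B-parameter, 80-layer model at 4.5 bits per weight on a 24 GB card with 2 GB held back), the estimate is 44 layers on the GPU and 36 on the CPU. The 4.5 bpw and 2 GB reserve are arbitrary example values, not recommendations; a model small enough to fit entirely in VRAM would not need hybrid inference at all.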

Geri•10mo ago

Hi, do you use TensorRT-LLM?
