Is AWQ faster than GGUF?
How do AWQ, GGUF, GPTQ, QAT, and EXL2 rank by inference speed, fastest first?
EXL2
Okay thanks
That's the fastest. I don't know about the others; I've never actually used QAT or GGUF.
@aikitoria will probably know.
I've not used AWQ or GPTQ directly; those are older formats.
You use GGUF if you want to run on a very small GPU and have to keep some of the model on the CPU. It's for hybrid CPU/GPU inference.
You use EXL2 for maximum speed on a single GPU.
You use aphrodite-engine or TensorRT-LLM (good luck!) for maximum speed on multiple GPUs.
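If it helps, here is that rule of thumb as a tiny Python sketch. The function name `pick_backend` and its two parameters are made up for illustration, and it only encodes the advice in this thread, not any benchmark.

```python
def pick_backend(num_gpus: int, fits_in_vram: bool) -> str:
    """Return the format/engine suggested in this thread (hypothetical helper).

    num_gpus:     how many GPUs you plan to use (assumed to be at least 1)
    fits_in_vram: whether the quantized model fits entirely in GPU memory
    """
    if num_gpus < 1:
        raise ValueError("this sketch assumes at least one GPU")
    if not fits_in_vram:
        # Part of the model has to stay on the CPU: hybrid CPU/GPU inference.
        return "GGUF"
    if num_gpus == 1:
        # Whole model on one GPU: the thread's pick for maximum speed.
        return "EXL2"
    # Whole model spread across several GPUs.
    return "aphrodite-engine or TensorRT-LLM"


print(pick_backend(num_gpus=1, fits_in_vram=True))
```

Treat it as a starting point only; real speed depends on your hardware and model, so measure both options if you are on the fence.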
Hi, do you use TensorRT-LLM?