RunPod · 10mo ago
maywell

Any plans to add other inference engines?

Hi, I'm using the vLLM worker now, but when it comes to quantized models vLLM works poorly: too much VRAM usage, slow inference, poor output quality, etc. So, are there any plans to add other engines like TGI or ExL2?
1 Reply
Alpay Ariyak · 10mo ago
Potentially in the future; it's currently not a priority in the short term.