maywell
RunPod
Created by maywell on 2/19/2024 in #⚡|serverless
Any plans to add other inference engines?
Hi, I'm using the vLLM worker now, but it handles quantized models poorly: too much VRAM usage, slow inference, poor output quality, etc.
So, are there any plans to add other engines like TGI or exl2?
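For context, a minimal sketch of the kind of setup being described: loading a quantized model through vLLM's offline Python API (the model name and sampling values here are placeholders, not the worker's actual configuration):

```python
from vllm import LLM, SamplingParams

# Hypothetical AWQ-quantized model for illustration; any model with
# compatible quantized weights would follow the same pattern.
llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # placeholder model name
    quantization="awq",  # vLLM also accepts e.g. "gptq" in this era
)

# Illustrative sampling settings.
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

This path is what the post is contrasting against engines such as TGI or ExLlamaV2 (exl2), which at the time were often reported to serve quantized weights with lower VRAM overhead.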
3 replies