maywell
RunPod
Created by maywell on 2/19/2024 in #⚡|serverless
Any plans to add other inference engines?
Hi, I'm using the vLLM worker now, but it handles quantized models poorly: too much VRAM usage, slow inference, poor output quality, etc.
So, are there any plans to add other engines like TGI or exl2?
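For context, a minimal sketch of the kind of setup being described: loading a quantized model through vLLM's offline Python API (the model name and sampling values here are placeholders, not the worker's actual configuration):

```python
from vllm import LLM, SamplingParams

# Hypothetical AWQ-quantized model for illustration; any model with
# compatible quantized weights would follow the same pattern.
llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # placeholder model name
    quantization="awq",  # vLLM also accepts e.g. "gptq" in this era
)

# Illustrative sampling settings.
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```

This path is what the post is contrasting against engines such as TGI or ExLlamaV2 (exl2), which at the time were often reported to serve quantized weights with lower VRAM overhead.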
3 replies