SGLang worker (similar to worker-vllm)
Recently, some progress has been made in efficiently deploying LLMs and LMMs. SGLang reports up to 5x the throughput of vLLM. @Alpay Ariyak could we port the worker-vllm setup to SGLang?
https://github.com/sgl-project/sglang
https://lmsys.org/blog/2024-01-17-sglang/
Hey @Casper., thanks for sharing this. What do you mean by porting the worker-vllm setup? Creating an SGLang worker in general, or making it match the vLLM worker?
I mean creating a similar setup for SGLang
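For illustration, a rough sketch of what such a worker could look like: a RunPod serverless handler wrapping SGLang's Runtime, loosely modeled on the worker-vllm handler pattern. This is not an official worker; the MODEL_PATH environment variable and the default model id are placeholders, and the SGLang calls assume the frontend API shown in its README (sgl.Runtime, sgl.function, sgl.gen).

```python
# Sketch only: a minimal RunPod serverless handler backed by SGLang.
# MODEL_PATH is a hypothetical env var, not a worker-vllm or SGLang setting.
import os

import runpod
import sglang as sgl

# Start the SGLang runtime once per worker, outside the request handler.
runtime = sgl.Runtime(
    model_path=os.environ.get("MODEL_PATH", "meta-llama/Llama-2-7b-chat-hf")
)
sgl.set_default_backend(runtime)


@sgl.function
def generate(s, prompt, max_tokens):
    # Append the user prompt, then let the model continue from it.
    s += prompt
    s += sgl.gen("output", max_tokens=max_tokens)


def handler(job):
    job_input = job["input"]
    state = generate.run(
        prompt=job_input["prompt"],
        max_tokens=job_input.get("max_tokens", 256),
    )
    return {"output": state["output"]}


runpod.serverless.start({"handler": handler})
```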
Potentially in the near future
We’re still not where we want to be with the vLLM worker and our inference offering in terms of features and developer experience, so I’d say another LLM inference engine worker likely won’t be a priority for now. As for speed, vLLM receives regular improvements, but this might be worth exploring for the LMM aspect
The thing is that vLLM is currently falling behind other inference frameworks. It does receive regular updates, but the problem is that other teams have much more time dedicated to optimizing their solutions, so they end up outcompeting vLLM in speed
@Alpay Ariyak This would be a top priority for me as well. Currently, LLaVA multimodal models aren't supported by vLLM or TGI, so SGLang seems like the only easy way to deploy them. I will hack something together in the meantime, but having a natively supported worker would be appreciated.
For reference, it does look like there might be an existing solution with "LMscript", but I am unfamiliar with it, so I'm not sure how well it will work. An official worker sanctioned by RunPod would be preferable in my opinion: https://github.com/sgl-project/sglang/issues/274
(GitHub issue preview: "Cannot Execute Runtime Directly in Docker, with local install" — running sgl.Runtime directly with a local install.)
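In the meantime, the kind of hack being described might look like the following: a quick local SGLang deployment of a LLaVA model using the frontend API from the SGLang README (sgl.image, sgl.user/sgl.assistant, sgl.gen). This is just a sketch; the model id and image path are placeholders.

```python
# Sketch of a quick local LLaVA deployment via SGLang's frontend API.
# The model id and image path below are placeholders.
import sglang as sgl

runtime = sgl.Runtime(model_path="liuhaotian/llava-v1.5-7b")
sgl.set_default_backend(runtime)


@sgl.function
def describe_image(s, image_path, question):
    # Interleave the image and the question, then generate an answer.
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))


state = describe_image.run(image_path="example.jpg", question="What is in this image?")
print(state["answer"])
runtime.shutdown()
```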
vLLM just released an update with LLaVA support, updating the worker
Sorry to hijack, but if it’s not too much trouble, it’d be nice to have the option to use locally stored models when baking them into the Docker image. For times when Hugging Face is down, for example…
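One way to approximate this is sketched below: bake the model files into the image and have the handler prefer the local copy, falling back to a Hugging Face repo id. The LOCAL_MODEL_DIR and MODEL_NAME names and the /models path are made-up examples, not existing worker-vllm options.

```python
# Illustrative only: prefer a model directory baked into the Docker image,
# falling back to a Hugging Face repo id when no local copy is present.
# LOCAL_MODEL_DIR and MODEL_NAME are hypothetical names, not worker-vllm settings.
import os


def resolve_model_path() -> str:
    local_dir = os.environ.get("LOCAL_MODEL_DIR", "/models/llava-v1.5-7b")
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        # Use the copy shipped inside the image; no Hub download needed.
        return local_dir
    # Otherwise fall back to downloading from the Hugging Face Hub.
    return os.environ.get("MODEL_NAME", "liuhaotian/llava-v1.5-7b")
```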
@Alpay Ariyak The vLLM support doesn't cover LLaVA-NeXT (1.6), as it can't handle the multiple image sizes. SGLang is still the only platform that officially supports it
As an alternative? Write it as an issue on GitHub, so they notice it more haha