RunPod · 10mo ago
Casper.

SGLang worker (similar to worker-vllm)

Recently, some progress has been made on efficiently deploying LLMs and LMMs: SGLang is up to 5x faster than vLLM. @Alpay Ariyak could we port the worker-vllm setup to SGLang? https://github.com/sgl-project/sglang https://lmsys.org/blog/2024-01-17-sglang/
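For anyone unfamiliar, here's a rough sketch of what serving a model with SGLang's Python runtime looks like (untested; the model path, port, and prompt are placeholders):
```python
# Rough sketch of serving a model with SGLang's Python runtime
# (untested; model path, port, and prompt are placeholders).
import sglang as sgl

# Spin up a local runtime around a Hugging Face model.
runtime = sgl.Runtime(model_path="liuhaotian/llava-v1.5-7b", port=30000)
sgl.set_default_backend(runtime)

# A tiny generation program in SGLang's frontend language.
@sgl.function
def answer(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("response", max_tokens=256))

state = answer.run(question="Describe the image.")
print(state["response"])
runtime.shutdown()
```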
10 Replies
Alpay Ariyak · 10mo ago
Hey @Casper., thanks for sharing this. What do you mean by porting the worker-vllm setup? Creating an SGLang worker in general or making it match the vllm worker?
Casper. (OP) · 10mo ago
I mean creating a similar setup for SGLang
Alpay Ariyak · 10mo ago
Potentially in the near future. We're still not where we want to be with the vLLM worker and our inference offering in terms of features and developer experience, so I'd say another LLM inference engine worker likely won't be a priority for now. In regard to speed, vLLM receives regular improvements, but this might be worth exploring for the LMM aspect
Casper. (OP) · 10mo ago
The thing is that vLLM is currently falling behind other inference frameworks. It does receive regular updates, but the problem is that other teams have much more time dedicated to optimizing their solutions, so they end up outcompeting vLLM in speed
thanatos121. · 8mo ago
@Alpay Ariyak This would be a top priority for me as well. Currently, LLaVA multimodal models aren't supported by vLLM or TGI, so SGLang seems like the only easy way to deploy them. I will hack something together in the meantime (along the lines of the sketch below), but having a natively supported worker would be appreciated.
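Something like this rough, untested sketch is what I have in mind. The handler contract follows RunPod's Python SDK; the SGLang calls mirror the runtime API above, and the model path is an illustrative placeholder:
```python
# Rough, untested sketch of a serverless handler wrapping SGLang.
# Handler shape per the runpod Python SDK; model path is a placeholder.
import runpod
import sglang as sgl

# Start the SGLang runtime once, at worker startup.
runtime = sgl.Runtime(model_path="liuhaotian/llava-v1.5-7b")
sgl.set_default_backend(runtime)

@sgl.function
def generate(s, prompt):
    # Plain completion: append the prompt, then generate.
    s += prompt + sgl.gen("output", max_tokens=512)

def handler(event):
    # RunPod delivers the request payload under event["input"].
    prompt = event["input"]["prompt"]
    state = generate.run(prompt=prompt)
    return {"output": state["output"]}

runpod.serverless.start({"handler": handler})
```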
thanatos121. · 8mo ago
For reference, it does look like there might be an existing solution with "LMscript", but I'm unfamiliar with it, so I'm not sure how well it will work. An official worker sanctioned by RunPod would be preferable in my opinion: https://github.com/sgl-project/sglang/issues/274
GitHub embed: "Cannot Execute Runtime Directly in Docker, with local install" (sgl-project/sglang#274)
Alpay Ariyak · 8mo ago
vLLM just released an update with LLaVA support; updating the worker
JJonahJ · 8mo ago
Sorry to hijack, but if it's not too much trouble, it'd be nice to have the option to use locally stored models when baking them into the Docker image. For times when Hugging Face is down, for example…
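One way to get that effect today, as a hedged sketch (assuming the huggingface_hub package; the model ID and target path are placeholders): pre-download the weights during the image build so the worker loads them from disk and never touches the Hub at runtime:
```python
# Sketch: bake weights into the image at build time so the worker
# loads them locally. Model ID and target path are placeholders.
from huggingface_hub import snapshot_download

# Download the full model repo into a directory inside the image.
snapshot_download(
    repo_id="liuhaotian/llava-v1.5-7b",
    local_dir="/models/llava-v1.5-7b",
)
```
The build would run this in a Dockerfile step (e.g. RUN python download.py), and the worker's model path would then point at /models/llava-v1.5-7b, so a Hub outage wouldn't matter.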
thanatos121. · 6mo ago
@Alpay Ariyak The vLLM update does not support LLaVA-NeXT (1.6), as it doesn't handle its multiple image sizes. SGLang is still the only platform where it's officially supported
nerdylive · 6mo ago
As an alternative? Write it as an issue on GitHub, so they notice it more haha