SGLang worker (similar to worker-vllm)
Recently, some progress has been made in efficiently deploying LLMs and LMMs. SGLang reports up to 5x the throughput of vLLM. @Alpay Ariyak could we port the worker-vllm setup to SGLang?
https://github.com/sgl-project/sglang
https://lmsys.org/blog/2024-01-17-sglang/
Hey @Casper., thanks for sharing this. What do you mean by porting the worker-vllm setup? Creating an SGLang worker in general, or making it match the vLLM worker?
I mean creating a similar setup for SGLang
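For illustration, a rough sketch of what such a worker could look like: a RunPod serverless handler wrapping SGLang's Runtime, loosely modeled on the worker-vllm handler pattern. This is not an official worker; the MODEL_PATH environment variable and the default model id are placeholders, and the SGLang calls assume the frontend API shown in its README (sgl.Runtime, sgl.function, sgl.gen).

```python
# Sketch only: a minimal RunPod serverless handler backed by SGLang.
# MODEL_PATH is a hypothetical env var, not a worker-vllm or SGLang setting.
import os

import runpod
import sglang as sgl

# Start the SGLang runtime once per worker, outside the request handler.
runtime = sgl.Runtime(
    model_path=os.environ.get("MODEL_PATH", "meta-llama/Llama-2-7b-chat-hf")
)
sgl.set_default_backend(runtime)


@sgl.function
def generate(s, prompt, max_tokens):
    # Append the user prompt, then let the model continue from it.
    s += prompt
    s += sgl.gen("output", max_tokens=max_tokens)


def handler(job):
    job_input = job["input"]
    state = generate.run(
        prompt=job_input["prompt"],
        max_tokens=job_input.get("max_tokens", 256),
    )
    return {"output": state["output"]}


runpod.serverless.start({"handler": handler})
```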
Potentially in the near future
We’re still not where we want to be with the vLLM worker and our inference offering in terms of features and developer experience, so I’d say another LLM inference engine worker likely won’t be a priority for now. As for speed, vLLM receives regular improvements, but this might be worth exploring for the LMM aspect
The thing is that vLLM is currently falling behind other inference frameworks. It does receive regular updates, but the problem is that other teams have much more time dedicated to optimizing their solutions, so they end up outcompeting vLLM in speed
@Alpay Ariyak This would be a top priority for me as well. Currently, LLaVA multimodal models aren't supported by vLLM or TGI, so SGLang seems like the only easy way to deploy them. I will hack something together in the meantime, but having a natively supported worker would be appreciated.
For reference, it does look like there might be an existing solution with "LMscript", but I am unfamiliar with it, so I'm not sure how well it will work. An official worker sanctioned by RunPod would be preferable in my opinion: https://github.com/sgl-project/sglang/issues/274
(GitHub issue preview: "Cannot Execute Runtime Directly in Docker, with local install" — running sgl.Runtime directly with a local install.)
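In the meantime, the kind of hack being described might look like the following: a quick local SGLang deployment of a LLaVA model using the frontend API from the SGLang README (sgl.image, sgl.user/sgl.assistant, sgl.gen). This is just a sketch; the model id and image path are placeholders.

```python
# Sketch of a quick local LLaVA deployment via SGLang's frontend API.
# The model id and image path below are placeholders.
import sglang as sgl

runtime = sgl.Runtime(model_path="liuhaotian/llava-v1.5-7b")
sgl.set_default_backend(runtime)


@sgl.function
def describe_image(s, image_path, question):
    # Interleave the image and the question, then generate an answer.
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=128))


state = describe_image.run(image_path="example.jpg", question="What is in this image?")
print(state["answer"])
runtime.shutdown()
```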
vLLM just released an update with LLaVA support, updating the worker
Sorry to hijack, but if it’s not too much trouble, it’d be nice to have the option to use locally stored models when baking them into the Docker image. For times when Hugging Face is down, for example…
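One way to approximate this is sketched below: bake the model files into the image and have the handler prefer the local copy, falling back to a Hugging Face repo id. The LOCAL_MODEL_DIR and MODEL_NAME names and the /models path are made-up examples, not existing worker-vllm options.

```python
# Illustrative only: prefer a model directory baked into the Docker image,
# falling back to a Hugging Face repo id when no local copy is present.
# LOCAL_MODEL_DIR and MODEL_NAME are hypothetical names, not worker-vllm settings.
import os


def resolve_model_path() -> str:
    local_dir = os.environ.get("LOCAL_MODEL_DIR", "/models/llava-v1.5-7b")
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        # Use the copy shipped inside the image; no Hub download needed.
        return local_dir
    # Otherwise fall back to downloading from the Hugging Face Hub.
    return os.environ.get("MODEL_NAME", "liuhaotian/llava-v1.5-7b")
```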
@Alpay Ariyak The vLLM support doesn't cover LLaVA-NeXT (1.6), as it can't handle the multiple image sizes. SGLang is still the only platform that officially supports it
As an alternative? Write it as an issue on GitHub, so they notice it more haha