Update worker-vllm to vLLM 0.5.0
vLLM just got bumped to 0.5.0, with significant features ready for production. @Alpay Ariyak
FP8 is very significant, but so are speculative decoding and prefix caching.
- FP8 support is ready for testing. By quantizing a portion of the model weights to 8-bit floating point, inference gets roughly a 1.5x speed boost.
- Added OpenAI Vision API support. Currently only LLaVA and LLaVA-NeXT are supported.
- Speculative Decoding and Automatic Prefix Caching are also ready for testing; the plan is to turn them on by default in upcoming releases. Rough sketches of trying these out are below.
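
A minimal sketch (not the worker-vllm integration itself) of what turning on FP8, automatic prefix caching, and speculative decoding looks like with the vLLM 0.5.0 offline Python API. The model names, draft model, and token counts are placeholder assumptions:

```python
# Sketch of the new 0.5.0 engine options; model names and numbers are
# placeholders, not worker-vllm defaults.
from vllm import LLM, SamplingParams

# FP8 weight quantization plus automatic prefix caching.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    quantization="fp8",          # experimental in 0.5.0, needs a recent NVIDIA GPU
    enable_prefix_caching=True,  # automatic prefix caching
)

# Speculative decoding with a small draft model (a separate engine instance
# here just to keep the example simple).
spec_llm = LLM(
    model="facebook/opt-6.7b",               # placeholder target model
    speculative_model="facebook/opt-125m",   # placeholder draft model
    num_speculative_tokens=5,
    use_v2_block_manager=True,               # needed for speculative decoding in 0.5.x
)

params = SamplingParams(temperature=0.0, max_tokens=64)
print(llm.generate(["Hello, my name is"], params)[0].outputs[0].text)
```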
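And a rough sketch of an OpenAI Vision API request against a vLLM OpenAI-compatible server already running a LLaVA model; the base URL, model name, and image URL are placeholders, and the server itself needs the LLaVA-specific image flags and chat template from the vLLM docs:

```python
# Sketch of an OpenAI-style vision request; assumes a vLLM server is already
# serving a LLaVA checkpoint at the placeholder base_url below.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # placeholder LLaVA checkpoint
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/example.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```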
Solution
For sure, already in progress!
Nice to hear it's already in progress! Let me know when it's ready, I'd love to test it out.