Created by Bryan on 5/11/2024 in #⚡|serverless
Output guidance with vLLM Host on RunPod
Greetings! I've been running vLLM on my homelab servers for a while, and I'm looking to scale my application using RunPod. On my locally hosted vLLM instances, I use output guidance via the "outlines" guided decoding backend to constrain LLM output to specified JSON schemas or regexes.

One question I haven't been able to find an answer to: does RunPod support this functionality with serverless vLLM hosting through the OpenAI API? (I assume it works with pods if you set up your own vLLM instance.) It's looking like the answer is no, but I'm hoping it's yes, since I'd really like to combine the benefits of serverless hosting with guided output.

Appreciate any help or insight you can provide. Thanks in advance, cheers.
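For reference, here's a minimal sketch of the kind of request I make against my local vLLM instances. The `guided_json` and `guided_decoding_backend` fields are vLLM's extensions to the OpenAI chat completions API (assuming a vLLM version recent enough to accept them per-request); the model name, port, and schema are just placeholders from my local setup:

```python
from openai import OpenAI

# Points at a locally hosted vLLM OpenAI-compatible server, not RunPod.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed-locally",
)

# JSON schema the model output must conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
    messages=[{"role": "user", "content": "Describe a fictional person."}],
    # vLLM-specific extension: constrain decoding to the schema via outlines.
    extra_body={
        "guided_json": schema,
        "guided_decoding_backend": "outlines",
    },
)
print(completion.choices[0].message.content)
```

This is exactly the call I'd like to be able to point at a RunPod serverless vLLM endpoint instead of my local server.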