Created by Misterion on 12/11/2024 in #⚡|serverless
vllm worker OpenAI stream timeout
The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible for me. I'm hosting a 70B model, which usually takes ~2 minutes to answer a request. Using the openai client with stream=True times out after ~1 minute and returns nothing. Any solutions?
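For context, here is a minimal sketch of what I'd expect to work: the openai Python SDK accepts a `timeout` argument (httpx-style) on the client, so raising it well past the ~2 minute model delay should keep the stream alive. The endpoint ID, API key, and model name below are placeholders; the base_url format follows the RunPod vLLM worker docs linked above.

```python
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint ID
    api_key="<RUNPOD_API_KEY>",  # placeholder key
    # Allow a long wait for the first streamed byte; keep connect short.
    timeout=httpx.Timeout(connect=10.0, read=300.0, write=30.0, pool=10.0),
)

stream = client.chat.completions.create(
    model="<MODEL_NAME>",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

The SDK also supports a per-request override via `client.with_options(timeout=...)`, if changing the client-wide default isn't desirable.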