Created by Misterion on 12/11/2024 in #⚡|serverless
vllm worker OpenAI stream timeout
The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible for me.
I'm hosting a 70B model, which usually takes ~2 minutes to respond to a request.
Using the OpenAI client with stream=True, the request times out after ~1 minute and returns nothing. Any solutions?
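For context, here's roughly what I'm running, a minimal sketch with placeholder values for the endpoint ID, API key, and model name. The explicit `timeout` is an assumption on my part, in case the client's default is what cuts the stream off (the actual cutoff may be server-side):

```python
from openai import OpenAI

# RunPod's vLLM worker exposes an OpenAI-compatible route under /openai/v1,
# per the tutorial linked above. <ENDPOINT_ID> and <RUNPOD_API_KEY> are placeholders.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    timeout=600,  # raised well above the ~2 min response delay, just in case
)

# Streaming request; this is what dies after ~1 minute with nothing returned.
stream = client.chat.completions.create(
    model="<MODEL_NAME>",  # the 70B model the worker is serving
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```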