Created by Misterion on 12/11/2024 in #⚡|serverless
vllm worker OpenAI stream timeout
The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible for me. I'm hosting a 70B model, which usually takes ~2 minutes to answer a request. Using the openai client with stream=True times out after ~1 minute and returns nothing. Any solutions?
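For context, here is a minimal sketch of what I'd expect to work: the openai Python SDK accepts a `timeout` argument (httpx-style) on the client, so raising it well past the ~2 minute model delay should keep the stream alive. The endpoint ID, API key, and model name below are placeholders; the base_url format follows the RunPod vLLM worker docs linked above.

```python
import httpx
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint ID
    api_key="<RUNPOD_API_KEY>",  # placeholder key
    # Allow a long wait for the first streamed byte; keep connect short.
    timeout=httpx.Timeout(connect=10.0, read=300.0, write=30.0, pool=10.0),
)

stream = client.chat.completions.create(
    model="<MODEL_NAME>",  # placeholder model name
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

The SDK also supports a per-request override via `client.with_options(timeout=...)`, if changing the client-wide default isn't desirable.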