vllm worker OpenAI stream timeout

The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible for me. I'm hosting a 70B model, which usually takes ~2 minutes to respond to a request. Using the OpenAI client with stream=True, the request times out after ~1 minute and returns nothing. Any solutions?
11 Replies
wiki 2mo ago
Did you set the model name? Or did you leave it as MODEL_NAME?
Misterion (OP) 2mo ago
MODEL_NAME is the Hugging Face link, as usual.
Basically, what I'm seeing is that the server closes the connection after ~1 min when stream == True; non-streaming works fine.
nerdylive 2mo ago
Eh, isn't it just the model repo, like meta-llama/llama3.3-70b or something like that?
Misterion (OP) 2mo ago
Yes, this is what I meant, sorry. I'm not sure how MODEL_NAME affects this problem at all.
nerdylive 2mo ago
Maybe it's just about the environment variable key name. Maybe that was the only thing being checked. But does it work if you don't use stream?
Misterion (OP) 2mo ago
Yes, this waits for the whole request to finish.
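from openai import OpenAI

# endpoint_id and api_key are your RunPod endpoint ID and API key.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
    api_key=api_key,
)

# Non-streaming request: blocks until the whole completion is ready.
response = client.chat.completions.create(
    model="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8",
    messages=[
        {
            "role": "user",
            "content": "Say hello!",
        },
    ],
)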
Adding stream=True sends the request, which I can see in the dashboard, but the connection is terminated after ~1 min.
nerdylive 2mo ago
Oh, hmm. And the response is empty? Nothing is streamed back?
If you replicate your vLLM config in a pod, check whether streaming works there, and try active workers too.
I'm guessing it might be the Cloudflare proxy limiting a request to 100 s.
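One way to narrow this down is to rule out the client side first. The sketch below is only an illustration, not something confirmed in this thread: it assumes the v1 `openai` Python package, raises the client timeout to an arbitrary 600 s, disables retries, and prints how many seconds pass before each chunk (or the failure) arrives. `timed_chunks` and `probe_streaming` are hypothetical helper names. If the connection still drops at roughly the same point with a generous client timeout, the cutoff is more likely on the server or proxy side.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

T = TypeVar("T")


def timed_chunks(
    chunks: Iterable[T],
    start: Optional[float] = None,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[Tuple[float, T]]:
    """Yield (seconds_since_start, chunk) for every item of a stream."""
    if start is None:
        start = clock()
    for chunk in chunks:
        yield clock() - start, chunk


def probe_streaming(endpoint_id: str, api_key: str, model: str) -> None:
    """Send one streaming request and print when each chunk arrives."""
    # Imported lazily so timed_chunks can be used without the openai package.
    from openai import OpenAI

    client = OpenAI(
        base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
        api_key=api_key,
        timeout=600.0,  # assumption: 10 minutes, well above the ~2 min delay
        max_retries=0,  # surface the first failure instead of retrying
    )
    started = time.monotonic()
    try:
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Say hello!"}],
            stream=True,
        )
        for elapsed, chunk in timed_chunks(stream, start=started):
            # Some chunks can carry no choices or no text; skip those.
            if chunk.choices and chunk.choices[0].delta.content:
                print(f"[{elapsed:6.1f}s] {chunk.choices[0].delta.content!r}")
    except Exception as exc:  # broad on purpose: we only want the timing
        elapsed = time.monotonic() - started
        print(f"failed after {elapsed:.1f}s: {type(exc).__name__}: {exc}")
```

Calling `probe_streaming(endpoint_id, api_key, "neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8")` with your own endpoint ID and key should show either timestamped chunks or the elapsed time at which the connection was closed.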
Misterion (OP) 2mo ago
Nope
nerdylive 2mo ago
If you want, you can also create a ticket to explore this further.
Poddy 2mo ago
@Misterion
Escalated To Zendesk
The thread has been escalated to Zendesk!
Justin 2mo ago
Same issue here, but even without streaming.
