vLLM worker OpenAI stream timeout
The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible.
I'm hosting a 70B model, which usually has a ~2 min delay per request.
Using the OpenAI client with stream=True times out after ~1 min and returns nothing. Any solutions?
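For reference, this is roughly the call I'm making, following the tutorial (a minimal sketch: the endpoint ID, API key env vars, and model name are placeholders I've filled in, and the base_url format is taken from the docs page linked above):

```python
import os
from openai import OpenAI

# Placeholders: RUNPOD_ENDPOINT_ID and RUNPOD_API_KEY stand in for my real values.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{os.environ['RUNPOD_ENDPOINT_ID']}/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],
)

# MODEL_NAME is the same Hugging Face repo id the worker was deployed with.
stream = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

# The connection gets closed after ~1 min, before any chunks arrive.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```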
Did you set the model name?
Or did you leave it as MODEL_NAME?
MODEL_NAME is the Hugging Face link, as usual.
Basically, what I experience is that the server closes the connection after ~1 min when stream == True; non-streaming works fine.
Eh, isn't it just the model repo, like
meta-llama/llama3.3-70b
something like that
yes this is what I meant, sorry
I'm not sure how MODEL_NAME affects this problem at all
Maybe it's just the environment variable key name
Maybe it was only checking for that
But if you're not using stream, does it work?
Yes, this waits for the whole request to finish.
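To be concrete, the non-streaming version of the same call (same client and placeholders as in my sketch above) completes fine after the usual ~2 min wait:

```python
# Same client object as in the streaming sketch; no stream=True here.
response = client.chat.completions.create(
    model=os.environ["MODEL_NAME"],
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```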
Adding stream=True sends the request, which I can see in the dashboard, but it terminates the connection after ~1 min.
Oh hmm
And empty response? Nothing streamed back?
If you replicate your vLLM config in a pod, try whether streaming works there, and try active workers too
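Rough sketch of what I mean (assumptions on my side: you start vLLM's own OpenAI-compatible server inside the pod and reach it through the pod's HTTP proxy; the pod ID and model name below are placeholders):

```python
from openai import OpenAI

# Hypothetical pod setup, e.g. started inside the pod with:
#   vllm serve meta-llama/Llama-3.3-70B-Instruct --port 8000
# POD_ID is a placeholder; the proxy URL format is my assumption.
client = OpenAI(
    base_url="https://POD_ID-8000.proxy.runpod.net/v1",
    api_key="EMPTY",  # vLLM's server accepts any key unless --api-key is set
)

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # must match what the pod serves
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

If streaming works there but not through the serverless endpoint, that points at the serverless/proxy layer rather than vLLM itself.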
I'm guessing it might be the Cloudflare proxy limiting a request to 100s only
Nope
If you want, you can also create a ticket to explore this further
@Misterion
Escalated To Zendesk
The thread has been escalated to Zendesk!
Same issue here but even without streaming