RunPod•3mo ago
nikolai

Consistently timing out after 90 seconds

I'm not exactly sure why this is happening, and I don't think it happened earlier, but currently I'm consistently seeing requests time out after 90 seconds. Max execution time is set to 300 seconds, so that shouldn't be the issue. Is this a known problem?
25 Replies
nerdylive
nerdylive•3mo ago
Is this the vllm-worker? How are you sending requests? It might be because of the Cloudflare proxy, which limits connections to a max of 100s, but I'm not sure.
nikolai
nikolaiOP•3mo ago
Thanks for the swift response 🙂 Yes, it's the vllm worker. I'm hitting the OpenAI endpoint. In this case, it probably is the Cloudflare Proxy, yes. Is there any way to circumvent it?
nikolai
nikolaiOP•3mo ago
Because I require tool calls, I can't use the async API; tool calls are only exposed through the OpenAI endpoint. :/ https://github.com/runpod-workers/worker-vllm/blob/main/src/handler.py#L11
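For readers following along, a minimal sketch of the kind of tool-call request this refers to, assuming the standard openai Python client pointed at the worker's OpenAI-compatible route; the endpoint ID, API key, model name, and the get_weather tool are placeholders, not details from this thread:

```python
# Minimal sketch, not the exact setup from this thread: a tool-call request
# against the RunPod OpenAI-compatible route using the standard openai client.
# <ENDPOINT_ID>, <RUNPOD_API_KEY>, <MODEL_NAME> and get_weather are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(resp.choices[0].message)
```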
Poddy
Poddy•3mo ago
@nikolai
Escalated To Zendesk
The thread has been escalated to Zendesk!
nerdylive
nerdylive•3mo ago
Maybe give your endpoint ID for staff to check. But does the response stream, though? Like before that 90 seconds?
flash-singh
flash-singh•3mo ago
Is that 90s when you run into a cold start?
nikolai
nikolaiOP•3mo ago
No, it wasn't cold starts, it was with hot instances. It definitely was the Cloudflare Proxy. If anybody is facing this problem in the future: support suggested switching to a TCP port. Haven't tried it, as I'll be moving away from Serverless.
nerdylive
nerdylive•3mo ago
Can you ask support back: if you're using the OpenAI client, how can you use a TCP port for it?
flash-singh
flash-singh•3mo ago
We plan to increase serverless timeouts from 90s to 300s (5 mins) in Jan. @nikolai that will fix your chat completions issue if it can be done in 5 mins.
nerdylive
nerdylive•3mo ago
Wait, what timeouts? The default value for the execution timeout?
flash-singh
flash-singh•3mo ago
The max HTTP timeout for serverless is 100s; to be safe we set it to 90s, e.g. runsync waits up to 90s. With the new changes you can wait up to 5 mins. This has nothing to do with the execution timeout.
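For anyone hitting the cutoff before that change lands, a rough sketch of the async pattern being contrasted with runsync here, assuming the standard /run and /status routes; the endpoint ID and payload are placeholders. Each HTTP call is short-lived, so the 90s/100s limit never applies to the job as a whole (though, as nikolai noted above, this path doesn't expose the OpenAI tool-call interface):

```python
# Rough sketch, with placeholder endpoint ID and payload: submit via /run and
# poll /status instead of waiting on a single long-lived /runsync request.
import time
import requests

ENDPOINT = "https://api.runpod.ai/v2/<ENDPOINT_ID>"
HEADERS = {"Authorization": "Bearer <RUNPOD_API_KEY>"}

# Submit the job; this returns quickly with a job id.
job = requests.post(f"{ENDPOINT}/run", headers=HEADERS,
                    json={"input": {"prompt": "Hello"}}).json()

# Poll until the job finishes; each poll is its own short HTTP request.
while True:
    status = requests.get(f"{ENDPOINT}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

print(status.get("output"))
```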
nerdylive
nerdylive•3mo ago
Ohh I see okok
Tomi
Tomi•2mo ago
Hey, how are you? Just checking in to see if there's any update on this. We've been running into issues with the timeout.
flash-singh
flash-singh•2mo ago
Can you share more details? Timeout with which path? OpenAI? What's the timeout time?
Tomi
Tomi•2mo ago
Sure! We've been experiencing issues with the timeout mentioned in this thread. The RunPod response stops after 1 minute while using streaming for the response, and in some cases I don't get the complete response for the request.
flash-singh
flash-singh•2mo ago
Can you please give me more details? We have increased the timeout to 5 mins, so I need to understand how you're using it to find the gap. Are you using a lib? Which path are you calling?
Tomi
Tomi•2mo ago
We’re using the OpenAI path https://api.runpod.ai/v2/:pod_id/openai/v1/chat/completions, but the connection keeps closing after a minute. We tried deploying a new worker today in case the changes hadn’t been applied, but we’re still facing the same issue. I’m attaching a screenshot of a Postman request. Is there any new configuration we should consider when deploying a new worker?
Tomi
Tomi•2mo ago
[Screenshot of the Postman request]
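For context, a minimal sketch of the kind of streaming request being described, assuming the standard openai Python client; the endpoint ID and model name are placeholders. With stream=True the connection stays open for the whole generation, which is what runs into the cutoff; stream=False returns everything in one response:

```python
# Minimal sketch with placeholder endpoint ID and model name: a streaming chat
# completion against the OpenAI-compatible route. Flip stream to False to test
# without holding a long-lived connection open.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

stream = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "Write a long answer."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```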
flash-singh
flash-singh•2mo ago
Thanks, I'll take a look and share feedback. I have fixed the bug in dev and will plan a release by early next week. Can you set stream to false and test if that works?
Tomi
Tomi•2mo ago
Yes, with stream=false it seems to work correctly. We'll wait for the fix you mentioned. Thank you very much!
SvenBrnn
SvenBrnn•2mo ago
Do we get any information when this is live? It also happens on my own worker when I enable serverless. I was thinking I was still doing something wrong on my side, but if this happens for vLLM too, I might just want to wait till your fix is live to see if my code works then too.
flash-singh
flash-singh•2mo ago
What path are you trying? It should only happen for the OpenAI stream.
SvenBrnn
SvenBrnn•2mo ago
It is the OpenAI stream 🙂 I've built myself an Ollama worker that works with the OpenAI endpoint and Open WebUI to call it. For now I've just disabled streaming to make it work, since with streaming it stops writing after about 1 minute.
Tomi
Tomi•2mo ago
Hey, how are you? Yesterday we were testing our worker and we finally got a successful response with stream after 1m53s. Many thanks @flash-singh
flash-singh
flash-singh•2mo ago
I was trying to find this thread. Yes, the new changes are out; now you get up to 5 mins.
