Consistently timing out after 90 seconds
I'm not exactly sure why this is happening, and I don't think it happened earlier, but right now I'm consistently seeing requests time out after 90 seconds.
Max execution time is set to 300 seconds, so that shouldn't be the issue.
Is this a known problem?

Is this vllm-worker?
How are you sending requests?
It might be because of the Cloudflare proxy, which limits connections to a max of 100s, but I'm not sure
Thanks for the swift response 🙂
Yes, it's the vLLM worker. I'm hitting the OpenAI endpoint. In this case it probably is the Cloudflare proxy, yes.
Is there any way to circumvent it?
Because I require tool calls, I can't use the async API; tool calls are only exposed through the OpenAI endpoint. :/
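For reference, this is roughly how I'm sending requests. A minimal sketch using the official openai Python client; the endpoint ID, model name, and the get_weather tool are placeholders, not my actual setup:
```python
import os
from openai import OpenAI

# The vLLM worker exposes an OpenAI-compatible route, so the official
# openai client works by overriding base_url.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder endpoint ID
    api_key=os.environ["RUNPOD_API_KEY"],
)

# Hypothetical tool, just to show the shape of what I'm sending.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Long generations on this call are what hit the ~90s timeout.
response = client.chat.completions.create(
    model="<MODEL_NAME>",  # placeholder
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(response.choices[0].message)
```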
https://github.com/runpod-workers/worker-vllm/blob/main/src/handler.py#L11
@nikolai
The thread has been escalated to Zendesk!
Maybe share your endpoint ID so staff can check
But does the response stream, though? Like, before that 90 seconds
is that 90s when you run into a cold start?
No, it wasn't cold starts; it was with hot instances. It was definitely the Cloudflare proxy.
If anybody is facing this problem in the future: support suggested switching to a TCP port. I haven't tried it, as I'll be moving away from Serverless.
Can you ask support back: if you're using the OpenAI client, how can you use a TCP port with it?
we plan to increase serverless timeouts from 90s to 300s (5 mins) in Jan
@nikolai that will fix your chat completions issue if it completes within 5 mins
Wait, what timeouts? The default value for the execution timeout?
the max HTTP timeout for serverless is 100s; to be safe we set it to 90s, e.g. runsync waits up to 90s. with the new changes you can wait up to 5 mins
this has nothing to do with the execution timeout
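to illustrate the difference: runsync holds the HTTP connection open while the job runs (that's where the 90s cap, soon 5 mins, bites), while run plus status polling never holds a long connection, so only the execution timeout applies. rough sketch, not code from our docs; the endpoint ID and payload are placeholders:
```python
import os
import time
import requests

BASE = "https://api.runpod.ai/v2/<ENDPOINT_ID>"  # placeholder endpoint ID
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# /run returns a job id immediately instead of holding the connection open.
job = requests.post(f"{BASE}/run", headers=HEADERS,
                    json={"input": {"prompt": "hello"}}).json()

# Poll /status until the job finishes; the worker can run as long as the
# execution timeout allows, since no single HTTP request waits on it.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(2)

print(status.get("output"))
```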
Ohh I see okok
Hey, how are you? Just checking in to see if there’s any update on this. We’ve been running into issues with the timeout
can you share more details? timeout with which path? openai? what's the timeout time?
sure! we've been experiencing the timeout issue mentioned in this thread: the RunPod response stops after 1 minute when using streaming, and in some cases I don't get the complete response for the request.
can you please give me more details? we have increased the timeout to 5 mins, so I need to understand how you're using it to find the gap. are you using a lib? which path are you calling?
We’re using the OpenAI path https://api.runpod.ai/v2/:pod_id/openai/v1/chat/completions, but the connection keeps closing after a minute. We tried deploying a new worker today in case the changes hadn’t been applied, but we’re still facing the same issue. I’m attaching a screenshot of a Postman request. Is there any new configuration we should consider when deploying a new worker?

thanks, I'll take a look and share feedback
i have fixed the bug in dev and plan a release by early next week. can you set stream to false and test if that works?
yess, with stream=false, it seems to work correctly. we'll wait for the fix you mentioned. Thank you very much
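for anyone else hitting this in the meantime, the workaround is literally just flipping the flag. a sketch with the same placeholder endpoint ID and model name as above:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # placeholder
    api_key=os.environ["RUNPOD_API_KEY"],
)

# Workaround until the streaming fix ships: ask for the full completion
# in one response instead of streamed chunks.
response = client.chat.completions.create(
    model="<MODEL_NAME>",  # placeholder
    messages=[{"role": "user", "content": "Write a long story."}],
    stream=False,  # with stream=True the connection dropped after ~1 min
)
print(response.choices[0].message.content)
```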

Do we get any information when this is live? It also happens on my own worker when I enable serverless. I was thinking I was still doing something wrong on my side, but if this happens for vLLM too, I might just want to wait until your fix is live to see if my code works then too
what path are you trying? it should only happen for the openai stream
it is the openai stream 🙂 i've built myself an ollama worker that works with the openai endpoint and open-webui to call it. for now i've just disabled streaming to make it work, since with streaming it stops writing after about 1 minute
hey, how are you? yesterday we were testing our worker and we finally got a successful response with streaming after 1m53s. Many thanks @flash-singh

i was trying to find this thread. yes, the new changes are out; now you get up to 5 mins