RunPod•3mo ago
nikolai

Consistently timing out after 90 seconds

I'm not exactly sure why this is happening, and I don't think it happened earlier, but currently I'm consistently seeing requests time out after 90 seconds. Max execution time is set to 300 seconds, so that shouldn't be the issue. Is this a known problem?
25 Replies
nerdylive
nerdylive•3mo ago
Is this the vllm-worker? How are you sending requests? It might be because of the Cloudflare proxy, which limits connections to a max of 100s, but I'm not sure.
nikolai
nikolaiOP•3mo ago
Thanks for the swift response 🙂 Yes, it's the vllm worker. I'm hitting the OpenAI endpoint. In this case, it probably is the Cloudflare Proxy, yes. Is there any way to circumvent it?
nikolai
nikolaiOP•3mo ago
Because I require tool calls, I can't use the async API; tool calls are only exposed through the OpenAI endpoint. :/ https://github.com/runpod-workers/worker-vllm/blob/main/src/handler.py#L11
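For readers following along, a minimal sketch of the kind of tool-call request this refers to, assuming the standard openai Python client pointed at the worker's OpenAI-compatible route; the endpoint ID, API key, model name, and the get_weather tool are placeholders, not details from this thread:

```python
# Minimal sketch, not the exact setup from this thread: a tool-call request
# against the RunPod OpenAI-compatible route using the standard openai client.
# <ENDPOINT_ID>, <RUNPOD_API_KEY>, <MODEL_NAME> and get_weather are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(resp.choices[0].message)
```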
Poddy
Poddy•3mo ago
@nikolai
Escalated To Zendesk
The thread has been escalated to Zendesk!
nerdylive
nerdylive•3mo ago
Maybe give your endpoint ID for staff to check. But does the response stream, though? Like before that 90 seconds?
flash-singh
flash-singh•3mo ago
Is that 90s when you run into a cold start?
nikolai
nikolaiOP•3mo ago
No, it wasn't cold starts, it was with hot instances. It definitely was the Cloudflare Proxy. If anybody is facing this problem in the future: support suggested switching to a TCP port. Haven't tried it, as I'll be moving away from Serverless.
nerdylive
nerdylive•3mo ago
Can you ask support back: if you're using the OpenAI client, how can you use a TCP port for it?
flash-singh
flash-singh•3mo ago
We plan to increase serverless timeouts from 90s to 300s (5 mins) in Jan. @nikolai that will fix your chat completions issue if it can be done in 5 mins.
nerdylive
nerdylive•3mo ago
Wait, what timeouts? The default value for the execution timeout?
flash-singh
flash-singh•3mo ago
The max HTTP timeout for serverless is 100s; to be safe we set it to 90s, e.g. runsync waits up to 90s. With the new changes you can wait up to 5 mins. This has nothing to do with the execution timeout.
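For anyone hitting the cutoff before that change lands, a rough sketch of the async pattern being contrasted with runsync here, assuming the standard /run and /status routes; the endpoint ID and payload are placeholders. Each HTTP call is short-lived, so the 90s/100s limit never applies to the job as a whole (though, as nikolai noted above, this path doesn't expose the OpenAI tool-call interface):

```python
# Rough sketch, with placeholder endpoint ID and payload: submit via /run and
# poll /status instead of waiting on a single long-lived /runsync request.
import time
import requests

ENDPOINT = "https://api.runpod.ai/v2/<ENDPOINT_ID>"
HEADERS = {"Authorization": "Bearer <RUNPOD_API_KEY>"}

# Submit the job; this returns quickly with a job id.
job = requests.post(f"{ENDPOINT}/run", headers=HEADERS,
                    json={"input": {"prompt": "Hello"}}).json()

# Poll until the job finishes; each poll is its own short HTTP request.
while True:
    status = requests.get(f"{ENDPOINT}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

print(status.get("output"))
```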
nerdylive
nerdylive•3mo ago
Ohh I see okok
Tomi
Tomi•2mo ago
Hey, how are you? Just checking in to see if there's any update on this. We've been running into issues with the timeout.
flash-singh
flash-singh•2mo ago
Can you share more details? Timeout with which path? OpenAI? What's the timeout time?
Tomi
Tomi•2mo ago
Sure! We've been experiencing issues with the timeout mentioned in this thread. The RunPod response stops after 1 minute while using streaming for the response, and in some cases I don't get the complete response for the request.
flash-singh
flash-singh•2mo ago
Can you please give me more details? We have increased the timeout to 5 mins, so I need to understand how you're using it to find the gap. Are you using a lib? Which path are you calling?
Tomi
Tomi•2mo ago
We’re using the OpenAI path https://api.runpod.ai/v2/:pod_id/openai/v1/chat/completions, but the connection keeps closing after a minute. We tried deploying a new worker today in case the changes hadn’t been applied, but we’re still facing the same issue. I’m attaching a screenshot of a Postman request. Is there any new configuration we should consider when deploying a new worker?
Tomi
Tomi•2mo ago
[Screenshot of the Postman request]
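For context, a minimal sketch of the kind of streaming request being described, assuming the standard openai Python client; the endpoint ID and model name are placeholders. With stream=True the connection stays open for the whole generation, which is what runs into the cutoff; stream=False returns everything in one response:

```python
# Minimal sketch with placeholder endpoint ID and model name: a streaming chat
# completion against the OpenAI-compatible route. Flip stream to False to test
# without holding a long-lived connection open.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

stream = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[{"role": "user", "content": "Write a long answer."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```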
flash-singh
flash-singh•2mo ago
Thanks, I'll take a look and share feedback. I have fixed the bug in dev and will plan a release by early next week. Can you set stream to false and test if that works?
Tomi
Tomi•2mo ago
Yes, with stream=false it seems to work correctly. We'll wait for the fix you mentioned. Thank you very much!
SvenBrnn
SvenBrnn•2mo ago
Do we get any information when this is live? It also happens on my own worker when I enable serverless. I was thinking I was still doing something wrong on my side, but if this happens for vLLM too, I might just want to wait till your fix is live to see if my code works then too.
flash-singh
flash-singh•2mo ago
What path are you trying? It should only happen for the OpenAI stream.
SvenBrnn
SvenBrnn•2mo ago
It is the OpenAI stream 🙂 I've built myself an Ollama worker that works with the OpenAI endpoint and Open WebUI to call it. For now I've just disabled streaming to make it work, since with streaming it stops writing after about 1 minute.
Tomi
Tomi•2mo ago
Hey, how are you? Yesterday we were testing our worker and we finally got a successful response with stream after 1m53s. Many thanks @flash-singh
flash-singh
flash-singh•2mo ago
I was trying to find this thread. Yes, the new changes are out; now you get up to 5 mins.
