Consistently timing out after 90 seconds
I'm not exactly sure why this is happening, and I don't think it happened earlier, but currently I'm consistently seeing requests time out after 90 seconds.
Max. execution time is set to 300 seconds, so this shouldn't be the issue.
Is this a known problem?
Is this vllm-worker?
How are you sending requests?
It might be because of the Cloudflare proxy, which limits connections to a max of 100 s, but I'm not sure.
Thanks for the swift response 🙂
Yes, it's the vllm worker. I'm hitting the OpenAI endpoint. In this case, it probably is the Cloudflare Proxy, yes.
Is there any way to circumvent it?
Because I require tool calls, I can't use the async API; tool calls are only exposed through the OpenAI endpoint. :/
https://github.com/runpod-workers/worker-vllm/blob/main/src/handler.py#L11
@nikolai
The thread has been escalated to Zendesk!
Maybe share your endpoint ID so staff can check.
But does the response stream, though? Like, before that 90 seconds?
Is that 90 s when you run into a cold start?
No, it wasn't cold starts, it was with hot instances. It definitely was the Cloudflare proxy.
If anybody is facing this problem in the future: support suggested switching to a TCP port. Haven't tried it, as I'll be moving away from Serverless.
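Another workaround hinted at above is streaming: if tokens arrive continuously, the connection never sits idle long enough to hit the proxy's ~100 s limit. Below is a minimal stdlib sketch of streaming from an OpenAI-compatible endpoint; the endpoint ID, model name, and URL pattern are placeholders/assumptions, not confirmed details from this thread.

```python
# Hedged sketch: stream tokens from an OpenAI-compatible chat endpoint so
# the proxy sees continuous traffic instead of one long silent request.
# ENDPOINT_ID, model, and the base URL below are placeholder assumptions.
import json
import urllib.request

ENDPOINT_ID = "your-endpoint-id"  # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1/chat/completions"

# "stream": True asks the server to send Server-Sent Events chunk by chunk.
PAYLOAD = {
    "model": "your-model",  # placeholder
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}

def stream_completion(api_key: str):
    """Yield content deltas from the SSE stream (needs a live endpoint to run)."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(PAYLOAD).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            # SSE frames look like: data: {...json chunk...}
            if line.startswith("data: ") and line != "data: [DONE]":
                chunk = json.loads(line[len("data: "):])
                delta = chunk["choices"][0]["delta"].get("content")
                if delta:
                    yield delta
```

Note this only helps requests that actually produce output before the timeout window closes; a long silent prefill phase could still trip the proxy.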
Can you ask support back: if you're using the OpenAI client, how can you use a TCP port with it?