RunPod•5d ago
nikolai

Consistently timing out after 90 seconds

I'm not exactly sure why this is happening, and I don't think it happened earlier, but currently I'm consistently seeing requests time out after 90 seconds. Max execution time is set to 300 seconds, so that shouldn't be the issue. Is this a known problem?
8 Replies
nerdylive
nerdylive•5d ago
Is this the vLLM worker? How are you sending requests? It might be the Cloudflare proxy, which limits connections to a max of 100 s, but I'm not sure.
nikolai
nikolaiOP•5d ago
Thanks for the swift response 🙂 Yes, it's the vllm worker. I'm hitting the OpenAI endpoint. In this case, it probably is the Cloudflare Proxy, yes. Is there any way to circumvent it?
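(Editor's note: for readers following along, the setup being described looks roughly like the sketch below, using the OpenAI Python client against the worker's OpenAI-compatible route. The endpoint ID, API key, and model name are placeholders, and the base URL pattern follows RunPod's worker-vllm documentation; treat this as an illustration, not the poster's exact code.)

```python
# Minimal sketch: calling a RunPod serverless vLLM worker through its
# OpenAI-compatible endpoint with the official OpenAI Python client.
# ENDPOINT_ID, the API key, and the model name are hypothetical placeholders.
from openai import OpenAI

ENDPOINT_ID = "your-endpoint-id"        # placeholder
RUNPOD_API_KEY = "your-runpod-api-key"  # placeholder

client = OpenAI(
    api_key=RUNPOD_API_KEY,
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
)

# A plain (non-streaming) request keeps a single HTTP connection open for the
# entire generation, which is where a ~100 s proxy limit bites on long outputs.
response = client.chat.completions.create(
    model="your-served-model",  # placeholder for the model the worker serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```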
nikolai
nikolaiOP•5d ago
Because I require tool calls, which are only exposed through the OpenAI endpoint, it's not possible for me to use the async API. :/ https://github.com/runpod-workers/worker-vllm/blob/main/src/handler.py#L11
GitHub
worker-vllm/src/handler.py at main · runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
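(Editor's note: the "async API" referred to above is RunPod's queue-based serverless flow, sketched below, where a job is submitted and then polled, so no single HTTP request has to outlive the proxy's connection limit. Per the message above it doesn't cover the OpenAI-compatible features like tool calls, so it's shown only to illustrate the alternative. The payload shape is an assumption; check the worker-vllm README for the exact input schema.)

```python
# Sketch of RunPod's queue-based (async) serverless API: submit a job, then
# poll for its result. ENDPOINT_ID, the API key, and the "input" payload are
# illustrative placeholders.
import time
import requests

ENDPOINT_ID = "your-endpoint-id"        # placeholder
RUNPOD_API_KEY = "your-runpod-api-key"  # placeholder
HEADERS = {"Authorization": f"Bearer {RUNPOD_API_KEY}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# /run returns immediately with a job ID instead of holding a connection
# open while the model generates.
job = requests.post(
    f"{BASE}/run",
    headers=HEADERS,
    json={"input": {"prompt": "Hello!"}},  # assumed payload shape
).json()

# Poll /status until the job finishes; each poll is a short-lived request.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

print(status)
```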
Poddy
Poddy•4d ago
@nikolai
Escalated To Zendesk
The thread has been escalated to Zendesk!
nerdylive
nerdylive•4d ago
Maybe give your endpoint ID for staff to check. But does the response stream, though? Like, before hitting that 90 seconds?
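(Editor's note: the streaming question matters because a streamed response sends chunks as they are generated, so data keeps flowing on the connection instead of it sitting idle until the full completion is ready, which is presumably why it's being asked in the context of a proxy timeout. A minimal sketch, reusing the placeholder client configuration from the earlier example:)

```python
# Same OpenAI-compatible call as in the earlier sketch, but streamed.
# With stream=True, tokens arrive incrementally rather than in one response
# at the end of generation.
stream = client.chat.completions.create(
    model="your-served-model",  # placeholder
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no content delta; guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```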
flash-singh
flash-singh•4d ago
Is that 90 s when you run into a cold start?
nikolai
nikolaiOP•3d ago
No, it wasn't cold starts; it happened with hot instances. It was definitely the Cloudflare proxy. If anybody is facing this problem in the future: support suggested switching to a TCP port. I haven't tried it, as I'll be moving away from Serverless.
nerdylive
nerdylive•3d ago
Can you ask support back: if you're using the OpenAI client, how can you use a TCP port with it?