Coderik
RunPod
•Created by EMPZ on 9/12/2024 in #⚡|serverless
Very slow upload speeds from serverless workers
Try to avoid EUR-IS-1, which is a datacenter in Iceland. You can measure upload/download speeds from a pod in this region and verify that the network to this datacenter is super slow.
13 replies
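A rough way to do the speed measurement suggested above, as a minimal sketch: time the transfer of a fixed-size payload and convert it to megabits per second. The `send` callable is a placeholder (an assumption, not a RunPod API) for whatever upload the worker actually performs, such as a PUT to your storage bucket.

```python
import time
from typing import Callable


def throughput_mbps(num_bytes: int, seconds: float) -> float:
    """Convert a transfer of num_bytes taking `seconds` into megabits per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds / 1_000_000


def measure_upload(send: Callable[[bytes], None],
                   payload_size: int = 8 * 1024 * 1024) -> float:
    """Time one call to `send` with a dummy payload and return the rate in Mbit/s.

    `send` is hypothetical: plug in the real upload call you want to test.
    """
    payload = b"\0" * payload_size
    start = time.perf_counter()
    send(payload)
    elapsed = time.perf_counter() - start
    return throughput_mbps(payload_size, elapsed)
```

Running the same measurement from pods in two different regions gives a like-for-like comparison; a single run is noisy, so averaging a few runs is probably worthwhile.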
RunPod
•Created by Coderik on 9/12/2024 in #⚡|serverless
TTL for vLLM endpoint
After digging, I think it cannot be done even by modifying the vLLM worker's code. I've reached out to support to clarify.
13 replies
RunPod
•Created by EMPZ on 9/12/2024 in #⚡|serverless
Very slow upload speeds from serverless workers
In which region in Europe are your workers?
13 replies
RunPod
•Created by Coderik on 9/12/2024 in #⚡|serverless
TTL for vLLM endpoint
Sure, I'm using RunPod's vLLM worker: https://github.com/runpod-workers/worker-vllm
13 replies
RunPod
•Created by Coderik on 9/12/2024 in #⚡|serverless
TTL for vLLM endpoint
Yes, I'm getting a 400 status and validation errors.
13 replies
RunPod
•Created by Coderik on 9/12/2024 in #⚡|serverless
TTL for vLLM endpoint
When I use the `https://api.runpod.ai/v2/{ID}/openai/v1` endpoint, OpenAI's input format is enforced, so I cannot pass `policy` there. Based on the worker-vllm code, it seems that at some point the (OpenAI-compatible) payload is wrapped in the `input` field so that the rest of the scheduling and handling can happen. I assume that the capability to handle TTL is there; I just cannot figure out how to pass the config. Am I missing something?
13 replies
RunPod
•Created by Coderik on 9/12/2024 in #⚡|serverless
TTL for vLLM endpoint
There is a `policy.ttl` parameter for regular tasks (https://docs.runpod.io/serverless/endpoints/send-requests#execution-policies), but not for the OpenAI-compatible API powered by vLLM (https://github.com/runpod-workers/worker-vllm).
13 replies
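For reference, a minimal sketch of how `policy.ttl` might be attached to a regular (non-OpenAI) request. The `/run` route, the Bearer authorization header, and the millisecond unit are assumptions based on my reading of the linked execution-policies docs, so check them against the current documentation. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_run_request(endpoint_id: str, api_key: str,
                      job_input: dict, ttl_ms: int) -> urllib.request.Request:
    """Build (but do not send) a POST for a regular task with policy.ttl set."""
    body = {
        "input": job_input,
        # Assumption: ttl is given in milliseconds, as in the execution-policies docs.
        "policy": {"ttl": ttl_ms},
    }
    return urllib.request.Request(
        url=f"https://api.runpod.ai/v2/{endpoint_id}/run",  # assumed route
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

Sending it would be `urllib.request.urlopen(build_run_request(...))`. The point is that `policy` sits next to `input` at the top level of the body, which is exactly the slot the OpenAI-compatible route does not expose.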
RunPod
•Created by Coderik on 9/12/2024 in #⚡|serverless
TTL for vLLM endpoint
But this is the execution timeout. As far as I can tell, the time spent waiting in the queue does not count. What I'd like to achieve is to discard a task that has been sitting in the queue longer than its TTL. In my case there is a timeout on the caller's side, so the response from such a task will not be received anyway.
13 replies
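To make the distinction concrete, here is a sketch of the queue-TTL behaviour described above, written as a check inside a custom handler. It is an illustration only, not a RunPod feature: the `submitted_at` field is a hypothetical value the caller would have to add to `input`, the `{"input": ...}` job shape is assumed, and it relies on the caller's and worker's clocks being roughly in sync. It also does not apply to the OpenAI-compatible route, where the input format is enforced, and the worker still has to pick the job up before it can skip it.

```python
import time
from typing import Any, Callable, Dict, Optional

Job = Dict[str, Any]


def is_stale(submitted_at: float, ttl_seconds: float,
             now: Optional[float] = None) -> bool:
    """Return True if more than ttl_seconds passed since submitted_at (epoch seconds)."""
    current = time.time() if now is None else now
    return (current - submitted_at) > ttl_seconds


def with_queue_ttl(handler: Callable[[Job], Any],
                   ttl_seconds: float) -> Callable[[Job], Any]:
    """Wrap a handler so jobs that waited too long are skipped instead of executed."""
    def wrapped(job: Job) -> Any:
        # Hypothetical field: the caller stamps the submission time into the input.
        submitted_at = job["input"].get("submitted_at")
        if submitted_at is not None and is_stale(float(submitted_at), ttl_seconds):
            return {"skipped": True, "reason": "queue TTL exceeded"}
        return handler(job)
    return wrapped
```

Unlike an execution timeout, which starts counting when the job begins running, this check counts from the moment the caller submitted the job, so time spent in the queue is included.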