RunPod · 3mo ago
Coderik

TTL for vLLM endpoint

Is there a way to specify TTL value when calling a vLLM endpoint via OpenAI-compatible API?
11 Replies
Encyrption · 3mo ago
You can set a timeout value in the endpoint, like this.
(screenshot of the endpoint timeout setting)
Coderik (OP) · 3mo ago
But that is the execution timeout; as far as I can tell, time spent waiting in the queue does not count toward it. What I'd like to achieve is to discard a task that has been sitting in the queue longer than its TTL. In my case there is a timeout on the caller's side, so the response from such a task will never be received anyway.
Encyrption · 3mo ago
I don't think you can time out a job based on how long it has been in the QUEUE.
Coderik (OP) · 3mo ago
There is a policy.ttl parameter for regular tasks (https://docs.runpod.io/serverless/endpoints/send-requests#execution-policies), but not for the OpenAI-compatible API powered by vLLM (https://github.com/runpod-workers/worker-vllm). When I use the https://api.runpod.ai/v2/{ID}/openai/v1 endpoint, OpenAI's input format is enforced, so I cannot pass a policy there. Based on the worker-vllm code, it seems that at some point the (OpenAI-compatible) payload is wrapped in the input field so that the rest of the scheduling and handling can happen. I assume the capability to handle TTL is there; I just cannot figure out how to pass the config. Am I missing something? For reference, with the regular /run endpoint the policy travels alongside the input, roughly like the sketch below (placeholder endpoint ID and payload; TTL is in milliseconds if I read the docs right):
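```python
import os
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

# Standard /run request: the job payload goes under "input" and the
# execution policy (including ttl) sits next to it at the top level.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "Hello"},
        "policy": {"ttl": 60_000},  # drop the job if it waits longer than 60s
    },
    timeout=30,
)
print(resp.status_code, resp.json())
```
There is no equivalent place to put "policy" in the OpenAI-compatible request body.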
Encyrption · 3mo ago
Have you tried posting the data you are looking to pass?
Coderik (OP) · 3mo ago
Yes, I'm getting a 400 status and validation errors. Something along these lines (a sketch, not the exact payload; model name and endpoint ID are placeholders) is the kind of request that gets rejected:
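```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/openai/v1",  # placeholder ID
    api_key="YOUR_RUNPOD_API_KEY",
)

# Trying to pass a RunPod execution policy alongside the OpenAI-style body.
# The endpoint enforces the OpenAI schema, so the extra field fails validation.
completion = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whatever model the worker serves
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={"policy": {"ttl": 60_000}},  # not part of the accepted schema
)
```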
Encyrption · 3mo ago
Can you show the code for your handler?
Coderik (OP) · 3mo ago
Sure, I'm using RunPod's vLLM worker: https://github.com/runpod-workers/worker-vllm
Encyrption · 3mo ago
If you do not modify the source code, you cannot pass any additional arguments.
Coderik (OP) · 3mo ago
After digging through it, I think it cannot be done even by modifying the vLLM worker's code. I've reached out to support to clarify.
Madiator2011 · 3mo ago
Can't be done.