RunPod•8mo ago

Using the vLLM RunPod worker image and the OpenAI endpoints, how can I get the executionTime?

The standard endpoint provides executionTime as well as an ID that points to an execution that I can use /status on:

{
  "delayTime": 598,
  "executionTime": 1276,
  "id": "84407cd5-63c4-45d6-aa56-b1f136c44d14-u1",
  "output": ...
}

Unfortunately, the OpenAI API endpoints do not provide this. They return only token usage and a "chat-" ID that I might be able to do something with, but I cannot find any documentation on it:

{
    "choices": ...
    "created": 1723501967,
    "id": "chat-652d581fee6c4bffb771c43b371b444e",
    "model": ...,
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 100,
        "prompt_tokens": 17,
        "total_tokens": 117
    }
}

Any help would be appreciated!

5 Replies

yhlong00000•8mo ago

you can call https://api.runpod.ai/v2/endpoint_id/status/job_id to get execution time
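
For reference, a minimal sketch of that status call in Python. The URL pattern is the one above; the `Authorization: Bearer` header, the function names, and the 30-second timeout are assumptions for illustration, so check them against your own setup. It only works if you have a job ID from the standard endpoints.

```python
import json
import urllib.request

RUNPOD_API_BASE = "https://api.runpod.ai/v2"


def status_url(endpoint_id: str, job_id: str) -> str:
    """Build the /status URL for a job on a serverless endpoint."""
    return f"{RUNPOD_API_BASE}/{endpoint_id}/status/{job_id}"


def extract_timings(status_body: dict) -> dict:
    """Pull the two timing fields out of a status response body.

    Missing fields come back as None rather than raising.
    """
    return {
        "delayTime": status_body.get("delayTime"),
        "executionTime": status_body.get("executionTime"),
    }


def fetch_timings(endpoint_id: str, job_id: str, api_key: str) -> dict:
    """Call /status and return the timing fields.

    Assumption: the API key is sent as a Bearer token.
    """
    request = urllib.request.Request(
        status_url(endpoint_id, job_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_timings(json.load(response))
```

Calling `fetch_timings("<endpoint_id>", "<job_id>", "<api_key>")` with a job ID like the one in the standard response should give back the same `delayTime` and `executionTime` values.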

RayboyOP•8mo ago

Yes, as I mentioned, I tried this, but the OpenAI API endpoints do not return a job ID for me to use. They only return "chat-652d581fee6c4bffb771c43b371b444e", which does not seem to be a job ID. I would use the standard endpoints, since they do return a job ID, but I need to use the guided_json field, and it seems only the OpenAI endpoints support that.

nerdylive•8mo ago

Ah, yeah, it doesn't. Also, one worker can process multiple requests at the same time in vllm-worker.

RayboyOP•8mo ago

That is a bummer. I really need to be able to calculate our costs per user when making requests, and total_tokens can't help me with that, unfortunately. It's good that one worker can process multiple requests, though that makes it hard to calculate costs this way, since it would actually be cheaper if multiple requests ran on the same active worker. I may need to figure out another way to calculate it then. Would it be possible in a future update to get the job ID sent back in the OpenAI endpoints? I would also like to be able to cancel the job from our server, but I currently cannot.
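
To illustrate why sharing a worker changes the numbers, here is a rough arithmetic sketch. It assumes billing is proportional to worker busy time and that the cost is split evenly across requests running at once. The function name, the even split, and the price used below are all hypothetical, not RunPod's actual billing model.

```python
def cost_per_request(busy_ms: float, price_per_second: float,
                     concurrent_requests: int) -> float:
    """Split the cost of a worker's busy time evenly across concurrent requests.

    busy_ms: time the worker was busy, assumed to be in milliseconds.
    price_per_second: hypothetical worker price, in any currency.
    concurrent_requests: how many requests shared the worker in that window.
    """
    if concurrent_requests < 1:
        raise ValueError("concurrent_requests must be at least 1")
    return (busy_ms / 1000.0) * price_per_second / concurrent_requests


# Same 1276 ms of busy time, billed to one request versus shared by four.
alone = cost_per_request(1276, 0.0005, 1)
shared = cost_per_request(1276, 0.0005, 4)
```

With these made-up inputs, `shared` is a quarter of `alone`, which is why a single executionTime per request would overstate the cost when requests overlap on one worker.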

yhlong00000•8mo ago

Thanks for the feedback, I've noted it internally.
