Created by Rayboy on 8/12/2024 in #⚡|serverless
Using the vLLM RunPod worker image and the OpenAI endpoints, how can I get the executionTime?
The standard endpoint returns executionTime, along with a job ID that I can query /status with:
{
  "delayTime": 598,
  "executionTime": 1276,
  "id": "84407cd5-63c4-45d6-aa56-b1f136c44d14-u1",
  "output": ...
}
The OpenAI-compatible endpoints unfortunately do not provide this, only token usage and a "chat-" ID that I might be able to do something with, but I cannot find any documentation on:
{
  "choices": ...,
  "created": 1723501967,
  "id": "chat-652d581fee6c4bffb771c43b371b444e",
  "model": ...,
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 100,
    "prompt_tokens": 17,
    "total_tokens": 117
  }
}
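For context, here is roughly how I'm calling the OpenAI-compatible route (endpoint ID, API key, and model name are placeholders):

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",  # placeholder
    base_url="https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/openai/v1",  # vLLM worker's OpenAI-compatible base URL
)

# The response only carries the chat completion fields shown above,
# with no delayTime/executionTime and no job ID I can pass to /status.
completion = client.chat.completions.create(
    model="YOUR_MODEL_NAME",  # placeholder
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.id, completion.usage)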
Any help would be appreciated!