RunPod
•Created by Rayboy on 8/12/2024 in #⚡|serverless
Using the vLLM RunPod worker image and the OpenAI endpoints, how can I get the executionTime?
Would it be possible, in a future update, to have the job ID sent back in the OpenAI endpoints? I'd also like to be able to cancel a job from our server, but currently I can't.
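For reference, the standard (non-OpenAI) serverless routes do expose job IDs. Assuming the usual RunPod serverless API shape (`/run`, `/status/<id>`, `/cancel/<id>` under `https://api.runpod.ai/v2/<endpoint_id>`; the endpoint ID and API key below are placeholders), a minimal sketch of submit / status / cancel might look like:

```python
import os
import requests

# Placeholders -- substitute your own endpoint ID; the API key is read
# from the RUNPOD_API_KEY environment variable.
ENDPOINT_ID = "your-endpoint-id"
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ.get('RUNPOD_API_KEY', '')}"}

def submit_job(payload: dict) -> str:
    """POST /run returns a job ID usable for status checks and cancellation."""
    resp = requests.post(f"{BASE_URL}/run", json={"input": payload}, headers=HEADERS)
    resp.raise_for_status()
    return resp.json()["id"]

def get_execution_time(job_id: str):
    """GET /status/<id>; completed jobs report an executionTime field (ms)."""
    resp = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json().get("executionTime")

def cancel_job(job_id: str) -> dict:
    """POST /cancel/<id> stops a queued or in-progress job."""
    resp = requests.post(f"{BASE_URL}/cancel/{job_id}", headers=HEADERS)
    resp.raise_for_status()
    return resp.json()
```

This only works against the standard endpoints; as discussed below, the OpenAI-compatible routes don't return a job ID to feed into these calls.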
10 replies
It's good that one worker can process multiple requests, though that makes it hard to calculate costs this way, since it would actually be cheaper when multiple requests run on the same active worker. I may need to figure out another way to calculate it then.
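One way to handle the concurrency problem (this is an assumption about a fair split, not RunPod's billing model) is to apportion a worker's billed time across the requests it served, proportional to each request's own execution time:

```python
def apportion_cost(worker_seconds: float, price_per_second: float,
                   request_times: list) -> list:
    """Split a worker's billed cost across concurrent requests,
    proportional to each request's execution time.

    worker_seconds: total billed active time for the worker
    price_per_second: the endpoint's per-second rate
    request_times: per-request execution times (any consistent unit)
    """
    total = sum(request_times)
    if total == 0:
        return [0.0] * len(request_times)
    cost = worker_seconds * price_per_second
    return [cost * t / total for t in request_times]
```

For example, a worker billed 100 s at $0.01/s that served two requests taking 3 s and 1 s of execution time would attribute $0.75 and $0.25 to them respectively.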
That is a bummer. I really need to be able to calculate our cost per user when making requests, and total_tokens unfortunately can't help me with that.
I would use the standard endpoints, since they do return a job ID, but I need the guided_json field, and it seems only the OpenAI endpoints support that.
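For context, guided_json is a vLLM extra parameter that constrains generation to a JSON schema; on the OpenAI-compatible route it can be passed as an extra field in the request body. A sketch (base URL, model name, and schema are placeholders), using plain HTTP rather than the OpenAI client:

```python
import requests

def build_guided_payload(model: str, prompt: str, schema: dict) -> dict:
    """Chat-completions request body carrying vLLM's guided_json extra field."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "guided_json": schema,  # vLLM constrains output to this JSON schema
    }

def guided_completion(base_url: str, api_key: str, payload: dict) -> dict:
    """POST to the OpenAI-compatible chat/completions route of the endpoint.
    base_url is e.g. https://api.runpod.ai/v2/<endpoint_id>/openai/v1 (placeholder)."""
    resp = requests.post(
        f"{base_url}/chat/completions",
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"},
    )
    resp.raise_for_status()
    return resp.json()

# Hypothetical schema -- shape it to the structure you need back.
CITY_SCHEMA = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "population": {"type": "integer"}},
    "required": ["city", "population"],
}
```

The trade-off described in the thread remains: this route accepts guided_json but, unlike the standard `/run` route, its response carries no job ID for later status or cancel calls.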
Yes, as I mentioned, I tried this, but the OpenAI API endpoints don't return a job ID I can use. They only return an ID like "chat-652d581fee6c4bffb771c43b371b444e", which doesn't seem to be a job ID.