OpenAI Serverless Endpoint Docs
Hello. From what I could find in the support threads here, you should be able to make a standard OpenAI request, not wrapped in the "input" param, if you hit your endpoint at https://api.runpod.ai/v2/<ENDPOINT ID>/openai/...
The handler should then receive two new params, "openai_route" and "openai_input," but it's been a couple of months since the threads, and I can't find any official docs about this or the ability to test this locally with the RunPod lib. Can someone please confirm that this works in custom images too? If so, what is the structure of the parameters received? Does "input" in
handler(input)
contain "openai_input" and "openai_route" params directly? Is there any way I can develop this locally?32 Replies
https://github.com/runpod-workers/worker-vllm (the RunPod worker template for serving LLM endpoints, powered by vLLM)
This README has some examples
Yes. I've read through that source code before asking. Based on that, it should work the way I wrote it. But someone also mentioned it might be allowed only for that official worker, so I wanted to make sure the /openai path also works for custom images before I build all the code around it.
Does it also mean we can send any raw, non-wrapped payload to that endpoint even when it's not OpenAI-related? It should pass the request body to "openai_input" and the path after /openai/ to "openai_route", right? Having a setting or another documented serverless endpoint that accepts raw payloads would solve these kinds of problems with predefined APIs.
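To make that concrete, this is the kind of raw call I mean (a sketch only: the endpoint ID, API key and the /abc sub-path are placeholders, and whether arbitrary non-OpenAI payloads are accepted there is exactly what I'm asking):

```python
import requests

ENDPOINT_ID = "<ENDPOINT_ID>"        # placeholder
RUNPOD_API_KEY = "<RUNPOD_API_KEY>"  # placeholder

# POST a raw, non-wrapped payload straight to the /openai/* path,
# i.e. no {"input": ...} envelope around the body.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/abc",
    headers={
        "Authorization": f"Bearer {RUNPOD_API_KEY}",
        "Content-Type": "application/json",
    },
    json={"foo": "bar"},
)
print(resp.status_code, resp.text)
```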
Did you figure this out?
It's still not clear whether custom images with the model baked in can use the openai route or not.
when I send a request to it, it seems to still treat it as a 'normal' request, but maybe I need to pass in openai_route myself?
I've only tested that calling the /openai/ path does indeed produce different responses, even on my existing non-OpenAI endpoints.
Response with /openai/... path:
Response with any other path (e.g. /testpath/):
Based on this behaviour, I started writing my worker code. I hope I'll be able to test it soon and that it will work.
So in custom images you haven't been able to use the /openai route?
it received mine but it still treats it like a normal request
it treats it like an OpenAI request if I do a normal request with use_openai_format, openai_route and openai_input
but then that kinda defeats the point; I think the template vLLM worker automatically adds those to the request or something
like they do here, but then with openai_route: "chat/completion" and openai_input: messages
Endpoints | RunPod Documentation (using the RunPod Python SDK to run jobs, stream data, and check endpoint health)
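For reference, manually adding those two params to a normal request with the RunPod Python SDK would look roughly like this (a sketch: the endpoint ID, API key, model name and the exact route string are placeholders/assumptions, since the template worker normally fills these in for you):

```python
import runpod

runpod.api_key = "<RUNPOD_API_KEY>"          # placeholder
endpoint = runpod.Endpoint("<ENDPOINT_ID>")  # placeholder

# Hand-build the job input the way worker-vllm would receive it from the
# /openai/* path: the route plus the raw OpenAI-style body.
run_request = endpoint.run({
    "openai_route": "/v1/chat/completions",  # exact route format is an assumption
    "openai_input": {
        "model": "my-model",
        "messages": [{"role": "user", "content": "Hello"}],
    },
})
print(run_request.output(timeout=60))
```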
Yep yep, if it doesn't work the way the official endpoint does
Do you have the same handler code logic as this, 3WaD and OBJay?
Uhmm, no. I am asking because I want to build a custom image, not a fork of the vLLM image. Do you know exactly which part of that source code makes the path work?
No, I don't, but somehow you must have the handling logic to make it work like that
Like the input handling at least
Well, have you tried forking it first instead, then deploying it?
If it doesn't work on your own image with the same code, then report it to RunPod; if not, match it first
Just to make sure: is this the correct place to reach someone from RunPod who knows about their endpoints and might have a short, definite answer to this question? I appreciate the input and your time so far, guys, but to summarize: so far I've got a link to the official repo using the thing I am asking about, and got told to "hack around and find out".
This is a community server; if you want to reach official staff, use ticketing. Sometimes staff also check here.
Not "hack around and find out". You're building with the RunPod SDK and RunPod endpoints, so just use the one the RunPod team has built for receiving requests from the OpenAI endpoint as your starting base.
Thank you
I assume there is not enough documentation from RunPod about receiving inputs from OpenAI endpoints, so the best thing you can do is use that as a starting point.
Yup, you're welcome
Can someone please confirm that this works in custom images too?
Yes, our serverless API transforms input to openai_route + openai_input, as long as you send the request to /openai/*.
Is there any way I can develop this locally?
This happens on our platform only. As of now, there is nothing in the SDK to simulate this during local development.
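To illustrate that mapping (a rough sketch based on the answer above; the exact value and format of openai_route is an assumption, not something the docs currently pin down):

```python
# What you POST to https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1/chat/completions,
# with no {"input": ...} wrapper around it:
request_body = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
}

# What the handler then finds inside the job's "input" dict (sketch):
job_input = {
    "openai_route": "/v1/chat/completions",  # sub-path after /openai (format assumed)
    "openai_input": request_body,            # the raw body, untouched
}
```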
Ahhh. So it's not but just
Thank you very much for the confirmation!
Just
{"input":{"model": "...", "prompt": "..."}}
to pass to /openai/*
and that essentially gets passed to vllm as {"openai_input": {"model": "...", "prompt": "..."}, "openai_route": {}}
Passed to vllm? What if there's no vllm? I'll put it simply - when I send
{"foo":"bar"}
to https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc, will ANY handler function receive the payload (input and path) so we can work with it further, or is that not possible on RunPod?
"input" is key here
but then we can't use the OpenAI client.chat.completions.create, right? Since it doesn't format it with the {"input":{...
I'm not sure if I'm missing something here. "input" is what you provide us. We take anything inside that and put it inside "openai_input".
Yes. We're asking whether it's possible to send the JSON without wrapping it in "input", because that's how the OpenAI standard requires it.
I see. Yes, the endpoint can be treated like an OpenAI server. Your endpoint should have a value for
OPENAI BASE URL
You just have to make sure you send it to that path like so...
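For example, with the standard OpenAI Python client pointed at the endpoint's /openai/v1 base URL (a sketch mirroring the worker-vllm README usage; the endpoint ID, API key and model name are placeholders):

```python
from openai import OpenAI

# Point the regular OpenAI client at the RunPod endpoint's OpenAI-compatible path.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",  # your RunPod API key, not an OpenAI key
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

completion = client.chat.completions.create(
    model="my-model",  # whatever model name your worker serves
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)
```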
For more information, please go through https://github.com/runpod-workers/worker-vllm
You're referring us to the default vLLM template usage again; we were asking how to do this when you make a custom Docker image with the model built in.
also, when I do that, it seems to default to a sync request (the ID will be like sync-xxxxx) even though I use AsyncOpenAI
I don't understand how there's no documentation about this at all lol
We refer you to that page because that's where the answers are. Sending the requests to that specific path to your endpoint is the answer to your question. Have you tried it? Are you having issues with vllm blocking sync? It doesn't do that. Our requests to vllm will always be async. You'll notice that when you make requests to a vllm endpoint on our serverless. There is nothing we're doing that changes that. We're essentially proxying the requests. When you send sync requests, they come back immediately. It's non-blocking due to vllm's multi-concurrent processing. If there are issues with this, please file a support ticket so we can help you. CS will be asking for additional information that shouldn't be divulged here on a public forum.
Yes of course we have tried it, we're asking this question because it doesn't work the way it should
So even though it shows the request as "sync-93450349539", it still processes it async?
When I use a custom image it does NOT show the link like this here:
Despite that, are you able to send requests to the /openai/v1 path?
Yes. When you're testing these with longer-running jobs, do they block like they are sync requests?
didn't test with longer running jobs, but good to know that it doesn't matter even if it shows as sync
I am finally back home, so I can test this myself. And for anyone coming here in the future and wondering what's the answer to this simple question:
Yes, you can send non-nested payloads to the api.runpod.ai/v2/<ENDPOINT ID>/openai/* path of an endpoint that uses any custom async handler or software internally, and the payload will be available in the handler params.
That means when you send {"foo":"bar"} to .../openai/abc:
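the handler's normal job "input" ends up holding the route and the raw body. A minimal custom-handler sketch built on that behaviour (the exact format of the openai_route value, e.g. the leading slash, comes from my own testing rather than official docs):

```python
import runpod

def handler(job):
    job_input = job["input"]

    # Populated by the platform when the request came in via /openai/*:
    # the sub-path after /openai plus the raw, non-wrapped request body.
    route = job_input.get("openai_route")    # e.g. "/abc" (format assumed)
    payload = job_input.get("openai_input")  # e.g. {"foo": "bar"}

    if route is None:
        # Regular /run or /runsync request: job_input is the usual "input" dict.
        return {"mode": "standard", "input": job_input}

    # /openai/* request: dispatch on the route and use the raw payload directly.
    return {"mode": "openai", "route": route, "payload": payload}

runpod.serverless.start({"handler": handler})
```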
I really wish this were mentioned somewhere in the official docs, in the ask-ai knowledge base, or at least widely known to the team when asked. But thank you anyway.