RunPod•4mo ago
3WaD

OpenAI Serverless Endpoint Docs

Hello. From what I could find in the support threads here, you should be able to make a standard OpenAI request that is not wrapped in the "input" param if you hit your endpoint at https://api.runpod.ai/v2/<ENDPOINT ID>/openai/... The handler should then receive two new params, "openai_route" and "openai_input". However, those threads are a couple of months old, and I can't find any official docs about this or a way to test it locally with the RunPod lib. Can someone please confirm that this works in custom images too? If so, what is the structure of the parameters received? Does "input" in handler(input) contain the "openai_input" and "openai_route" params directly? Is there any way I can develop this locally?
32 Replies
yhlong00000
yhlong00000•3mo ago
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
yhlong00000
yhlong00000•3mo ago
This README has some examples.
3WaD
3WaDOP•3mo ago
Yes. I've read through that source code before asking. Based on that, it should be how I wrote it. But someone also mentioned it might be allowed just for that official worker, so I wanted to make sure the /openai path also works for custom images before I write the whole code around it. Does it also mean we can send any raw "non-wrapped" payload to that endpoint even when it's not openai related? It should pass any content to the "openai_input" and any route after to the "openai_route", right? Having a setting or another documented endpoint on serverless allowing us to send raw payloads would solve such problems with predefined APIs.
OBJay
OBJay•3mo ago
Did you figure this out? It's still not clear whether custom images with the model baked in can use the openai route or not. When I send a request to it, it still seems to treat it as a 'normal' request, but maybe I need to pass in openai_route myself?
3WaD
3WaDOP•3mo ago
I've only tested that calling /openai/ path does indeed produce different responses even on my existing non-openai endpoints. Response with /openai/... path:
{"error": "Error processing the request"}
Response with any other path (e.g. /testpath/):
404 page not found
Based on this behaviour, I started writing my worker code. I hope I'll be able to test it soon and that it will work.
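For anyone who wants to repeat this probe, here is a minimal sketch of how such a request could be built with only the Python standard library. The helper name build_openai_route_request and the placeholder endpoint ID and API key are hypothetical. The Bearer authorization header is an assumption based on the fact that the OpenAI client, which is pointed at the same base URL later in this thread, sends the API key that way. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_openai_route_request(endpoint_id, api_key, route, payload):
    """Build (but do not send) a POST to an endpoint's /openai/ path.

    Hypothetical helper: the payload is sent as-is, NOT wrapped in "input".
    """
    url = f"https://api.runpod.ai/v2/{endpoint_id}/openai/{route.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: Bearer auth, as the OpenAI client would send it.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with a real endpoint ID and API key.
request = build_openai_route_request("ENDPOINT_ID", "YOUR_API_KEY", "abc", {"foo": "bar"})
print(request.full_url)

# To actually send it: urllib.request.urlopen(request).read()
```

Swapping "abc" for a path outside /openai/ would require changing the URL template, which is the comparison described above.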
nerdylive
nerdylive•3mo ago
So in custom images you haven't been able to use the /openai route?
OBJay
OBJay•3mo ago
It received mine, but it still treats it like a normal request. It does treat it like an OpenAI request if I send a normal request with use_openai_format, openai_route and openai_input, but then that kinda defeats the point. I think the template vLLM worker automatically adds those to the request or something.
OBJay
OBJay•3mo ago
like they do here but then with openai_route : "chat/completion" and openai_input : messages
OBJay
OBJay•3mo ago
Endpoints | RunPod Documentation
Learn how to use the RunPod Python SDK to interact with various endpoints, perform synchronous and asynchronous operations, stream data, and check endpoint health. Discover how to set your Endpoint Id, run jobs, and cancel or purge queues with this comprehensive guide.
nerdylive
nerdylive•3mo ago
Yep, yep, if it doesn't work the way the official endpoint does. Do you have the same handler code logic as this, 3WaD and OBJay?
3WaD
3WaDOP•3mo ago
Uhmm, no. I am asking because I want to build a custom image, not a fork of the vLLM image. Do you know which part of that source code is making the path work exactly?
nerdylive
nerdylive•3mo ago
No, I don't, but somehow you must have the handling logic to make it work like that, at least the input handling. Have you tried forking it first and deploying that? If the same code doesn't work on your own image, report it to RunPod; if not, match it first.
3WaD
3WaDOP•3mo ago
Just to make sure - is this the correct place to reach someone from RunPod who knows about their endpoints and might have a short definite answer to this question? I appreciate the input and your time so far guys, but to summarize it so far, I've got a link to the official repo using the thing I am asking about, and got told to "hack around and find out" 😆
nerdylive
nerdylive•3mo ago
This is a community server. If you want to reach official staff, use ticketing; staff also check here sometimes. It's not "hack around and find out": you're building with the RunPod SDK and RunPod endpoints, so just use the base that the RunPod team has built for receiving requests from the OpenAI endpoint as your starting point.
3WaD
3WaDOP•3mo ago
Thank you
nerdylive
nerdylive•3mo ago
I assume there isn't enough documentation from RunPod about receiving inputs from the OpenAI endpoints, so the best thing you can do is use that as a starting point. Yup, you're welcome.
deanQ
deanQ•3mo ago
Can someone please confirm that this works in custom images too?
Yes, our serverless API transforms input to openai_route + openai_input as long as you send the request to /openai/*
Is there any way I can develop this locally?
This happens on our platform only. As of now, there is nothing in the SDK to simulate this during local development.
3WaD
3WaDOP•3mo ago
Ahhh. So it's not
{"input": {"openai_input": {}, "openai_route": {}}}
but just
{"openai_input": {}, "openai_route": {}}
Thank you very much for the confirmation!
deanQ
deanQ•3mo ago
Just {"input":{"model": "...", "prompt": "..."}} to pass to /openai/* and that essentially gets passed to vllm as {"openai_input": {"model": "...", "prompt": "..."}, "openai_route": {}}
3WaD
3WaDOP•3mo ago
Passed to vllm? What if there's no vllm? I'll put it simply - when I send {"foo":"bar"} to https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc, will ANY handler function receive the payload (input and path) so we can work with it further or it's not possible on RunPod?
deanQ
deanQ•3mo ago
"input" is key here
OBJay
OBJay•3mo ago
but then we cant use the openai client.chat.completions.create right? since it doesn't format it with the {"input":{....
deanQ
deanQ•3mo ago
I’m not sure if I’m missing something here. “input” is what you provide us. We take anything inside that and put it inside “openai_input”.
3WaD
3WaDOP•3mo ago
Yes. We're asking if it's possible to send the json without wrapping it with "input" or not. Because that's how openAI standard requires it.
deanQ
deanQ•3mo ago
I see. Yes, the endpoint can be treated like an OpenAI server. Your endpoint should have a value for OPENAI BASE URL. You just have to make sure you send requests to that path, like so:
import asyncio

import openai


async def run():
    runpod_endpoint_id = "vllm-1234567890"
    runpod_api_key = "xxxxxxxxxxxx"
    runpod_base_url = f"https://api.runpod.ai/v2/{runpod_endpoint_id}/openai/v1"

    openai_client = openai.AsyncOpenAI(
        base_url=runpod_base_url,
        api_key=runpod_api_key,
    )

    # `messages` belongs to the chat completions API, so call chat.completions
    # (completions.create expects a `prompt` instead).
    return await openai_client.chat.completions.create(
        model="NousResearch/Meta-Llama-3-8B-Instruct",
        messages=[
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ],
        # Extra fields are forwarded in the request body to the server.
        extra_body={"guided_choice": ["positive", "negative"]},
    )
For more information, please go through https://github.com/runpod-workers/worker-vllm/
OBJay
OBJay•3mo ago
You're referring us to the default vLLM template usage again. We were asking how to do this with a custom Docker image that has the model built in. Also, when I do that, it seems to default to a sync request (the id will be like sync-xxxxx) even though I use AsyncOpenAI. I don't understand how there's no documentation about this at all lol
deanQ
deanQ•3mo ago
We refer you to that page because that’s where the answers are. Sending the requests to that specific path to your endpoint is the answer to your question. Have you tried it? Are you having issues with vllm blocking sync? It doesn’t do that. Our requests to vllm will always be async. You’ll notice that when you make requests to a vllm endpoint on our serverless. There is nothing we’re doing that changes that. We’re essentially proxying the requests. When you send sync requests, they come back immediately. It’s non-blocking due to vllm’s multi-concurrent processing. If there are issues with this, please file a support ticket so we can help you. CS will be asking for additional information that shouldn’t be divulged here on a public forum.
OBJay
OBJay•3mo ago
Yes, of course we have tried it. We're asking this question because it doesn't work the way it should. So even though it shows the request as "sync-93450349539", it still processes it async?
OBJay
OBJay•3mo ago
When I use a custom image it does NOT show the link like this here:
deanQ
deanQ•3mo ago
Despite that, are you able to send requests to the /openai/v1 path? Yes. When you’re testing these with longer-running jobs, do they block like they are sync requests?
OBJay
OBJay•3mo ago
didn't test with longer running jobs, but good to know that it doesn't matter even if it shows as sync
3WaD
3WaDOP•3mo ago
I am finally back home, so I can test this myself. And for anyone coming here in the future and wondering what's the answer to this simple question: Yes, you can send non-nested payloads to api.runpod.ai/v2/<ENDPOINT ID>/openai/* path that is using any custom async handler or software internally, and it will be available in the handler params. That means when you send {"foo":"bar"} to .../openai/abc:
import runpod

async def handler(job):
    # job["input"] will include {'openai_input': {'foo': 'bar'}, 'openai_route': '/abc'}
    ...

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
I really wish this would be mentioned somewhere in the official docs, in the ask-ai knowledge base, or at least widely known to the team when asked. But thank you anyways.
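Since it was confirmed earlier in the thread that the SDK has nothing to simulate the /openai/* transformation locally, one workaround is to build the job dict by hand and call the handler directly. The following is a minimal sketch that assumes the job shape reported above. The helper make_openai_job, the "id" value, and the layout of the returned dict are hypothetical and are not part of the RunPod SDK.

```python
import asyncio


def make_openai_job(route, payload, job_id="local-test"):
    """Build a job dict shaped like the one reported in this thread.

    Hypothetical helper: the "id" field and its value are assumptions.
    """
    return {
        "id": job_id,
        "input": {"openai_route": route, "openai_input": payload},
    }


async def handler(job):
    job_input = job["input"]
    route = job_input.get("openai_route")
    if route is None:
        # Standard request: the caller wrapped the payload in "input".
        return {"mode": "standard", "payload": job_input}
    # Request that arrived through the /openai/* path.
    return {"mode": "openai", "route": route, "payload": job_input.get("openai_input")}


# Simulates sending {"foo": "bar"} to .../openai/abc without deploying anything.
result = asyncio.run(handler(make_openai_job("/abc", {"foo": "bar"})))
print(result)
```

Branching on the presence of "openai_route" lets the same handler serve both standard requests and requests sent to the /openai/* path, which also makes it easy to unit test each branch separately.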
