RunPod•2mo ago
3WaD

OpenAI Serverless Endpoint Docs

Hello. From what I could find in the support threads here, you should be able to make a standard OpenAI request, not wrapped in the "input" param, if you hit your endpoint at https://api.runpod.ai/v2/<ENDPOINT ID>/openai/... The handler should then receive two new params, "openai_route" and "openai_input", but it's been a couple of months since those threads, and I can't find any official docs about this or about the ability to test this locally with the RunPod lib. Can someone please confirm that this works in custom images too? If so, what is the structure of the parameters received? Does "input" in handler(input) contain the "openai_input" and "openai_route" params directly? Is there any way I can develop this locally?
32 Replies
yhlong00000
yhlong00000•2mo ago
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
yhlong00000
yhlong00000•2mo ago
This README has some examples.
3WaD
3WaDOP•2mo ago
Yes. I've read through that source code before asking. Based on that, it should work the way I wrote it. But someone also mentioned it might be allowed just for that official worker, so I wanted to make sure the /openai path also works for custom images before I write the whole code around it. Does it also mean we can send any raw "non-wrapped" payload to that endpoint even when it's not OpenAI-related? It should pass any content to "openai_input" and any route after it to "openai_route", right? Having a setting or another documented endpoint on serverless allowing us to send raw payloads would solve such problems with predefined APIs.
OBJay
OBJay•2mo ago
Did you figure this out? It's still not clear whether custom images with the model baked in can use the openai route or not. When I send a request to it, it seems to still treat it as a 'normal' request, but maybe I need to pass in openai_route myself?
3WaD
3WaDOP•2mo ago
I've only tested that calling the /openai/ path does indeed produce different responses, even on my existing non-OpenAI endpoints. Response with the /openai/... path:
{"error": "Error processing the request"}
{"error": "Error processing the request"}
Response with any other path (e.g. /testpath/):
404 page not found
Based on this behaviour, I started writing my worker code. I hope I'll be able to test it soon and that it will work.
nerdylive
nerdylive•2mo ago
So in custom images you haven't been able to use the /openai route?
OBJay
OBJay•2mo ago
It received mine, but it still treats it like a normal request. It treats it like an openai request if I do a normal request with use_openai_format, openai_route and openai_input, but then that kinda defeats the point. I think the template vllm worker automatically adds those to the request or something.
OBJay
OBJay•2mo ago
Like they do here, but then with openai_route: "chat/completion" and openai_input: messages
(screenshot attachment)
OBJay
OBJay•2mo ago
Endpoints | RunPod Documentation
Learn how to use the RunPod Python SDK to interact with various endpoints, perform synchronous and asynchronous operations, stream data, and check endpoint health. Discover how to set your Endpoint Id, run jobs, and cancel or purge queues with this comprehensive guide.
nerdylive
nerdylive•2mo ago
Yep yep, if it doesn't work well like the official endpoint does. Do you have the same handler code logic as this, 3WaD and OBJay?
3WaD
3WaDOP•2mo ago
Uhmm, no. I am asking because I want to build a custom image, not a fork of the vLLM image. Do you know which part of that source code is making the path work exactly?
nerdylive
nerdylive•2mo ago
No, I don't, but somehow you must have the handling logic to make it work like that, or at least the input handling. Have you tried forking it first and then deploying that? If it doesn't work on your own image with the same code, then report it to RunPod; if not, match it first.
3WaD
3WaDOP•2mo ago
Just to make sure - is this the correct place to reach someone from RunPod who knows about their endpoints and might have a short, definite answer to this question? I appreciate the input and your time so far, guys, but to summarize: I've got a link to the official repo using the thing I am asking about, and got told to "hack around and find out" 😆
nerdylive
nerdylive•2mo ago
This is the community server; if you want to reach the official staff, use ticketing. Sometimes staff also check here. It's not "hack around and find out": you're building with the RunPod SDK and RunPod endpoints, so just use the base that the RunPod team has built for receiving requests from the openai endpoint as a starting point.
3WaD
3WaDOP•2mo ago
Thank you
nerdylive
nerdylive•2mo ago
I assume there is not enough documentation from RunPod about receiving inputs from openai endpoints, so the best thing you can do is use that as a starting point. Yup, you're welcome.
deanQ
deanQ•2mo ago
Can someone please confirm that this works in custom images too?
Yes, our serverless API transforms input to openai_route + openai_input as long as you send the request to /openai/*
Is there any way I can develop this locally?
This happens on our platform only. As of now, there is nothing in the SDK to simulate this during local development.
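(Since there is nothing in the SDK to simulate this, one workaround is to call your handler directly with a hand-built job dict - a minimal sketch only, and the exact job shape here is an assumption based on what is confirmed later in this thread, not an official contract:)
import asyncio

# Local smoke test: fake_job mimics what the platform is described as sending
# for a request to .../openai/v1/chat/completions. Shapes are assumptions.
fake_job = {
    "id": "local-test",
    "input": {
        "openai_route": "/v1/chat/completions",
        "openai_input": {
            "model": "my-model",
            "messages": [{"role": "user", "content": "hello"}],
        },
    },
}

# `handler` is your own worker's async handler function.
result = asyncio.run(handler(fake_job))
print(result)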
3WaD
3WaDOP•2mo ago
Ahhh. So it's not
{"input": {"openai_input": {}, "openai_route": {}}}
{"input": {"openai_input": {}, "openai_route": {}}}
but just
{"openai_input": {}, "openai_route": {}}
{"openai_input": {}, "openai_route": {}}
Thank you very much for the confirmation!
deanQ
deanQ•2mo ago
Just {"input":{"model": "...", "prompt": "..."}} to pass to /openai/* and that essentially gets passed to vllm as {"openai_input": {"model": "...", "prompt": "..."}, "openai_route": {}}
3WaD
3WaDOP•2mo ago
Passed to vllm? What if there's no vllm? I'll put it simply - when I send {"foo":"bar"} to https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc, will ANY handler function receive the payload (input and path) so we can work with it further, or is that not possible on RunPod?
deanQ
deanQ•2mo ago
"input" is key here
OBJay
OBJay•2mo ago
But then we can't use the openai client.chat.completions.create, right? Since it doesn't format it with the {"input":{....
deanQ
deanQ•2mo ago
I’m not sure if I’m missing something here. “input” is what you provide us. We take anything inside that and put it inside “openai_input”.
3WaD
3WaDOP•2mo ago
Yes. We're asking whether it's possible to send the JSON without wrapping it in "input" or not, because that's what the OpenAI standard requires.
deanQ
deanQ•2mo ago
I see. Yes, the endpoint can be treated like an openai server. Your endpoint should have a value for OPENAI BASE URL. You just have to make sure you send it to that path like so...
import asyncio

import openai


async def run():
    runpod_endpoint_id = "vllm-1234567890"
    runpod_api_key = "xxxxxxxxxxxx"
    runpod_base_url = f"https://api.runpod.ai/v2/{runpod_endpoint_id}/openai/v1"

    openai_client = openai.AsyncOpenAI(
        base_url=runpod_base_url,
        api_key=runpod_api_key,
    )

    # Chat-style messages go through chat.completions; extra_body carries
    # vLLM-specific options such as guided_choice.
    completion = await openai_client.chat.completions.create(
        model="NousResearch/Meta-Llama-3-8B-Instruct",
        messages=[
            {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
        ],
        extra_body={
            "guided_choice": ["positive", "negative"]
        },
    )

    return completion
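(To actually execute that coroutine from a plain script, asyncio.run is the usual entry point - nothing RunPod-specific:)
if __name__ == "__main__":
    print(asyncio.run(run()))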
For more information, please go through https://github.com/runpod-workers/worker-vllm/
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
OBJay
OBJay•2mo ago
You're referring us to the default vllm template usage again; we were asking how to do this when you make a custom Docker image with the model built in. Also, when I do that, it seems to default to a sync request (the id will be like sync-xxxxx) even though I use AsyncOpenAI. I don't understand how there's no documentation about this at all lol
deanQ
deanQ•2mo ago
We refer you to that page because that’s where the answers are. Sending the requests to that specific path to your endpoint is the answer to your question. Have you tried it? Are you having issues with vllm blocking sync? It doesn’t do that. Our requests to vllm will always be async. You’ll notice that when you make requests to a vllm endpoint on our serverless. There is nothing we’re doing that changes that. We’re essentially proxying the requests. When you send sync requests, they come back immediately. It’s non-blocking due to vllm’s multi-concurrent processing. If there are issues with this, please file a support ticket so we can help you. CS will be asking for additional information that shouldn’t be divulged here on a public forum.
OBJay
OBJay•2mo ago
Yes, of course we have tried it; we're asking this question because it doesn't work the way it should. So even though it shows the request as "sync-93450349539", it still processes it async?
OBJay
OBJay•2mo ago
When I use a custom image it does NOT show the link like this here:
(screenshot attachment)
deanQ
deanQ•2mo ago
Despite that, are you able to send requests to the /openai/v1 path? Yes. When you’re testing these with longer-running jobs, do they block like they are sync requests?
OBJay
OBJay•2mo ago
Didn't test with longer-running jobs, but good to know that it doesn't matter even if it shows as sync.
3WaD
3WaDOP•2mo ago
I am finally back home, so I can test this myself. And for anyone coming here in the future and wondering what the answer to this simple question is: Yes, you can send non-nested payloads to the api.runpod.ai/v2/<ENDPOINT ID>/openai/* path of an endpoint that uses any custom async handler or software internally, and they will be available in the handler params. That means when you send {"foo":"bar"} to .../openai/abc:
import runpod

async def handler(job):
    # job["input"] will include {'openai_input': {'foo': 'bar'}, 'openai_route': '/abc'}
    ...

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
I really wish this were mentioned somewhere in the official docs, in the ask-ai knowledge base, or at least widely known to the team when asked. But thank you anyway.
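(For completeness, a client-side sketch of that same test - the endpoint ID and API key are placeholders, and the Bearer Authorization header is the standard RunPod scheme:)
import requests

ENDPOINT_ID = "your-endpoint-id"    # placeholder
API_KEY = "your-runpod-api-key"     # placeholder

# Raw, non-wrapped payload sent to a custom route under /openai/
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/abc",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"foo": "bar"},
)
print(resp.status_code, resp.text)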