Custom vLLM OpenAI compatible API

Hello, I'm running an OpenAI-compatible server using vLLM. In RunPod, for the SERVERLESS service you cannot choose the endpoint you want to send POST requests to; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI endpoint), or how do I run my vLLM docker image so that it is compatible with RunPod?
13 Replies
vlad000ss (OP) · 3w ago
OpenAI compatibility | RunPod Documentation
Discover the vLLM Worker, a cloud-based AI model that integrates with OpenAI's API for seamless interaction. With its streaming and non-streaming capabilities, it's ideal for chatbots, conversational AI, and natural language processing applications.
nerdylive · 3w ago
Oh, you use the openai package? Search "openai package pip" in Google, it's for Python.
nerdylive · 3w ago
but then you set it like this:
[screenshot attachment]
nerdylive · 3w ago
Fill in the RUNPOD_ENDPOINT_ID variable in the Python code with your endpoint ID. It's the string of random characters at the top left of your endpoint page.
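For reference, a minimal sketch of what that setup usually looks like (assuming the standard worker-vllm OpenAI route; the environment variable names and model name here are placeholders, not from the screenshot):

import os
from openai import OpenAI

# the endpoint ID is the short random string at the top left of the endpoint page
endpoint_id = os.environ["RUNPOD_ENDPOINT_ID"]

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    # RunPod proxies OpenAI-style requests to the worker through this route
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)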
vlad000ss (OP) · 3w ago
It didn't work... The problem is that when I send the request to /openai/v1, the endpoint is invoked but the request is not processed. I guess it's because my vLLM process is only listening on the /v1 endpoint. Didn't you have this problem? I'm using my custom vLLM image, not the RunPod one.
nerdylive · 3w ago
Oh, I never tried a custom image. But staff said it would be in the same URL format. They proxy the URL, so you should use their format like the one above; just replace the RunPod endpoint ID with yours.
Sagor Sarker · 2d ago
I am having the same issue. I have prepared a Docker container with a custom vLLM server in it and created a template with that image from Docker Hub. On serverless, the machine got created and I can use the endpoint via localhost:port, but from outside I can't access the server. It gets stuck. Maybe it can't make the connection using the OpenAI script above. Anyone have any clue?
nerdylive · 2d ago
How did you access it? Did you expose any ports?
Sagor Sarker · 2d ago
Hi @nerdylive, I have exposed port 8000 as a TCP port, since the server is running on that port. 1. I tried the "Request" form in the serverless console; it stays in the queue indefinitely. 2. I tried it programmatically like below:
curl --request POST \
  --url https://api.runpod.ai/v2/my_endpoint_id/runsync \
  --header "accept: application/json" \
  --header "authorization: my_runpod_api_key" \
  --header "content-type: application/json" \
  --data '
{
  "input": {
    "prompt": "What is the weather in Dhaka?"
  }
}
'
3. I tried the OpenAI compatibility route, since I served the model with the vllm serve command:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1",
    api_key="my_runpod_api_key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
               "content": "আজকে ঢাকার আবহাওয়া কেমন?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"
)

print(response)
nerdylive · 2d ago
So what's the error? Can I see some logs or anything about the error? Ah, and how did you access the ports externally?
Sagor Sarker · 2d ago
Hi, there is no error showing. It stays in the queue and I can't see any process log on the worker machine. I didn't specify the port. How can I do that? I tried the RunPod proxy method, and then it worked for a single worker machine.
nerdylive · 2d ago
Did you use any starting point? I'd suggest looking at vllm-worker on GitHub and starting there if you want to customize. Then you can use the OpenAI API, unless you broke it somehow; in that case, try to undo things until it works again.
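For context: on RunPod serverless, requests to /run and /runsync are queued and handed to a handler function started through the RunPod Python SDK inside the worker, rather than being forwarded to whatever port the container exposes, which is likely why a bare vllm serve entrypoint never sees the jobs. A rough, greatly simplified sketch of the handler pattern that the vllm-worker repo builds on (names and model here are placeholders, and the real worker does async streaming and OpenAI request translation):

import runpod
from vllm import LLM, SamplingParams

# load the model once per worker (placeholder model name)
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

def handler(job):
    # jobs sent to /run or /runsync arrive here as {"input": {...}}
    prompt = job["input"]["prompt"]
    outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
    return {"text": outputs[0].outputs[0].text}

# register the handler with the RunPod serverless runtime
runpod.serverless.start({"handler": handler})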
Sagor Sarker · 2d ago
One of the main problems with the RunPod vLLM-based Docker image is that it doesn't work for tool calling. That's why I moved to a custom Docker build that uses the vllm serve method. You are right, maybe running "vllm serve ................" directly as the Docker entrypoint is not compatible with RunPod. I will try to follow your suggestions. Thank you. I don't know why RunPod's vLLM worker hasn't solved the tool-calling issue yet, but it's an essential need.