Custom vLLM OpenAI compatible API

Hello, I'm running an OpenAI-compatible server using vLLM. In RunPod, for the SERVERLESS service you cannot choose the endpoint you want to send POST requests to; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI endpoint), or how do I run my vLLM docker image so that it is compatible with RunPod?
13 Replies
vlad000ss (OP) · 3w ago
OpenAI compatibility | RunPod Documentation
Discover the vLLM Worker, a cloud-based AI model that integrates with OpenAI's API for seamless interaction. With its streaming and non-streaming capabilities, it's ideal for chatbots, conversational AI, and natural language processing applications.
nerdylive · 3w ago
Oh, you use the openai package? Search "openai package pip" in Google, it's for Python.
nerdylive · 3w ago
but then you set it like this:
[screenshot attachment]
nerdylive · 3w ago
Fill in the RUNPOD_ENDPOINT_ID variable in the Python code with your endpoint ID. It's the string of random characters at the top left of your endpoint page.
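For reference, a minimal sketch of what that setup usually looks like (assuming the standard worker-vllm OpenAI route; the environment variable names and model name here are placeholders, not from the screenshot):

import os
from openai import OpenAI

# the endpoint ID is the short random string at the top left of the endpoint page
endpoint_id = os.environ["RUNPOD_ENDPOINT_ID"]

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    # RunPod proxies OpenAI-style requests to the worker through this route
    base_url=f"https://api.runpod.ai/v2/{endpoint_id}/openai/v1",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)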
vlad000ss (OP) · 3w ago
It didn't work... The problem is that when I send the request to /openai/v1, the endpoint is invoked but the request is not processed. I guess it's because my vLLM process is only listening on the /v1 endpoint. Didn't you have this problem? I'm using my custom vLLM image, not the RunPod one.
nerdylive · 3w ago
Oh, I never tried a custom image. But staff said it would be in the same URL format. They proxy the URL, so you should use their format like the one above; just replace the RunPod endpoint ID with yours.
Sagor Sarker · 2d ago
I am having the same issue. I have prepared a Docker container with a custom vLLM server in it and created a template with that image from Docker Hub. On serverless, the machine got created and I can use the endpoint via localhost:port, but from outside I can't access the server. It gets stuck. Maybe it can't make the connection using the OpenAI script above. Anyone have any clue?
nerdylive · 2d ago
How did you access it? Did you expose any ports?
Sagor Sarker · 2d ago
Hi @nerdylive, I have exposed port 8000 as a TCP port, since the server is running on that port. 1. I tried the "Request" form in the serverless console; it stays in the queue indefinitely. 2. I tried it programmatically like below:
curl --request POST \
  --url https://api.runpod.ai/v2/my_endpoint_id/runsync \
  --header "accept: application/json" \
  --header "authorization: my_runpod_api_key" \
  --header "content-type: application/json" \
  --data '
{
  "input": {
    "prompt": "What is the weather in Dhaka?"
  }
}
'
3. I tried the OpenAI compatibility route, since I served the model with the vllm serve command:
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/my_endpoint_id/openai/v1",
    api_key="my_runpod_api_key"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user",
               "content": "আজকে ঢাকার আবহাওয়া কেমন?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA"
                    }
                },
                "required": ["location"]
            }
        }
    }],
    tool_choice="auto"
)

print(response)
nerdylive · 2d ago
So what's the error? Can I see some logs or anything about the error? Ah, and how did you access the ports externally?
Sagor Sarker · 2d ago
Hi, there is no error showing. It stays in the queue and I can't see any process log on the worker machine. I didn't specify the port. How can I do that? I tried the RunPod proxy method, and then it worked for a single worker machine.
nerdylive · 2d ago
Did you use any starting point? I'd suggest looking at vllm-worker on GitHub and starting there if you want to customize. Then you can use the OpenAI API, unless you broke it somehow; in that case, try to undo things until it works again.
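For context: on RunPod serverless, requests to /run and /runsync are queued and handed to a handler function started through the RunPod Python SDK inside the worker, rather than being forwarded to whatever port the container exposes, which is likely why a bare vllm serve entrypoint never sees the jobs. A rough, greatly simplified sketch of the handler pattern that the vllm-worker repo builds on (names and model here are placeholders, and the real worker does async streaming and OpenAI request translation):

import runpod
from vllm import LLM, SamplingParams

# load the model once per worker (placeholder model name)
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

def handler(job):
    # jobs sent to /run or /runsync arrive here as {"input": {...}}
    prompt = job["input"]["prompt"]
    outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
    return {"text": outputs[0].outputs[0].text}

# register the handler with the RunPod serverless runtime
runpod.serverless.start({"handler": handler})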
Sagor Sarker · 2d ago
One of the main problems with the RunPod vLLM-based Docker image is that it doesn't work for tool calling. That's why I moved to a custom Docker build that uses the vllm serve method. You are right, maybe running "vllm serve ................" directly as the Docker entrypoint is not compatible with RunPod. I will try to follow your suggestions. Thank you. I don't know why RunPod's vLLM worker hasn't solved the tool-calling issue yet, but it's an essential need.