Custom vLLM OpenAI-compatible API
Hello,
I'm running an OpenAI-compatible server using vLLM.
In RunPod's SERVERLESS service you cannot choose which endpoint path the POST requests go to; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI endpoint), or how do I run the vLLM Docker image so that it is compatible with RunPod?
https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#initialize-your-project
For anyone who runs into the same issue I did
oh
you use the openai package
ssearch "openai package pip" in google
its for python
But then you set it up like this:
Fill in the RUNPOD_ENDPOINT_ID variable in Python with your endpoint ID; it's the random characters at the top left of your endpoint page.
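A minimal sketch of that setup with the openai Python package, assuming the base_url format from the RunPod docs linked above; RUNPOD_ENDPOINT_ID, the API key, and MODEL_NAME are placeholders:
```python
# Hedged sketch: point the openai v1 client at RunPod's OpenAI-compatible proxy route.
from openai import OpenAI

RUNPOD_ENDPOINT_ID = "your-endpoint-id"   # the random characters on your endpoint page
RUNPOD_API_KEY = "your-runpod-api-key"    # placeholder

client = OpenAI(
    api_key=RUNPOD_API_KEY,
    base_url=f"https://api.runpod.ai/v2/{RUNPOD_ENDPOINT_ID}/openai/v1",
)

response = client.chat.completions.create(
    model="MODEL_NAME",  # placeholder for the model your worker serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```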
It didn't work... The problem is that when I send the request to /openai/v1, the endpoint is invoked but the request is not processed, I guess because my vLLM process is only listening on the /v1 endpoint. Didn't you have that problem? I'm using my custom vLLM image, not the RunPod one.
Oh, I never tried a custom image... but staff said that it would be in the same URL format
They proxy the URL, so you should use their format like this one; just replace the RunPod endpoint ID with yours
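A minimal sketch of what that proxied request looks like over raw HTTP, assuming the `https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1` base path from the docs above; the endpoint ID, API key, and model name are placeholders:
```python
# Hedged sketch: direct HTTP POST to the proxied OpenAI-compatible route,
# equivalent to what the openai client does under the hood.
import requests

RUNPOD_ENDPOINT_ID = "your-endpoint-id"   # placeholder
RUNPOD_API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{RUNPOD_ENDPOINT_ID}/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json={
        "model": "MODEL_NAME",  # placeholder for the model your worker serves
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=120,
)
print(resp.json())
```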
I am having the same issue. I have prepared a Docker container with a custom vLLM server in it and created a template from that image via Docker Hub. In serverless, the machine gets created and I can use the endpoint via localhost:port, but from outside I can't access the server; it just gets stuck. Maybe it can't make a connection using the OpenAI script above. Does anyone have any clue?
How did you access it, did you expose any ports?
Hi @nerdylive
I have exposed port 8000 as a TCP port, since the server is running on that port.
1. I tried accessing it with the "Requests" method that exists in the serverless console; it stays in the queue indefinitely.
2. I tried it programmatically, like the sketch shown after this list.
3. I tried the OpenAI compatibility route, since I serve the model with the vllm serve command.
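For reference, a minimal sketch of a programmatic call against the default /runsync route, assuming the standard RunPod serverless request format; the endpoint ID, API key, and input payload are placeholders, not the exact snippet used above:
```python
# Hedged sketch of a /runsync call; the "input" payload shape depends entirely
# on what the worker's handler expects.
import requests

RUNPOD_ENDPOINT_ID = "your-endpoint-id"   # placeholder
RUNPOD_API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{RUNPOD_ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json={"input": {"prompt": "Hello!"}},  # placeholder payload
    timeout=300,
)
print(resp.json())
```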
So what's the error? Can I see some logs?
Or anything about the error?
Ah, and how did you expose the ports for external access?
Hi,
There's no error showing. It stays in the queue and I can't see any process logs on the worker machine. I didn't specify the port. How can I do that?
I tried the RunPod proxy method, and that worked for a single worker machine.
Did you use any starting point?
I'd suggest looking at vllm-worker on GitHub and starting there if you want to customize.
Then you can use the OpenAI API; unless you broke it somehow, in which case try undoing things until it works again.
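For context on why a plain vllm serve entrypoint can sit in the queue forever: a RunPod serverless worker doesn't receive traffic on an exposed port, it pulls jobs through the RunPod Python SDK's handler loop, which is what the vllm-worker image wraps around the vLLM engine. A toy sketch of that loop (not the actual worker code):
```python
# Toy sketch of a RunPod serverless handler loop; a real worker would forward
# the job input to the vLLM engine instead of echoing it back.
import runpod

def handler(job):
    # job["input"] is whatever JSON was sent under "input" via /run or /runsync
    prompt = job["input"].get("prompt", "")
    return {"echo": prompt}

# Registers the handler and starts polling RunPod for jobs.
runpod.serverless.start({"handler": handler})
```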
One of the main problems with the RunPod vLLM-based Docker image is that it doesn't work for tool calling.
That is the reason I moved to a custom Docker build using the vllm serve method.
You are right; serving directly with "vllm serve ................" as the Docker entrypoint might not be compatible with RunPod.
I will try to follow your suggestions.
Thank you.
I don't know why RunPod's vLLM worker hasn't solved the tool-calling issue yet, but it's an essential need.
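For reference, a tool-calling request through an OpenAI-compatible vLLM server looks roughly like this; the tool definition, model name, and base URL are placeholders, and the server has to be started with tool-call support enabled (e.g. vLLM's --enable-auto-tool-choice and --tool-call-parser options):
```python
# Hedged sketch of a tool-calling request via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")  # placeholder URL

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="MODEL_NAME",  # placeholder
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```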