OpenAI-compatible endpoint for a custom serverless Docker image
How can I get an OpenAI-compatible endpoint for my custom Docker image on RunPod serverless?
I am trying to create a llama.cpp Docker image.
The way I would approach it is:
1) A client-side class that abstracts connecting to RunPod and sends some sort of JSON request for the handler to parse and run a switch / if statement against
2) Modify your handler.py to respond properly to those requests
There is no built-in way to do it
But if I do this, then the agents will become complicated and will not use a generally accepted API.
Although I could find a workaround in another thread:
What happens is: when you hit
https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc
the handler receives two new key-value pairs in the job input passed to your handler:
- "openai_route": this will be everything in the link you hit after /openai, so for the example case its value would be
/abc
, you would use this to tell the handler to do logic for /v1/chat/completions
, /v1/models
, etc
- "openai_input" the openai request as a dictionary, with message, etc
If you don't have stream: true in your OpenAI request, then you just return the OpenAI completions/chat-completions/etc. object as a dict in the output
(returned here: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L160)
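For the non-streaming case, here is a sketch of that hypothetical run_chat_completion helper using llama-cpp-python (since the goal is a llama.cpp image). The model path is an assumption about how the model would be baked into the custom image; llama-cpp-python's create_chat_completion already returns an OpenAI-shaped chat completion dict, so it can be returned as-is:

```python
from llama_cpp import Llama

# hypothetical model path baked into the custom image
llm = Llama(model_path="/models/model.gguf", n_ctx=4096)

def run_chat_completion(oai_request):
    # llama-cpp-python returns an OpenAI-style chat completion dict
    return llm.create_chat_completion(
        messages=oai_request["messages"],
        max_tokens=oai_request.get("max_tokens", 256),
        temperature=oai_request.get("temperature", 0.7),
    )
```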
If you have stream: true, then this will be an SSE stream, for which you would yield your output; but instead of yielding the dict directly, you would put it in an SSE stream chunk string format, which is something like f"data: {your JSON output as a string}\n\n"
(stream code: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L161)
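For the streaming case, a sketch of a generator that wraps each chunk in that SSE format (again using llama-cpp-python and reusing the llm instance from the previous sketch). Note that for streaming your handler itself would need to be a generator, i.e. yield these strings rather than return a dict; the [DONE] sentinel at the end follows the OpenAI streaming convention:

```python
import json

# reuses the `llm` instance from the previous sketch
def stream_chat_completion(oai_request):
    # each chunk from llama-cpp-python is an OpenAI-style "chat.completion.chunk" dict
    for chunk in llm.create_chat_completion(
        messages=oai_request["messages"],
        stream=True,
    ):
        # SSE chunk format: "data: <json>\n\n"
        yield f"data: {json.dumps(chunk)}\n\n"
    # OpenAI-style streams conventionally end with a [DONE] sentinel
    yield "data: [DONE]\n\n"
```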
Most of the relevant code is in this class: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L109
Will work on documentation soon