OpenAI-compatible endpoint for a custom serverless Docker image
How can I get an OpenAI-compatible endpoint for my custom Docker image on RunPod serverless?
I am trying to create a llama.cpp Docker image.
The way I would approach it is:
1) A client-side class that abstracts connecting to RunPod and sends some sort of JSON request for the handler to parse and run a switch / if statement against
2) Modify your handler.py to respond properly to those requests
There is no built-in way to do it
But if I do this, then the agents will become complicated and will not use a generally accepted API.
Although I could find a workaround in another thread:
What happens is: when you hit
https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc
the handler receives two new key-value pairs in the job input passed to your handler:
- "openai_route": this will be everything in the link you hit after /openai, so for the example case its value would be
/abc
, you would use this to tell the handler to do logic for /v1/chat/completions
, /v1/models
, etc
- "openai_input" the openai request as a dictionary, with message, etc
If you don't have stream: true in your OpenAI request, then you just return the OpenAI completions/chat-completions/etc. object as a dict in the output
(returned here: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L160)
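For the non-streaming case, here is a sketch of that hypothetical run_chat_completion helper using llama-cpp-python (since the goal is a llama.cpp image). The model path is an assumption about how the model would be baked into the custom image; llama-cpp-python's create_chat_completion already returns an OpenAI-shaped chat completion dict, so it can be returned as-is:

```python
from llama_cpp import Llama

# hypothetical model path baked into the custom image
llm = Llama(model_path="/models/model.gguf", n_ctx=4096)

def run_chat_completion(oai_request):
    # llama-cpp-python returns an OpenAI-style chat completion dict
    return llm.create_chat_completion(
        messages=oai_request["messages"],
        max_tokens=oai_request.get("max_tokens", 256),
        temperature=oai_request.get("temperature", 0.7),
    )
```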
If you have stream: true, then this will be an SSE stream, for which you would yield your output; but instead of yielding the dict directly, you would put it in an SSE stream chunk string format, which is something like f"data: {your JSON output as a string}\n\n"
(stream code: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L161)
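For the streaming case, a sketch of a generator that wraps each chunk in that SSE format (again using llama-cpp-python and reusing the llm instance from the previous sketch). Note that for streaming your handler itself would need to be a generator, i.e. yield these strings rather than return a dict; the [DONE] sentinel at the end follows the OpenAI streaming convention:

```python
import json

# reuses the `llm` instance from the previous sketch
def stream_chat_completion(oai_request):
    # each chunk from llama-cpp-python is an OpenAI-style "chat.completion.chunk" dict
    for chunk in llm.create_chat_completion(
        messages=oai_request["messages"],
        stream=True,
    ):
        # SSE chunk format: "data: <json>\n\n"
        yield f"data: {json.dumps(chunk)}\n\n"
    # OpenAI-style streams conventionally end with a [DONE] sentinel
    yield "data: [DONE]\n\n"
```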
Most of the relevant code is in this class: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L109
Will work on documentation soon