OpenAI-compatible endpoint for a custom serverless Docker image

How can I get an OpenAI-compatible endpoint for my custom Docker image in RunPod serverless? I am trying to create a llama.cpp Docker image.
justin · 7mo ago
The way I would approach it is:
1) A client-side class that abstracts connecting to RunPod and sends some sort of JSON request for the handler to parse and run a switch / if statement against (see the sketch below)
2) Modify your handler.py to respond properly
There is no built-in way to do it.
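A minimal sketch of that approach; the "task" field and the generate_text() helper are illustrative assumptions, not a RunPod built-in:

```python
import runpod

def generate_text(prompt):
    # Hypothetical placeholder: call into your llama.cpp model here.
    return f"echo: {prompt}"

def handler(job):
    job_input = job["input"]
    # The client-side class wraps each call in a JSON request with a "task"
    # field (an assumed convention, not part of RunPod) for dispatch.
    task = job_input.get("task")

    if task == "generate":
        return {"text": generate_text(job_input.get("prompt", ""))}
    elif task == "models":
        return {"models": ["my-llama-model"]}
    else:
        return {"error": f"unknown task: {task}"}

runpod.serverless.start({"handler": handler})
```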
ngagefreak05 (OP) · 7mo ago
But if I do this, then the agents will become complicated and will not use the generally accepted API.
ngagefreak05 (OP) · 7mo ago
Although I did find a workaround in another thread. What happens is: when you hit https://api.runpod.ai/v2/<ENDPOINT ID>/openai/abc, the handler receives two new key-value pairs in the job["input"] dictionary:
- "openai_route": everything in the URL after /openai, so in the example its value would be /abc. You use this to tell the handler to run the logic for /v1/chat/completions, /v1/models, etc.
- "openai_input": the OpenAI request as a dictionary, with messages, etc.
If you don't have stream: true in your OpenAI request, then you just return the OpenAI completions/chat-completions/etc. object as a dict in the output (returned here: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L160). If you have stream: true, then this will be an SSE stream, for which you would yield your output; but instead of yielding the dict directly, you wrap it in an SSE chunk string, something like f"data: {your JSON output as a string}\n\n" (stream code: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L161). Most of the code is in this class in general: https://github.com/runpod-workers/worker-vllm/blob/0a5b5bc095153363e8d45af1a2fa6f2d26425530/src/engine.py#L109
Will work on documentation soon.
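Sketched as a generator-style handler, that workaround could look like the following; build_chat_completion() and stream_chunks() are hypothetical placeholders for your llama.cpp inference, while the openai_route / openai_input keys and the "data: ...\n\n" framing follow the description above:

```python
import json
import runpod

def build_chat_completion(oai_request):
    # Hypothetical placeholder: run llama.cpp and return an OpenAI
    # ChatCompletion-shaped dict built from the request.
    return {"object": "chat.completion", "choices": []}

def stream_chunks(oai_request):
    # Hypothetical placeholder: yield OpenAI ChatCompletionChunk-shaped
    # dicts as llama.cpp produces tokens.
    yield {"object": "chat.completion.chunk", "choices": []}

def handler(job):
    job_input = job["input"]
    route = job_input.get("openai_route")        # e.g. "/v1/chat/completions"
    oai_request = job_input.get("openai_input")  # the OpenAI request as a dict

    if route == "/v1/models":
        yield {"object": "list", "data": [{"id": "my-llama-model", "object": "model"}]}
    elif route == "/v1/chat/completions":
        if oai_request.get("stream"):
            # Streaming: emit each chunk as an SSE "data: {json}\n\n" string.
            for chunk in stream_chunks(oai_request):
                yield f"data: {json.dumps(chunk)}\n\n"
            yield "data: [DONE]\n\n"
        else:
            # Non-streaming: return the full completion object as a dict.
            yield build_chat_completion(oai_request)
    else:
        yield {"error": f"unsupported openai_route: {route}"}

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```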