SvenBrnn
RRunPod
•Created by Bj9000 on 1/27/2025 in #⚡|serverless
Serveless quants
i was also searching for it last week, i ended up giving up as there seems to be a ticket about this on the github of the vllm worker.
https://github.com/runpod-workers/worker-vllm/issues/98
Doesn't seem on their tasklist anytime soon so i ended up building my own ollama based runner.
4 replies
RRunPod
•Created by Mohamed Nagy on 1/26/2025 in #⚡|serverless
How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1
Just have a look at https://github.com/SvenBrnn/runpod-worker-ollama/tree/master/wrapper/src
I've build this last week with trying around on a CPU instance and figured out how stuff comes in when it comes from an OpenAI endpoint:
https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/test_inputs/openai_completion.json
https://github.com/SvenBrnn/runpod-worker-ollama/blob/master/test_inputs/openai_get_models.json
9 replies