This user has a great template https://github.com/SvenBrnn/runpod-worker-ollama :)
https://hub.docker.com/r/svenbrnn/runpod-ollama
is it as good as the runpod vllm template? in terms of performance and concurrency stuff
I haven't tested it personally but I can only assume so? I can give it a try for you in the morning if you don't get to test it out tonight.
no, it's most likely not. There is no cache implemented like the one vLLM uses, so startup will take a bit longer than with vLLM.
the wrapper just starts an ollama instance inside the container and translates RunPod requests into requests ollama can understand and answer. It's also nice for gguf models or models from the official ollama repository, but it's just a small project.
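To make the "wrapper" idea concrete, here is a minimal sketch of what such a worker might look like, assuming the RunPod Python SDK's handler convention and Ollama's default HTTP API on `localhost:11434`. The function names and the job-input schema here are illustrative assumptions, not taken from the actual project.

```python
import json
import urllib.request

# Ollama's default local generate endpoint (assumption: default port, no auth)
OLLAMA_URL = "http://localhost:11434/api/generate"


def job_to_ollama(job):
    """Translate a RunPod serverless job payload into an Ollama request body.

    The "input" keys used here are an illustrative schema, not the
    project's actual one.
    """
    inp = job.get("input", {})
    return {
        "model": inp.get("model", "llama3"),
        "prompt": inp.get("prompt", ""),
        "stream": False,  # ask Ollama for a single JSON object, not a stream
    }


def handler(job):
    """RunPod handler: forward the job to the local Ollama server."""
    body = json.dumps(job_to_ollama(job)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# With the real RunPod SDK the worker would then be started with:
# import runpod
# runpod.serverless.start({"handler": handler})
```

Because no inference cache or batching layer sits in between, each cold start pays the full cost of launching Ollama and loading the model, which is where the gap to the vLLM template comes from.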
I just added automatic container updates today, so there should always be a new container ready for new ollama versions within at most 24h now.
It will, however, fully work with all endpoints, including the OpenAI-compatible ones.
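For the OpenAI-compatible side, a client would just build a standard chat-completions request against whatever base URL the deployed worker exposes. This is a hedged sketch using only the standard library; the base URL and API key handling are assumptions about the deployment, and only the request path follows the OpenAI convention.

```python
import json
import urllib.request


def build_chat_request(base_url, model, messages):
    """Build an OpenAI-style chat completion request (URL + JSON body).

    base_url is whatever the deployed worker exposes (assumption);
    the path follows the OpenAI chat-completions convention.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {"model": model, "messages": messages}
    return url, payload


def send_chat(base_url, model, messages, api_key=""):
    """POST the request and return the decoded JSON response."""
    url, payload = build_chat_request(base_url, model, messages)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Since the request shape is the standard OpenAI one, existing OpenAI client libraries should also work by pointing their base URL at the worker.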