Ollama on RunPod
Hey all,
I am attempting to set up Ollama on an NVIDIA GeForce RTX 4090 pod. The commands for that are pretty straightforward (link to the article: https://docs.runpod.io/tutorials/pods/run-ollama). All I do is run the following two commands in the pod's web terminal after it starts up, and I'm good to go:
1) (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
2) ollama run [model_name]
However, what I would like to do is have these commands run automatically when the pod starts. My initial thought was to enter the above two commands into the 'Container Start Command' field on the pod deployment page (as seen in the attached image). I'm not sure how to write these start-up commands and would be grateful for any assistance.
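My best guess so far is to collapse everything into a single bash -c invocation, something like the line below (untested; the sleep values are guesses, and I've swapped ollama run for ollama pull on the assumption that run opens an interactive session):
# sketch: install, start the server in the background, give it time to come up,
# download the model, then keep the pod alive
bash -c "curl -fsSL https://ollama.com/install.sh | sh && (ollama serve > ollama.log 2>&1 &) && sleep 10 && ollama pull [model_name] && sleep infinity"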
why not use the ollama docker image instead?
I was just looking into that. Do you have any resources that might be helpful? I was referring to this link: https://hub.docker.com/r/ollama/ollama, but I was wondering if there was an approach more suited to RunPod.
you want something like this
https://runpod.io/console/deploy?template=q5rqanpolz&ref=vfker49t
note this is the API version, not chat via the terminal
Thanks for the link. I went ahead and spun up a pod with the ollama/ollama container image. After the pod starts, would you know how to make inferences with a model (e.g. gemma)?
you could pass gemma in container command
like this
I went ahead and tried "run gemma" (image attached), but I get an error message in the container logs that says:
Error: could not connect to ollama app, is it running?
If I just have "gemma", the error message is:
Error: unknown command "gemma" for "ollama"
try gemma:7b
It seems to be returning the same error.
try maybe setting the image to ollama/ollama:latest
Tried this, and the error seems to be the same.
It looks like I just have to run two commands - "serve" and "run gemma", after which I should be able to make inferences with gemma, but I'm not sure how to implement that.
Thank you for all the support so far, but are there any other fixes I could implement to get this to work?
what if you put serve run gemma
It returns the following error:
Error: accepts 0 arg(s), received 2
It looks like the Container Start Command can only take one command.
the docker container runs the serve command first
Yes, but in this case I think it's trying to run the command "serve" along with "run" and "gemma" as the arguments.
yes
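For the record, my current guess is that the ollama/ollama image uses ollama itself as the entrypoint, so whatever goes into the start command field ends up being parsed as arguments to it (which would explain both errors above). If the field can be pointed at a shell instead, something like the line below might chain the two steps, but I haven't been able to confirm that it works that way:
# sketch: background the server, give it time to start, pull the model,
# then wait on the server so the container stays alive
/bin/sh -c "ollama serve > ollama.log 2>&1 & sleep 10 && ollama pull gemma:7b && wait"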
@justin [Not Staff] Hey Justin,
I noticed that you were able to provide some valuable advice to other users regarding Ollama on RunPod, so I was hoping to reach out to you regarding this thread, which I have yet to debug.
No, you hit one of the problems I have with Ollama. You need a background server before you can use the ollama run command; I've tried to automate this in the past by adding a simple start.sh script and so on, but I couldn't get it working with the pod. I could get something basic working with serverless, but it still "redownloads" the model every time. Idk, something about their hashing algorithm.
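Roughly, the kind of start.sh I mean looked something like this (treat it as a sketch, since I never got it reliable on a pod, and the model name is just a placeholder):
#!/bin/bash
# start the Ollama server in the background and keep a log
ollama serve > /workspace/ollama.log 2>&1 &
# give it a few seconds to start listening on the default port 11434
sleep 10
# pull the model so it's cached before any requests come in (placeholder name)
ollama pull mistral
# stay attached to the background server so the container keeps running
wait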
I ended up using openllm instead:
https://github.com/bentoml/OpenLLM
And then in my Dockerfile, I just run this preload.py script, which basically does everything I need it to do.
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/preload.py
Maybe you can play around with my repo in Pod mode to see if you can get it working with Gemma.
but I've included instructions for llama / mistral 7b
@justin [Not Staff] @acamp #Open WebUI (Formerly Ollama WebUI) is something you might like; it has Ollama running in the background 🙂
Oo, is there a repo for the Dockerfile? I'd love to see how it works.
https://github.com/open-webui/open-webui
User-friendly WebUI for LLMs (Formerly Ollama WebUI)
btw you can use service instead of runpodctl in pods
create a service file and start it with
service <name> start
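e.g. a rough sketch of an /etc/init.d/ollama script, assuming an Ubuntu-style base image where service picks up init.d scripts:
#!/bin/sh
# minimal init script: start/stop a background Ollama server
case "$1" in
  start)
    ollama serve > /var/log/ollama.log 2>&1 &
    ;;
  stop)
    pkill -f "ollama serve"
    ;;
esac
then chmod +x /etc/init.d/ollama and start it with service ollama start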
@Papa Madiator and @justin [Not Staff] Thank you both for the assistance and resources! Would you happen to know if it's possible to set up llama3 on open-webui and make inferences to it using an API? I was not able to find specific instructions on how to set up an LLM on open-webui.
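For context, the kind of call I'm hoping to make looks something like this, based on Ollama's documented HTTP API and its default port 11434 (I'm assuming open-webui leaves that endpoint reachable, and that the pod's proxy URL would replace localhost):
# pull the model once, then request a completion
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'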