Ollama on RunPod
Hey all,
I am attempting to set up Ollama on an NVIDIA GeForce RTX 4090 pod. The commands for that are pretty straightforward (link to the article: https://docs.runpod.io/tutorials/pods/run-ollama). All I do is run the following two commands in the pod's web terminal after it starts up, and I'm good to go:
1) (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) &
2) ollama run [model_name]
However, what I would like to do is have these commands run automatically when the pod starts. My initial thought was to enter the above two commands into the 'Container Start Command' field on the pod deployment page (as seen in the attached image). I'm not sure how to write these start-up commands and would be grateful for any assistance.
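My best guess so far is to collapse everything into a single bash -c invocation, something like the line below (untested; the sleep values are guesses, and I've swapped ollama run for ollama pull on the assumption that run opens an interactive session):
# sketch: install, start the server in the background, give it time to come up,
# download the model, then keep the pod alive
bash -c "curl -fsSL https://ollama.com/install.sh | sh && (ollama serve > ollama.log 2>&1 &) && sleep 10 && ollama pull [model_name] && sleep infinity"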
why not use the ollama docker image instead?
I was just looking into that. Do you have any resources that might be helpful? I was referring to this link: https://hub.docker.com/r/ollama/ollama, but I was wondering if there was an approach more suited to RunPod.
you want something like this
https://runpod.io/console/deploy?template=q5rqanpolz&ref=vfker49t
note this is the API version, not chat via the terminal
Thanks for the link. I went ahead and spun up a pod with the ollama/ollama container image. After the pod starts, would you know how to make inferences with a model (e.g. gemma)?
you could pass gemma in container command
like this
I went ahead and tried "run gemma" (image attached), but I get an error message in the container logs that says:
Error: could not connect to ollama app, is it running?
If I just have "gemma", the error message is:
Error: unknown command "gemma" for "ollama"
try gemma:7b
It seems to be returning the same error.
try maybe setting the image to ollama/ollama:latest
Tried this, and the error seems to be the same.
It looks like I just have to run two commands - "serve" and "run gemma", after which I should be able to make inferences with gemma, but I'm not sure how to implement that.
Thank you for all the support so far, but are there any other fixes I could implement to get this to work?
what if you put serve run gemma
It returns the following error:
Error: accepts 0 arg(s), received 2
It looks like the Container Start Command can only take one command.
the docker container runs the serve command first
Yes, but in this case I think it's trying to run the command "serve" along with "run" and "gemma" as the arguments.
yes
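For the record, my current guess is that the ollama/ollama image uses ollama itself as the entrypoint, so whatever goes into the start command field ends up being parsed as arguments to it (which would explain both errors above). If the field can be pointed at a shell instead, something like the line below might chain the two steps, but I haven't been able to confirm that it works that way:
# sketch: background the server, give it time to start, pull the model,
# then wait on the server so the container stays alive
/bin/sh -c "ollama serve > ollama.log 2>&1 & sleep 10 && ollama pull gemma:7b && wait"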
@justin [Not Staff] Hey Justin,
I noticed that you were able to provide some valuable advice to other users regarding Ollama on RunPod, so I was hoping to reach out to you regarding this thread, which I have yet to debug.
No, you hit one of the problems I have with Ollama. You need a background server before you can use the ollama run command; I've tried to automate this in the past by adding a simple start.sh script and so on, but I couldn't get it working with the pod. I could get something basic working with serverless, but it still "redownloads" the model every time. Idk, something about their hashing algorithm.
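Roughly, the kind of start.sh I mean looked something like this (treat it as a sketch, since I never got it reliable on a pod, and the model name is just a placeholder):
#!/bin/bash
# start the Ollama server in the background and keep a log
ollama serve > /workspace/ollama.log 2>&1 &
# give it a few seconds to start listening on the default port 11434
sleep 10
# pull the model so it's cached before any requests come in (placeholder name)
ollama pull mistral
# stay attached to the background server so the container keeps running
wait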
I ended up using openllm instead:
https://github.com/bentoml/OpenLLM
And then in my Dockerfile, I just run this preload.py script, which basically does everything I need it to do.
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/preload.py
Maybe you can play around with my repo in Pod mode to see if you can get it working with Gemma.
but I've included instructions for llama / mistral 7b
@justin [Not Staff] @acamp #Open WebUI (Formerly Ollama WebUI) is something you might like; it has Ollama running in the background 🙂
Oo, is there a repo for the Dockerfile? I'd love to see how it works.
https://github.com/open-webui/open-webui
User-friendly WebUI for LLMs (Formerly Ollama WebUI)
btw you can use service instead of runpodctl in pods
create a service file and start it with
service <name> start
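e.g. a rough sketch of an /etc/init.d/ollama script, assuming an Ubuntu-style base image where service picks up init.d scripts:
#!/bin/sh
# minimal init script: start/stop a background Ollama server
case "$1" in
  start)
    ollama serve > /var/log/ollama.log 2>&1 &
    ;;
  stop)
    pkill -f "ollama serve"
    ;;
esac
then chmod +x /etc/init.d/ollama and start it with service ollama start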
@Papa Madiator and @justin [Not Staff] Thank you both for the assistance and resources! Would you happen to know if it's possible to set up llama3 on open-webui and make inferences to it using an API? I was not able to find specific instructions on how to set up an LLM on open-webui.
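For context, the kind of call I'm hoping to make looks something like this, based on Ollama's documented HTTP API and its default port 11434 (I'm assuming open-webui leaves that endpoint reachable, and that the pod's proxy URL would replace localhost):
# pull the model once, then request a completion
curl http://localhost:11434/api/pull -d '{"name": "llama3"}'
curl http://localhost:11434/api/generate -d '{"model": "llama3", "prompt": "Why is the sky blue?", "stream": false}'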