RunPod · 2w ago
acamp

Ollama on Runpod

After following all the instructions in this article: https://docs.runpod.io/tutorials/pods/run-ollama I am able to set up Ollama on a pod. However, after a few inferences, I get a 504 (sometimes 524) error in response. I have been making inferences to Ollama on a RunPod pod for the past few months and never faced this issue, so it's definitely recent. Any thoughts on what might be going on?
9 Replies
baldy · 2w ago
i think 524 is an error generated by cloudflare when the connection times out -- are you requesting streamed responses? if not, i wonder if the response is just taking too long and the proxy is cutting off the connection due to inactivity. i'm also using ollama in some runpods and haven't had too many problems, although i am using streaming (for the most part)
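For reference, a streamed request looks roughly like this (a minimal sketch against Ollama's /api/generate endpoint; the pod proxy URL and the model name are placeholders):

```python
# Minimal sketch of a streamed Ollama request. With "stream": True, Ollama
# returns newline-delimited JSON chunks, so the proxy sees steady traffic
# instead of one long silent wait it might cut off.
import json
import requests

OLLAMA_URL = "https://<pod-id>-11434.proxy.runpod.net/api/generate"  # placeholder

payload = {
    "model": "llama3.1",               # whichever model the pod has pulled
    "prompt": "Why is the sky blue?",
    "stream": True,
}

with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
```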
acamp (OP) · 2w ago
I have not been using streamed responses. After a bit of exploring, I think the issue lies with the ollama version. The download link presented in the article installs ollama version 0.4.1; however, when I used an older ollama version (0.1.32), the issue disappears. The problem is that ollama 0.1.32 does not support llama3.1 onwards. Would anyone happen to know how I could install a specific version of ollama?
baldy · 2w ago
runpod's instructions are more complicated than is necessary these days, imo. i just use ollama or openwebui images, so either ollama/ollama:0.x.y or ghcr.io/open-webui/open-webui:0.x.x-ollama (if you go with openwebui, you should open ports 11434 and 8080; otherwise 11434 is enough). it's really easy to pin a specific version of ollama using ollama/ollama. it's more complicated with open-webui because you need to figure out which version of ollama was packaged with a given open-webui version. for what it's worth, ghcr.io/open-webui/open-webui:0.3.35-ollama comes with ollama 0.3.14, on which i'm running llama 3.1 70b
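If it helps, you can confirm which Ollama build an image actually ships, and which models it has pulled, by querying the running pod (a small sketch; the proxy base URL is a placeholder):

```python
# Check the Ollama version and the locally available models on a running pod.
import requests

BASE = "https://<pod-id>-11434.proxy.runpod.net"  # placeholder

version = requests.get(f"{BASE}/api/version", timeout=30).json()
print("ollama version:", version.get("version"))

tags = requests.get(f"{BASE}/api/tags", timeout=30).json()
for model in tags.get("models", []):
    print("model:", model.get("name"))
```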
acamp (OP) · 2w ago
Thank you! I'll definitely give this a shot.
baldy · 2w ago
i'd also suggest one other change. create a disk (a network volume ideally, if you can) of some size and mount it at /root/.ollama so ollama data survives stops/starts.
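For anyone creating the pod from a script rather than the console, something along these lines should work with the runpod Python SDK (a sketch only; the create_pod parameter names are assumptions worth checking against the current SDK docs):

```python
# Sketch: create a pod with a pinned Ollama image and a persistent volume
# mounted at /root/.ollama so pulled models survive stop/start.
# Treat the exact create_pod parameters as assumptions and verify them
# against the runpod SDK documentation.
import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"  # placeholder

pod = runpod.create_pod(
    name="ollama",
    image_name="ollama/ollama:0.3.14",       # pinned tag, per the advice above
    gpu_type_id="NVIDIA GeForce RTX 4090",   # example GPU type
    ports="11434/http",                      # add 8080/http if running open-webui
    volume_in_gb=50,                         # persistent disk size
    volume_mount_path="/root/.ollama",       # keeps ollama data across restarts
)
print(pod)
```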
baldy · 2w ago
aaand finally if you use openwebui -- openwebui stores data in /app/backend/data, but that's outside of /root/.ollama, so it won't survive a restart. so i have a start command that takes care of that. here's a screenshot of my setup
[screenshot: pod template setup]
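Since the screenshot didn't come through, here is one possible shape for such a start wrapper (a sketch only, not necessarily the setup shown above; the paths and the open-webui entrypoint are assumptions):

```python
# Possible start wrapper: relocate open-webui's /app/backend/data onto the
# persistent volume before launching, so chats and settings survive a pod
# restart. The data paths and the downstream entrypoint are assumptions.
import os
import shutil
import subprocess

PERSIST = "/root/.ollama/open-webui-data"   # lives on the mounted volume
APP_DATA = "/app/backend/data"

os.makedirs(PERSIST, exist_ok=True)

# Replace the container-local data dir with a symlink into the volume.
if os.path.isdir(APP_DATA) and not os.path.islink(APP_DATA):
    shutil.copytree(APP_DATA, PERSIST, dirs_exist_ok=True)  # keep anything already there
    shutil.rmtree(APP_DATA)
if not os.path.exists(APP_DATA):
    os.symlink(PERSIST, APP_DATA)

# Hand off to the image's normal start script (assumed location).
subprocess.run(["bash", "start.sh"], cwd="/app/backend", check=True)
```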
baldy · 2w ago
gl
acamp (OP) · 3d ago
@baldy Thanks for all the help. I was able to resolve the issue by utilizing an older docker image (0.3.14 instead of 0.4.1).
Madiator2011 (Work)
You can always use my Better Ollama template