Ollama on Runpod
After following all the instructions in this article: https://docs.runpod.io/tutorials/pods/run-ollama
I am able to set up Ollama on a pod, however after a few inferences I get a 504 (sometimes 524) error in response. I have been making inferences to Ollama on a Runpod pod for the past few months and never faced this issue, so it's definitely something recent. Any thoughts on what might be going on?
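For context, my requests look roughly like this (the pod ID and model are placeholders; I'm calling Ollama's /api/generate through the pod's HTTP proxy on port 11434):

```python
import requests

# Placeholder pod ID; the URL follows Runpod's HTTP proxy pattern <pod-id>-11434.proxy.runpod.net
OLLAMA_URL = "https://YOUR_POD_ID-11434.proxy.runpod.net"

resp = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={
        "model": "llama3.1",      # whatever model is pulled on the pod
        "prompt": "Why is the sky blue?",
        "stream": False,          # non-streamed: a single JSON body once generation finishes
    },
    timeout=600,
)
resp.raise_for_status()           # this is where the 504/524 surfaces
print(resp.json()["response"])
```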
i think 524 is an error generated by cloudflare when the connection times out -- are you requesting streamed responses? if not, i wonder if the response is just taking too long and maybe the proxy is cutting off the connection due to inactivity
i'm also using ollama in some runpods and haven't had too many problems, although i am using streaming (for the most part)
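fwiw, streaming on the client side looks roughly like this (same placeholder url as the snippet above) -- ollama sends back one json object per line as it generates, so the connection never sits idle:

```python
import json
import requests

OLLAMA_URL = "https://YOUR_POD_ID-11434.proxy.runpod.net"  # placeholder pod id

with requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "llama3.1", "prompt": "Why is the sky blue?", "stream": True},
    stream=True,    # tell requests not to buffer the whole body
    timeout=600,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
```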
I have not been using streamed responses.
After a bit of exploring, I think the issue lies with the Ollama version. The download link in the article installs Ollama 0.4.1, but when I used an older version (0.1.32) the issue disappeared. The problem is that Ollama 0.1.32 does not support llama3.1 onwards. Would anyone happen to know how I could install a specific version of Ollama?
runpod's instructions are more complicated than necessary these days, imo. i just use the ollama or openwebui images, so either ollama/ollama:0.x.y or ghcr.io/open-webui/open-webui:0.x.x-ollama (if you go with openwebui, you should open ports 11434 and 8080, otherwise 11434 is enough).
it's real easy to get a specific version of ollama using ollama/ollama. it's more complicated with open-webui because you have to figure out which version of ollama was packaged with each open-webui release. for what it's worth, ghcr.io/open-webui/open-webui:0.3.35-ollama comes with ollama 0.3.14, on which i'm running llama 3.1 70b.
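you can sanity-check which ollama version a pod actually ended up with by hitting the version endpoint, something along these lines (placeholder pod id again):

```python
import requests

OLLAMA_URL = "https://YOUR_POD_ID-11434.proxy.runpod.net"  # placeholder pod id

# /api/version reports the ollama build running inside the container
print(requests.get(f"{OLLAMA_URL}/api/version", timeout=30).json())
# e.g. {'version': '0.3.14'}
```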
Thank you! I'll definitely give this a shot.
i'd also suggest one other change: create a disk (a network disk ideally, if you can) of some size and mount it at /root/.ollama so ollama data survives stops/starts.
aaand finally, if you use openwebui -- openwebui stores its data in /app/backend/data, which is outside of /root/.ollama, so it won't survive a restart. i have a start command that takes care of that. here's a screenshot of my setup
gl
@baldy Thanks for all the help. I was able to resolve the issue by using an older docker image (ollama 0.3.14 instead of 0.4.1).
You can always use my Better Ollama template