RunPod · 2w ago
acamp

Ollama on Runpod

After following all the instructions in this article: https://docs.runpod.io/tutorials/pods/run-ollama I am able to set up Ollama on a pod. However, after a few inferences, I get a 504 (sometimes 524) error in response. I have been making inferences to Ollama on a RunPod pod for the past few months and never faced this issue, so it's definitely recent. Any thoughts on what might be going on?
9 Replies
baldy · 2w ago
i think 524 is an error generated by cloudflare when the connection times out -- are you requesting streamed responses? if not, i wonder if the response is just taking too long and the proxy is cutting off the connection due to inactivity. i'm also using ollama in some runpods and haven't had too many problems, although i am using streaming (for the most part)
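For reference, a streamed request looks roughly like this (a minimal sketch against Ollama's /api/generate endpoint; the pod proxy URL and the model name are placeholders):

```python
# Minimal sketch of a streamed Ollama request. With "stream": True, Ollama
# returns newline-delimited JSON chunks, so the proxy sees steady traffic
# instead of one long silent wait it might cut off.
import json
import requests

OLLAMA_URL = "https://<pod-id>-11434.proxy.runpod.net/api/generate"  # placeholder

payload = {
    "model": "llama3.1",               # whichever model the pod has pulled
    "prompt": "Why is the sky blue?",
    "stream": True,
}

with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
```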
acamp (OP) · 2w ago
I have not been using streamed responses. After a bit of exploring, I think the issue lies with the ollama version. The download link presented in the article installs ollama version 0.4.1; however, when I used an older ollama version (0.1.32), the issue disappears. The problem is that ollama 0.1.32 does not support llama3.1 onwards. Would anyone happen to know how I could install a specific version of ollama?
baldy · 2w ago
runpod's instructions are more complicated than is necessary these days, imo. i just use ollama or openwebui images, so either ollama/ollama:0.x.y or ghcr.io/open-webui/open-webui:0.x.x-ollama (if you go with openwebui, you should open ports 11434 and 8080; otherwise 11434 is enough). it's really easy to pin a specific version of ollama using ollama/ollama. it's more complicated with open-webui because you need to figure out which version of ollama was packaged with a given open-webui version. for what it's worth, ghcr.io/open-webui/open-webui:0.3.35-ollama comes with ollama 0.3.14, on which i'm running llama 3.1 70b
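If it helps, you can confirm which Ollama build an image actually ships, and which models it has pulled, by querying the running pod (a small sketch; the proxy base URL is a placeholder):

```python
# Check the Ollama version and the locally available models on a running pod.
import requests

BASE = "https://<pod-id>-11434.proxy.runpod.net"  # placeholder

version = requests.get(f"{BASE}/api/version", timeout=30).json()
print("ollama version:", version.get("version"))

tags = requests.get(f"{BASE}/api/tags", timeout=30).json()
for model in tags.get("models", []):
    print("model:", model.get("name"))
```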
acamp (OP) · 2w ago
Thank you! I'll definitely give this a shot.
baldy · 2w ago
i'd also suggest one other change. create a disk (a network volume ideally, if you can) of some size and mount it at /root/.ollama so ollama data survives stops/starts.
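For anyone creating the pod from a script rather than the console, something along these lines should work with the runpod Python SDK (a sketch only; the create_pod parameter names are assumptions worth checking against the current SDK docs):

```python
# Sketch: create a pod with a pinned Ollama image and a persistent volume
# mounted at /root/.ollama so pulled models survive stop/start.
# Treat the exact create_pod parameters as assumptions and verify them
# against the runpod SDK documentation.
import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"  # placeholder

pod = runpod.create_pod(
    name="ollama",
    image_name="ollama/ollama:0.3.14",       # pinned tag, per the advice above
    gpu_type_id="NVIDIA GeForce RTX 4090",   # example GPU type
    ports="11434/http",                      # add 8080/http if running open-webui
    volume_in_gb=50,                         # persistent disk size
    volume_mount_path="/root/.ollama",       # keeps ollama data across restarts
)
print(pod)
```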
baldy · 2w ago
aaand finally if you use openwebui -- openwebui stores data in /app/backend/data, but that's outside of /root/.ollama, so it won't survive a restart. so i have a start command that takes care of that. here's a screenshot of my setup
[screenshot: pod template setup]
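Since the screenshot didn't come through, here is one possible shape for such a start wrapper (a sketch only, not necessarily the setup shown above; the paths and the open-webui entrypoint are assumptions):

```python
# Possible start wrapper: relocate open-webui's /app/backend/data onto the
# persistent volume before launching, so chats and settings survive a pod
# restart. The data paths and the downstream entrypoint are assumptions.
import os
import shutil
import subprocess

PERSIST = "/root/.ollama/open-webui-data"   # lives on the mounted volume
APP_DATA = "/app/backend/data"

os.makedirs(PERSIST, exist_ok=True)

# Replace the container-local data dir with a symlink into the volume.
if os.path.isdir(APP_DATA) and not os.path.islink(APP_DATA):
    shutil.copytree(APP_DATA, PERSIST, dirs_exist_ok=True)  # keep anything already there
    shutil.rmtree(APP_DATA)
if not os.path.exists(APP_DATA):
    os.symlink(PERSIST, APP_DATA)

# Hand off to the image's normal start script (assumed location).
subprocess.run(["bash", "start.sh"], cwd="/app/backend", check=True)
```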
baldy · 2w ago
gl
acamp (OP) · 3d ago
@baldy Thanks for all the help. I was able to resolve the issue by utilizing an older docker image (0.3.14 instead of 0.4.1).
Madiator2011 (Work)
You can always use my Better Ollama template