Llama 3.1 via Ollama
You can now use the tutorial on running Ollama on serverless environments (https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference) in combination with Llama 3.1.
We have tested this with Llama 3.1 8B, using a network volume and a 24 GB GPU PRO. Please let us know if this setup also works with other weights and GPUs.
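In case it helps anyone following along, here is a minimal sketch of querying such a serverless endpoint from Python, assuming the standard RunPod serverless request format. The endpoint ID and the prompt field inside input are placeholders/assumptions; check the tutorial and your worker's handler for the exact schema it expects.
```python
# Minimal sketch: query a RunPod serverless endpoint that runs Ollama.
# ENDPOINT_ID and the "prompt" field are placeholders/assumptions; the exact
# input schema depends on the worker image you deploy.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"          # hypothetical endpoint ID
API_KEY = os.environ["RUNPOD_API_KEY"]    # your RunPod API key

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Why is the sky blue?"}},
    timeout=300,
)
response.raise_for_status()
print(response.json())
```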
Docs on that Docker image are now updated. Thanks for the ping!
@PatrickR thank you very much!
Better Ollama - CUDA12 works with GPU.
When you say "In the Container Start Command field, specify the Ollama supported model", do you mean literally just pasting the ollama model ID into that field?
Yes. Like orca-mini or llama3.1.
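So the Container Start Command field contains nothing but the model name, for example:
```
llama3.1
```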
Also, the Docker image was just updated to version 0.0.9:
pooyaharatian/runpod-ollama:0.0.9
I keep getting JSON decoding errors trying to run queries on it...
Are you passing this?
Yeah:
request:
Downgrade the Docker image to 0.0.7.
I also see this error for 0.0.9, so please use 0.0.8, as that one is working.
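Concretely, that means pointing the template's Container Image field at the working tag:
```
pooyaharatian/runpod-ollama:0.0.8
```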
I opened https://github.com/pooyahrtn/RunpodOllama/issues/11 to get this fixed.
Yes, like this:
That works, thanks!
Perfect, have fun!
Would you mind updating the version in the tutorial as well? And go back to 0.0.8? 🙏
Reverted in the docs.
Thank you very much!