Llama 3.1 via Ollama

You can now use the tutorial on running Ollama on serverless environments (https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference) in combination with Llama 3.1. We have tested this with Llama 3.1 8B, using a network volume and a 24 GB GPU PRO. Please let us know if this setup also works with other weights and GPUs.
Run an Ollama Server on a RunPod CPU | RunPod Documentation
Learn to set up and run an Ollama server on RunPod CPU for inference with this step-by-step tutorial.
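For reference, a minimal sketch of calling such an endpoint from Python. The nested payload shape is the one used by the pooyaharatian/runpod-ollama wrapper; the `/runsync` route and `RUNPOD_API_KEY` environment variable name are assumptions about your setup, and `YOUR_ENDPOINT_ID` is a placeholder.

```python
import json
import os
import urllib.request

def build_payload(prompt: str) -> dict:
    """Wrap a prompt in the nested structure the runpod-ollama wrapper expects."""
    return {"input": {"method_name": "generate", "input": {"prompt": prompt}}}

def run_sync(endpoint_id: str, prompt: str) -> dict:
    """POST the payload to the endpoint's /runsync route and parse the reply."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumes your RunPod API key is exported as RUNPOD_API_KEY.
            "Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(run_sync("YOUR_ENDPOINT_ID", "why the sky is blue?"))
```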
15 Replies
PatrickR
PatrickR4mo ago
Docs on that Docker image are now updated. Thanks for the ping!
NERDDISCO
NERDDISCOOP4mo ago
@PatrickR thank you very much!
Madiator2011
Madiator20114mo ago
#Better Ollama - CUDA12 works with gpu
aurelium
aurelium4mo ago
When you say "In the Container Start Command field, specify the Ollama supported model", do you mean literally just pasting the ollama model ID into that field?
PatrickR
PatrickR4mo ago
Yes. Like orca-mini or llama3.1. Also, the Docker image was just updated to version 0.0.9: pooyaharatian/runpod-ollama:0.0.9
aurelium
aurelium4mo ago
I keep getting JSON decoding errors trying to run queries on it...
PatrickR
PatrickR4mo ago
{
  "input": {
    "method_name": "generate",
    "input": {
      "prompt": "why the sky is blue?"
    }
  }
}
Are you passing this?
aurelium
aurelium4mo ago
Yeah:
{
"delayTime": 117699,
"error": "{\"error_type\": \"<class 'requests.exceptions.JSONDecodeError'>\", \"error_message\": \"Extra data: line 1 column 5 (char 4)\", \"error_traceback\": \"Traceback (most recent call last):\\n File \\\"/usr/local/lib/python3.10/dist-packages/requests/models.py\\\", line 974, in json\\n return complexjson.loads(self.text, **kwargs)\\n File \\\"/usr/lib/python3.10/json/__init__.py\\\", line 346, in loads\\n return _default_decoder.decode(s)\\n File \\\"/usr/lib/python3.10/json/decoder.py\\\", line 340, in decode\\n raise JSONDecodeError(\\\"Extra data\\\", s, end)\\njson.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n File \\\"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\\", line 134, in run_job\\n handler_return = handler(job)\\n File \\\"//runpod_wrapper.py\\\", line 39, in handler\\n return response.json()\\n File \\\"/usr/local/lib/python3.10/dist-packages/requests/models.py\\\", line 978, in json\\n raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)\\nrequests.exceptions.JSONDecodeError: Extra data: line 1 column 5 (char 4)\\n\", \"hostname\": \"ogp9bh9fndvgck-64411159\", \"worker_id\": \"ogp9bh9fndvgck\", \"runpod_version\": \"1.6.2\"}",
"executionTime": 61,
"id": "c4794910-58f5-4179-98a9-0b0779ba0749-u1",
"status": "FAILED"
}
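A note on what this traceback means: `response.json()` assumes the upstream body is a single JSON document, and `json.decoder.JSONDecodeError: Extra data` is raised whenever a valid JSON value ends before the body does. One plausible way this happens with Ollama (a guess at the failure mode, not confirmed in the issue) is that `/api/generate` streams newline-delimited JSON, one object per line, unless streaming is disabled. The sketch below reproduces the error class and shows how an NDJSON body would need to be parsed instead:

```python
import json

def parse_ndjson(body: str) -> list:
    """Parse a newline-delimited JSON body into a list of objects."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

# Example NDJSON body, as an Ollama streaming response might look.
ndjson = '{"response": "The"}\n{"response": " sky"}\n{"done": true}'

# Parsing the whole body as one document fails with "Extra data":
try:
    json.loads(ndjson)
except json.JSONDecodeError as e:
    print(e)  # "Extra data: line 2 ..." -- same error class as the traceback

# Parsing line by line recovers every chunk:
pieces = parse_ndjson(ndjson)
print("".join(p.get("response", "") for p in pieces))
```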
request:
{
  "input": {
    "method_name": "generate",
    "input": {
      "prompt": "why the sky is blue?"
    }
  }
}
PatrickR
PatrickR4mo ago
downgrade the docker image to 0.0.7
NERDDISCO
NERDDISCOOP4mo ago
I also see this error for 0.0.9, so please use 0.0.8, as that one is working. I opened https://github.com/pooyahrtn/RunpodOllama/issues/11 to get this fixed.
GitHub
0.0.9 is broken · Issue #11 · pooyahrtn/RunpodOllama
When using the 0.0.9 of this image, we receive this error: { "delayTime": 14006, "error": "{"error_type": "<class 'requests.exceptions.JSONDecodeEr...
NERDDISCO
NERDDISCOOP4mo ago
Yes, like this:
aurelium
aurelium4mo ago
That works, thanks!
NERDDISCO
NERDDISCOOP4mo ago
Perfect, have fun! Would you mind updating the version in the tutorial as well? And go back to 0.0.8? 🙏
PatrickR
PatrickR4mo ago
Reverted in the docs
NERDDISCO
NERDDISCOOP4mo ago
Thank you very much!