Llama 3.1 via Ollama

You can now follow the tutorial on running Ollama on serverless environments (https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference) in combination with Llama 3.1. We have tested this with Llama 3.1 8B, using a network volume and a 24 GB PRO GPU. Please let us know whether this setup also works with other model weights and GPUs.
Run an Ollama Server on a RunPod CPU | RunPod Documentation
Learn to set up and run an Ollama server on RunPod CPU for inference with this step-by-step tutorial.
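For anyone calling the resulting endpoint from code: a minimal sketch of sending a blocking job to a RunPod serverless endpoint, assuming the `{"input": {"method_name": ..., "input": ...}}` payload shape used by the runpod-ollama wrapper later in this thread. The endpoint ID and API key are placeholders, not real values.

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Request body in the shape the runpod-ollama wrapper expects."""
    return {"input": {"method_name": "generate", "input": {"prompt": prompt}}}

def generate(endpoint_id: str, api_key: str, prompt: str) -> dict:
    """Send a blocking /runsync job to a RunPod serverless endpoint.

    endpoint_id and api_key come from your RunPod console; both are
    placeholders here.
    """
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    # Cold starts can be slow (see the delayTime values below), so use a
    # generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

For long-running jobs you may prefer the asynchronous `/run` route and polling `/status`, but `/runsync` is the simplest way to test the worker.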
PatrickR · 3mo ago
Docs on that Docker image are now updated. Thanks for the ping!
NERDDISCO · 3mo ago
@PatrickR thank you very much!
Madiator2011 · 3mo ago
#Better Ollama - CUDA12 works with GPU
aurelium · 3mo ago
When you say "In the Container Start Command field, specify the Ollama supported model", do you mean literally just pasting the ollama model ID into that field?
PatrickR · 3mo ago
Yes. Like orca-mini or llama3.1. Also, the Docker image was just updated to version 0.0.9: pooyaharatian/runpod-ollama:0.0.9
aurelium · 3mo ago
I keep getting JSON decoding errors trying to run queries on it...
PatrickR · 3mo ago
{
  "input": {
    "method_name": "generate",
    "input": {
      "prompt": "why the sky is blue?"
    }
  }
}
Are you passing this?
aurelium · 3mo ago
Yeah:
{
  "delayTime": 117699,
  "error": "{\"error_type\": \"<class 'requests.exceptions.JSONDecodeError'>\", \"error_message\": \"Extra data: line 1 column 5 (char 4)\", \"error_traceback\": \"Traceback (most recent call last):\\n File \\\"/usr/local/lib/python3.10/dist-packages/requests/models.py\\\", line 974, in json\\n return complexjson.loads(self.text, **kwargs)\\n File \\\"/usr/lib/python3.10/json/__init__.py\\\", line 346, in loads\\n return _default_decoder.decode(s)\\n File \\\"/usr/lib/python3.10/json/decoder.py\\\", line 340, in decode\\n raise JSONDecodeError(\\\"Extra data\\\", s, end)\\njson.decoder.JSONDecodeError: Extra data: line 1 column 5 (char 4)\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n File \\\"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\\", line 134, in run_job\\n handler_return = handler(job)\\n File \\\"//runpod_wrapper.py\\\", line 39, in handler\\n return response.json()\\n File \\\"/usr/local/lib/python3.10/dist-packages/requests/models.py\\\", line 978, in json\\n raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)\\nrequests.exceptions.JSONDecodeError: Extra data: line 1 column 5 (char 4)\\n\", \"hostname\": \"ogp9bh9fndvgck-64411159\", \"worker_id\": \"ogp9bh9fndvgck\", \"runpod_version\": \"1.6.2\"}",
  "executionTime": 61,
  "id": "c4794910-58f5-4179-98a9-0b0779ba0749-u1",
  "status": "FAILED"
}
request:
{
  "input": {
    "method_name": "generate",
    "input": {
      "prompt": "why the sky is blue?"
    }
  }
}
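An aside on the failure mode: "Extra data" from `json.loads` means the body contained more than one JSON document. Ollama's /api/generate streams newline-delimited JSON by default, and calling `response.json()` on such a body fails exactly like the traceback above. A hedged sketch (not the wrapper's actual code) of parsing an NDJSON body instead:

```python
import json

def parse_ndjson(body: str) -> list[dict]:
    """Parse a newline-delimited JSON body: one JSON object per line,
    as produced by Ollama's streaming /api/generate responses."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

def join_response(chunks: list[dict]) -> str:
    """Reassemble the full completion from streamed Ollama chunks,
    each of which carries a partial text in its 'response' field."""
    return "".join(chunk.get("response", "") for chunk in chunks)
```

Alternatively, sending `"stream": false` in the request to Ollama yields a single JSON object that plain `.json()` can handle.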
PatrickR · 3mo ago
Downgrade the Docker image to 0.0.7.
NERDDISCO · 3mo ago
I also see this error with 0.0.9, so please use 0.0.8, as that one is working. I opened https://github.com/pooyahrtn/RunpodOllama/issues/11 to get this fixed.
GitHub
0.0.9 is broken · Issue #11 · pooyahrtn/RunpodOllama
NERDDISCO · 3mo ago
Yes, like this:
aurelium · 3mo ago
That works, thanks!
NERDDISCO · 3mo ago
Perfect, have fun! Would you mind updating the version in the tutorial as well, going back to 0.0.8? 🙏
PatrickR · 3mo ago
Reverted in the docs
NERDDISCO · 3mo ago
Thank you very much!