Ollama API
Hello, I am trying to host LLMs on RunPod GPU Cloud using Ollama (https://ollama.com/download). I want to set it up as an endpoint so I can access it from my local laptop using Python libraries like LangChain. I'm having trouble setting up the API endpoint; has anyone worked with this before?
Yeah, just run the install script they tell you to, then you can do ollama serve
in one terminal
and ollama run (model name) in another.
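For reference, once ollama serve is running on the pod, a quick sanity check from the same machine could look like the sketch below. It assumes Ollama's default local address (127.0.0.1:11434) and uses "llama3" as a placeholder model name; swap in whichever model you actually pulled.
```python
# Sketch only: query the local Ollama HTTP API from the same pod.
# Assumes the default bind address 127.0.0.1:11434 and a pulled model
# named "llama3" (placeholder -- use whatever you ran with ollama run).
import requests

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hi", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```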
How can I make API calls to it, though? Is there a template that exposes it on a port?
You have to create your own; you can start with the PyTorch template as a base.
Do you know how I could expose it to an IP and port? I'm stuck on that part.
Either add an HTTP port to your pod, or ensure your pod has a public IP, add a TCP port, and then use the public IP and public port mapping under the Connect button.
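To make that concrete, here is a rough sketch of calling the pod from a local laptop once a TCP port is exposed. The IP and port are placeholders for whatever RunPod shows under the Connect button, and it assumes Ollama on the pod was started listening on all interfaces rather than only on localhost.
```python
# Sketch: call the exposed pod from a local machine with plain requests.
# POD_IP / POD_PORT are placeholders -- copy the public IP and external
# port mapping from the pod's Connect button. Also assumes Ollama on the
# pod is bound to 0.0.0.0 so it accepts non-local connections.
import requests

POD_IP = "203.0.113.10"   # placeholder public IP
POD_PORT = 11434          # placeholder external port mapping

resp = requests.post(
    f"http://{POD_IP}:{POD_PORT}/api/generate",
    json={"model": "llama3", "prompt": "Hello from my laptop", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```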
I see okay, do you know the commands to add this ip to Ollama?
Expose ports | RunPod Documentation: Learn to expose your ports.
Follow this tutorial
https://discord.com/channels/912829806415085598/1207214335605088266
Launch a Flask app
and process your incoming API requests through Flask.
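A minimal sketch of that Flask idea, assuming Flask and requests are installed on the pod, Ollama is on its default local port 11434, and port 8000 is the pod port you expose (both port numbers and the "llama3" default model are placeholders):
```python
# Hedged sketch of the Flask-proxy approach: a small app on an exposed pod
# port that forwards prompts to the Ollama backend on the same machine.
from flask import Flask, jsonify, request
import requests

app = Flask(__name__)
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # Ollama's default local port

@app.route("/generate", methods=["POST"])
def generate():
    body = request.get_json(force=True)
    upstream = requests.post(
        OLLAMA_URL,
        json={
            "model": body.get("model", "llama3"),  # placeholder default model
            "prompt": body["prompt"],
            "stream": False,
        },
        timeout=300,
    )
    upstream.raise_for_status()
    return jsonify(upstream.json())

if __name__ == "__main__":
    # Bind to 0.0.0.0 so the exposed pod port is reachable from outside.
    app.run(host="0.0.0.0", port=8000)
```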
Actually, I'm guessing Ollama launches a backend locally,
so wherever that port is, you can just bind to that port directly
and then send the API requests there since Ollama supports API requests
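Since the original goal was LangChain, a client-side sketch along those lines might look like the following. The import path assumes the langchain_community integration (newer LangChain versions ship a separate langchain-ollama package instead), and the base URL and model name are placeholders for your pod's exposed address and pulled model.
```python
# Sketch: point LangChain's Ollama integration at the exposed pod endpoint.
# base_url and model are placeholders; the import path assumes the
# langchain_community package (newer setups may use langchain_ollama instead).
from langchain_community.llms import Ollama

llm = Ollama(
    base_url="http://203.0.113.10:11434",  # placeholder public IP:port
    model="llama3",                        # placeholder model name
)
print(llm.invoke("Summarize what RunPod is in one sentence."))
```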
Thank you so much, I'll have a look 😄
@manan4884 https://discord.com/channels/912829806415085598/1207848538629742623
Enough people seem to be getting into Ollama, so I wrote some simple setup instructions
on binding it, etc.
Thanks a lot for the help!