API to Text Generation Web UI
Hello! I want to upload a model using a serverless pod. It will be a Text Generation Web UI. I know I will end up getting an endpoint.
I would like to create a few "characters" on there, but how can I interact with the model from my code? Is there some type of POST endpoint I need to use to select a certain character and run my model?
I am currently using OpenAI GPT4 for this but would like to switch over.
Thanks!!
By Text Generation Web UI, do you mean you're looking to host a web UI interface? Or are you looking to use some sort of LLM model, like through an API?
This will highly depend on the model you are using / usually most LLM models will have some sort of example Python code on how to interface with the model
Like, if you can do it in Python, you can essentially write your handler to do it:
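Something like this, as a rough sketch (assuming the `runpod` Python SDK; `generate` here is just a hypothetical stand-in for whatever model-loading/inference code you end up using):

```python
import runpod

def generate(prompt: str) -> str:
    # Hypothetical stand-in: load your model once at startup and
    # run inference here (e.g. via transformers, llama.cpp, etc.).
    return "model output for: " + prompt

def handler(job):
    # RunPod passes the request payload under job["input"].
    prompt = job["input"]["prompt"]
    return {"output": generate(prompt)}

# Start the serverless worker loop.
runpod.serverless.start({"handler": handler})
```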
I have a model I want to use; I'm on my phone right now, but it's TheBloke 2.5, I think the latest one. Right now I am using the web UI to load it up, and I created characters to use with it.
My end goal is to just host the LLM and invoke it with chat history + a certain context (character)
Is there a certain type of template I should use for RunPod? @justin
I've never hosted an LLM before, so this is all new to me
I see. There is no specific "template". On RunPod, as long as you can run it in a Docker container, it will run. The hard thing is that what you're asking for, I think, just requires more development knowledge, and I guess that depends on you
It is certainly possible to host an LLM
But having context history etc. will require more development time
I also think it would require a lot of experimentation. Personally I am not as familiar with LLMs, but just reading through their GitHub repository:
https://github.com/TheBlokeAI/dockerLLM/tree/main
It is possible...
but I think you're probably looking more for something like:
https://github.com/runpod-workers/worker-vllm (the RunPod worker template for serving LLM endpoints, powered by vLLM)
But I know people in the server have had trouble with worker-vllm
If you're okay with slower response times, Generative Labs has great videos on doing exactly what you want: hosting your own LLM
Ok thanks! I'll take a peek at it tomorrow. Maybe I gotta create my own API endpoint inside the pod, then use my backend code to hit it (my guess)
Awesome thanks
Yeah, the only thing is that using network volumes like Generative Labs does can mean slower responses
And it's not really "inside the pod"
So they have:
GPU Pods, which you can think of as a running Linux computer until you turn it off
then Serverless, where you essentially define a function in Python
and when the endpoint is pinged, it will spin up your Docker container in the same environment as a GPU pod to serve the request, but when the function is done, it turns it off 🙂
So serverless is a bit different than just exposing a port on a persistent pod that runs until you turn it off
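To make that concrete, here's a hedged sketch of how a backend could hit a deployed serverless endpoint, assuming RunPod's usual `/runsync` pattern; the API key, endpoint ID, and payload shape are placeholders:

```python
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"   # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder

# /runsync blocks until the handler returns; there is also an
# async /run + /status flow for longer-running jobs.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Hello from my backend!"}},
)
print(resp.json())
```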
Yeah, I was leaning toward serverless to save on costs
My main thing was, how can my current backend app hit this new LLM I'm hosting and pass in context/history? Currently using GPT4 and LangChain for it
But I’ll take a look at the docs you sent tomorrow
well, it's really just appending it haha
that is essentially what history is
all LLMs are stateless
so any history is cached on your backend
and when you make a request, you're essentially appending it
or if the appended history is too long, you use the LLM to summarize it
and append the summarization to the incoming new query
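As a rough illustration of that append-and-summarize pattern (purely a sketch; `call_llm` is a hypothetical stand-in for whatever completion API you use, and the length budget is made up):

```python
MAX_HISTORY_CHARS = 4000  # made-up budget before summarizing

def call_llm(prompt: str) -> str:
    # Hypothetical: plug in your completion API here
    # (RunPod endpoint, Mistral, OpenAI, etc.).
    raise NotImplementedError

history: list[str] = []  # cached on your backend; the LLM itself is stateless

def chat(user_message: str, character_context: str) -> str:
    global history
    transcript = "\n".join(history)
    if len(transcript) > MAX_HISTORY_CHARS:
        # Too long: summarize the old history and start fresh from the summary.
        transcript = call_llm("Summarize this conversation:\n" + transcript)
        history = ["Summary: " + transcript]
    # The "character" is just context prepended to every request.
    prompt = f"{character_context}\n{transcript}\nUser: {user_message}\nAssistant:"
    reply = call_llm(prompt)
    history += [f"User: {user_message}", f"Assistant: {reply}"]
    return reply
```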
I'll also argue that if ChatGPT is too expensive,
https://mistral.ai
Mistral is a great alternative
You can certainly host your own LLMs
but I've never found it worth it
compared to worrying about cold starts / the technical difficulties etc.
It's why I tend to use RunPod more for other ML models; for LLMs, you can probably go down cheaper routes that are already managed for you
before turning to managing it yourself
unless you have a specific use case
like privacy concerns, and so on
That looks like a good site, maybe I can just use the API
Yup!
Saves me from having to run my own
RunPod has their own API for hosted Llama models:
https://doc.runpod.io/reference/llama2-13b-chat
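Per that doc page, the flow is: make a POST request to `/llama2-13b-chat/run`, retrieve the job ID, then make GET requests to `/llama2-13b-chat/stream/...` for token outputs. A hedged sketch of that flow (the base URL, payload, and response shape are assumptions; check the linked docs and their Status Endpoint documentation):

```python
import time
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"  # placeholder
BASE = "https://api.runpod.ai/v2/llama2-13b-chat"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Kick off the job and grab its ID.
job = requests.post(
    f"{BASE}/run", headers=HEADERS,
    json={"input": {"prompt": "Hello!"}},
).json()

# Poll the stream endpoint for token outputs until the job finishes.
while True:
    chunk = requests.get(f"{BASE}/stream/{job['id']}", headers=HEADERS).json()
    for item in chunk.get("stream", []):
        print(item.get("output", ""), end="", flush=True)
    if chunk.get("status") == "COMPLETED":
        break
    time.sleep(0.5)
```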
An interesting idea I've heard:
(personally I think Llama 2 isn't that good without fine-tuning)
but you can use it, if RunPod is cheaper, for summarizations
and use Mistral / ChatGPT for more complex stuff
But yeah
I'm trying to make a site like https://candy.ai and their model is really good. GPT4 is very censored and doesn't work for this.
I see
I think Mistral is your best bet 🙂
I don't think they censor... I could be wrong
But I definitely think that's worth a shot first
As long as the API can take context and chat history, then I'll be good!
Thanks man!
https://hub.docker.com/repository/docker/toxibunny/mixtral-8x7b-moe-rp-story-awq/general
https://hub.docker.com/repository/docker/toxibunny/rpmacaronimaidapi/general
Here are a couple of 'uncensored' RP-focused models, ready for use with the RunPod llama2 endpoint. Tested a little, and they seem to be working so far. It is just kinda like the OpenAI API, though, where you have to handle context/history yourself…
Edit: though I forgot about the Assistants API that handles all that for you 😅 It's been a while…