RunPod10mo ago
danlas

API to Text Generation Web UI

Hello! I want to upload a model using a serverless pod. It will be a Text Generation Web UI, and I know I will end up getting an endpoint. I would like to create a few "characters" on there, but how can I interact with the model from my code? Is there some type of POST endpoint I need to hit to use a certain character and run my model? I am currently using OpenAI GPT-4 for this but would like to switch over. Thanks!!
24 Replies
justin
justin10mo ago
By Text Generation Web UI, are you looking to host a web UI interface, or are you looking to use some sort of LLM model through an API? This will highly depend on the model you are using; usually most LLM models have some example Python code showing how to interface with them. If you can do it in Python, you can essentially write your handler to do it:
import runpod  # RunPod serverless SDK

def handler(job):
    model = load_model()  # placeholder: however you load your model
    return model.predict(job["input"]["something"])

runpod.serverless.start({"handler": handler})
danlas
danlasOP10mo ago
I have a model I want to use (I'm on my phone right now, but it's TheBloke 2.5, I think the latest one). Right now I am using the web UI to load it up, and I created characters to use with it. My end goal is just to host the LLM and invoke it with chat history + a certain context (character). Is there a certain type of template I should use for RunPod? @justin I have never hosted an LLM before, so this is all new to me
justin
justin10mo ago
I see. There is no particular "template". For RunPod, as long as you can run it in a Docker container, it will run. The hard part of what you're asking, I think, just requires more development knowledge, and that depends on you. It is certainly possible to host an LLM, but having context history etc. will require more development time. I also think it would require a lot of experimentation; personally I am not as familiar with LLMs, I'm just reading through their GitHub repository
justin
justin10mo ago
GitHub
GitHub - TheBlokeAI/dockerLLM: TheBloke's Dockerfiles
TheBloke's Dockerfiles. Contribute to TheBlokeAI/dockerLLM development by creating an account on GitHub.
justin
justin10mo ago
but I think you're probably looking more for something like
justin
justin10mo ago
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - GitHub - runpod-workers/worker-vllm: The RunPod worker template for serving our large language model en...
justin
justin10mo ago
But I know people in the server have had trouble with worker-vllm. If you're okay with slower response times:
justin
justin10mo ago
Generative Labs has great videos on doing exactly what you want: hosting your own LLM.
danlas
danlasOP10mo ago
Ok thanks! I'll take a peep at it tomorrow. Maybe I've got to create my own API endpoint inside the pod, then use my backend code to hit it (my guess). Awesome, thanks!
justin
justin10mo ago
Yeah, the only thing is that setups using network volumes, like Generative Labs', can have slower response times. It's not really "inside the pod". So RunPod has: GPU Pods, which are running Linux computers until you turn them off (you can think of them that way), and serverless, where you essentially define a function in Python; when the endpoint is pinged, RunPod turns on your Docker container in the same environment as a GPU pod to serve the request, and when the function is done, turns it off 🙂 So serverless is a bit different from just exposing a port on a persistent pod that runs until you turn it off.
danlas
danlasOP10mo ago
Yeah, I was leaning toward serverless to save on costs. My main question was how my current backend app can hit this new LLM I'm hosting and pass in context/history. Currently using GPT-4 and LangChain for it. But I'll take a look at the docs you sent tomorrow.
justin
justin10mo ago
Well, it's really just appending, haha; that is essentially what history is. All LLMs are stateless, so any history is cached on your backend, and when you make a request you're essentially appending it to the prompt. Or, if the appended history gets too long, you use the LLM to summarize it and append the summarization to the incoming new query. I'll also argue that if ChatGPT is too expensive
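A minimal sketch of that append-or-summarize pattern (all names here, like call_llm and the prompt format, are hypothetical placeholders, not from any particular library):

MAX_HISTORY_CHARS = 4000  # arbitrary cutoff; tune to your model's context window

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # stand-in for whatever model/API call you use

def build_prompt(character: str, history: list[str], user_message: str) -> str:
    history_text = "\n".join(history)
    if len(history_text) > MAX_HISTORY_CHARS:
        # History too long: compress it with the LLM itself, as described above.
        history_text = call_llm("Summarize this conversation:\n" + history_text)
    return f"{character}\n{history_text}\nUser: {user_message}\nAssistant:"

def chat(character: str, history: list[str], user_message: str) -> str:
    reply = call_llm(build_prompt(character, history, user_message))
    # Cache the new turn on the backend; the model itself keeps no state.
    history.append(f"User: {user_message}")
    history.append(f"Assistant: {reply}")
    return reply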
justin
justin10mo ago
Mistral is a great alternative. You can certainly host your own LLMs, but I've never found it worth it compared to worrying about cold starts, technical difficulties, etc. That's why I tend to use RunPod more for other ML models; for LLMs, you can probably go down cheaper routes that are already managed for you before turning to managing it yourself, unless you have a specific use case like privacy concerns, and so on.
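If going the managed-API route, a rough sketch of calling Mistral's hosted chat endpoint (the model name and payload fields are assumptions based on their OpenAI-style API; check their docs for the current schema):

import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small",  # assumed model name
        "messages": [
            {"role": "system", "content": "You are the character Luna."},  # hypothetical character context
            {"role": "user", "content": "Hi!"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])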
danlas
danlasOP10mo ago
That looks like a good site. Maybe I can just use the API.
justin
justin10mo ago
Yup!
danlas
danlasOP10mo ago
Saves me from having to run my own
justin
justin10mo ago
RunPod has their own API for hosted Llama models: https://doc.runpod.io/reference/llama2-13b-chat An interesting idea I've heard is that
RunPod
Llama2 13B Chat
Retrieve Results & StatusNote: For information on how to check job status and retrieve results, please refer to our Status Endpoint Documentation.Streaming Token Outputs Make a POST request to the /llama2-13b-chat/run API endpoint.Retrieve the job ID.Make a GET request to /llama2-13b-chat/stream...
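Following the run/status flow that doc describes, a rough sketch of hitting the endpoint from backend code (the input schema, i.e. the "prompt" field, is an assumption; check the endpoint reference for the exact fields):

import os
import time
import requests

BASE = "https://api.runpod.ai/v2/llama2-13b-chat"
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Submit the job; the response includes a job ID.
job = requests.post(f"{BASE}/run", headers=headers,
                    json={"input": {"prompt": "Hello!"}}).json()

# Poll the status endpoint until the job finishes.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=headers).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(1)

print(status.get("output"))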
justin
justin10mo ago
(Personally I think Llama 2 isn't that good without fine-tuning.) But you can use it for summarizations if RunPod is cheaper, and use Mistral / ChatGPT for the more complex stuff. But yeah.
danlas
danlasOP10mo ago
I'm trying to make a site like https://candy.ai and their model is really good. GPT-4 is very censored and doesn't work.
Candy.ai - Enjoy The Ultimate AI Girlfriend Experience
Engage with Candy.ai's virtual companions for immersive and personalized chats. Dive deep into intricate dialogues, and liberate your imagination. Experience adaptive AI-driven role-plays today.
justin
justin10mo ago
I see. I think Mistral is your best bet 🙂 I don't think they censor... I could be wrong. But I definitely think that's worth a shot first.
danlas
danlasOP10mo ago
As long as the API can take a context and chat history, then I'll be good! Thanks man!
JJonahJ
JJonahJ10mo ago
https://hub.docker.com/repository/docker/toxibunny/mixtral-8x7b-moe-rp-story-awq/general https://hub.docker.com/repository/docker/toxibunny/rpmacaronimaidapi/general Here are a couple of 'uncensored' RP-focused models, ready for use with the RunPod llama2 endpoint. I tested them a little and they seem to be working so far. It is just kind of like the OpenAI API, though, where you have to handle context/history yourself… Edit: though I forgot about the Assistants API, which handles all that for you 😅 It's been a while…