Llama2 Chatbot

I'll create a thread here with the source so it doesn't clutter the chat
13 Replies
Elder Millenial (OP) · 15mo ago
from dataclasses import dataclass

import solara
from llama_cpp import Llama, ChatCompletionRequestMessage


@solara.component
def Page():
    history = solara.use_reactive([SYSTEM])
    user_text = solara.use_reactive("")
    assistant_stream = solara.use_reactive("")

    def chat():
        print(user_text.value)
        if user_text.value != "":
            chat_history = list(history.value)
            chat_history.append({"role": "user", "content": user_text.value})
            assert isinstance(history.value, list)
            output = LLM.create_chat_completion(chat_history, stream=True)

            for item in output:
                assistant_stream.value = item["choices"][0]["text"]

            chat_history.append(assistant_stream.value)

            user_text.value = ""
            history.value = chat_history

        print(user_text.value)

    solara.use_thread(chat, dependencies=[history, user_text, assistant_stream])

    with solara.Column():
        for value in history.value:
            if value["role"] == "system":
                continue

            if value["role"] == "user":
                with solara.Card(style={"background": "#555555"}):
                    solara.Markdown(value["content"])

            if value["role"] == "assistant":
                with solara.Card(style={"background": "#444444"}):
                    solara.Markdown(value["content"])

        with solara.Card(style={"background": "#666666"}):
            solara.InputText(
                "Ask a question! (hit enter to submit)",
                value=user_text.value,
                on_value=user_text.set,
                disabled=user_text.value != "",
            )

        if user_text.value != "":
            solara.ProgressLinear(True)

            with solara.Card(style={"background": "#444444"}):
                solara.Markdown(assistant_stream.value)
I had to delete a few things about the model setup because the post was too long. I can share those as well.
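For anyone reproducing this, here is a rough sketch of what the omitted SYSTEM and LLM definitions might look like; the prompt text, model path, and context size are made up for illustration, not taken from the original setup:

# Hypothetical model setup (not from the original post).
# SYSTEM is the system message that seeds the chat history;
# LLM is the llama-cpp-python model handle used by chat().
SYSTEM = {"role": "system", "content": "You are a helpful assistant."}

LLM = Llama(
    model_path="models/llama-2-13b-chat.gguf",  # hypothetical local path to a converted model
    n_ctx=2048,  # context window size; pick to match the model
)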
withnail · 15mo ago
do you have a github repo for the model setup?
Elder Millenial (OP) · 15mo ago
Not yet. The model setup isn't super complicated, but you do need to request a download key from Facebook
withnail · 15mo ago
Sure, just trying to reproduce locally. So it is outputting the LLM result correctly? You just want the text to populate as it is generated?
MaartenBreddels · 15mo ago
solara.use_thread(chat, dependencies=[history, user_text, assistant_stream])
should be
solara.use_thread(chat, dependencies=[user_text.value])
I think, because you want it to execute when the text changes (the reactive variable itself will not change)
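For context, a minimal sketch of the corrected wiring, assuming the rest of the component stays as posted; the point is that dependencies should hold the plain value, which changes between renders, rather than the reactive object, whose identity stays the same:

# Inside Page(): re-run chat() only when the submitted text changes.
# user_text.value is a plain string, so use_thread can detect the change;
# the reactive objects themselves look identical on every render.
solara.use_thread(chat, dependencies=[user_text.value])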
Elder Millenial (OP) · 15mo ago
Correct. @MaartenBreddels' fix worked. I'm happy to share my process; it's just involved to set up right now because this is very WIP. Basically you need to sign up to get access to Llama 2, download a couple hundred gigs of models, convert them to another format, install a few libraries from git... It's just a mess right now. I think we could probably create an example using some kind of streamlined functionality. For example, we could replicate this example by streaming converted tokens with a delay. We could probably modify the new AI example to achieve this. It would be a proof of concept showing how to replicate the OpenAI UI that does the same thing, without having to run an actual model.
MaartenBreddels · 15mo ago
why is there no delay right now?
Elder Millenial (OP) · 15mo ago
Ah, when I said delay, I meant adding a small random delay to simulate the token generation speed of a large language model. It would be purely for visualization reasons.
MaartenBreddels · 15mo ago
Ah, why don't you get a delay from the model itself then? I expect the models to be slow, but they're not?
Elder Millenial (OP) · 15mo ago
I think we might be talking past each other a bit haha. What I'm trying to say is that the models are fairly difficult to set up and run, so getting one working for an easy-to-run example might not be so easy. We could show the ability to have "real time" streaming responses from an AI model by simulating the processing delays with a random sleep. It would just be to show the ability to create an updating text output.
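A minimal sketch of that simulated-streaming idea, assuming no real model at all; the canned reply, delay range, and component name below are made up for illustration:

import random
import time

import solara

# Hypothetical canned reply, split into "tokens" for the demo.
FAKE_REPLY = "Sure! Here is a simulated streaming answer from a pretend model.".split()


@solara.component
def FakeChat():
    assistant_stream = solara.use_reactive("")
    running = solara.use_reactive(False)

    def fake_generate():
        if not running.value:
            return
        assistant_stream.value = ""
        for token in FAKE_REPLY:
            # Random sleep stands in for real token-generation latency.
            time.sleep(random.uniform(0.05, 0.3))
            assistant_stream.value += token + " "
        running.value = False

    # Re-run the thread whenever the running flag flips.
    solara.use_thread(fake_generate, dependencies=[running.value])

    solara.Button("Generate", on_click=lambda: running.set(True), disabled=running.value)
    if running.value:
        solara.ProgressLinear(True)
    solara.Markdown(assistant_stream.value)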
Elder Millenial (OP) · 15mo ago
Just to close the loop on my previous issue, here's a video of the final (working) solution
MaartenBreddels · 15mo ago
Ah, now I understand! Yes, we could show the UI that way until it's configured correctly. Same with using OpenAI: if you don't give a token, have some default reply. I like that idea. Are you planning to write an article on that?
Elder Millenial (OP) · 15mo ago
I'm not, but I'd be happy to provide an example and make a Tweet. I really don't like writing articles. I probably should do it more often.