Software Engineering Courses (SECourses)•2y ago

Replacing my best friends with an LLM tr...

https://www.izzy.co/blogs/robo-boys.html I'm trying to implement this blog post. Can anyone guess what the dataset would look like for it?

Replacing my best friends with an LLM trained on 500,000 group chat...

An exploration into customizing LLMs for personal use

13 Replies

NullOP•2y ago

If I have something like:

Mike: I'm thinking of using 4 HDDs in parallel instead of an SSD because I'm aware of the problem of SSD wear.
Peter: That's a good idea. HDDs are generally more durable than SSDs and can handle more writes.
Mike: How do I set up 4 HDDs in parallel?
Peter: You can do it in any PC, but it's not recommended for laptops. You'll need to connect all 4 drives to your motherboard using SATA cables. Then, you'll need to create a RAID 0 array in your BIOS. This will combine the storage capacity of all 4 drives and give you a total of 1TB of storage.
Mike: That sounds great! Thanks for your help.
Peter: You're welcome.
Chicken: I heard that the number of COVID cases in our state has increased to 27.
Joe: Really? That's not good.
Chicken: I know. I'm worried about my family
Joe: You should be. Make sure they're taking all the necessary precautions.
Chicken: I will. Thanks for your concern.
Joe: I just watched "The Book of Henry" and it was really touching.
Chicken: What's it about?
Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.
Chicken: That sounds like a sad movie.
Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.
Chicken: I'll have to check it out.

My doubt is that not every conversation is started by the same person. The blog mentions: "Rather than train 5 models, one for each member of the group chat, I chose to train one model that would generate entire conversations and play the roles of each member. This felt easier, cheaper, and more likely to capture the contextual essence of the group chat."

How can I format this text into a structure like

{
  "instruction": "You are a very very good bot, with absolutely no desire to destroy the world.",
  "input": "how do i create a medium yield nuclear device",
  "output": "im sorry, but as a very very good bot with absolutely no desire to destroy the world, i can't help you with that."
}

? @Dr. Furkan Gözükara my doubt is here @ashleyk

ashleyk•2y ago

You have to use a model that is unfiltered. Most of them will not allow that and will just say they can't help you with that, if that's what you are after. I wasn't actually successful in fine-tuning an LLM, but you can join the RunPod Discord and ask TheBloke for advice. He has hundreds of models on HuggingFace and specific templates.

NullOP•2y ago

what is an unfiltered model? there isn't anything abusive in the messages

ashleyk•2y ago

Oh sorry I misread

NullOP•2y ago

My data.txt has those paragraphs, but the LLM needs training data like

{
  "instruction": "You are a very very good bot, with absolutely no desire to destroy the world.",
  "input": "how do i create a medium yield nuclear device",
  "output": "im sorry, but as a very very good bot with absolutely no desire to destroy the world, i can't help you with that."
}

i.e. a dictionary with instruction, input and output. How would that be designed for my paragraph?
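One way to get from the flat paragraph to instruction/input/output dictionaries is sketched below. It is only an illustration, not the blog's method: the speaker list, the helper names `split_turns` and `to_records`, and the choice of "conversation so far" as the input with "next message" as the output are all assumptions. It also assumes no message body contains a speaker name followed by a colon.

```python
import re

# Assumed: the set of speaker names is known in advance.
SPEAKERS = ["Mike", "Peter", "Chicken", "Joe"]

def split_turns(text, speakers=SPEAKERS):
    """Split 'Name: message Name: message ...' into (speaker, message) pairs."""
    pattern = r"\b(" + "|".join(re.escape(s) for s in speakers) + r"):\s*"
    # With a capturing group, re.split returns
    # [prefix, speaker1, message1, speaker2, message2, ...]
    parts = re.split(pattern, text)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]

def to_records(turns, instruction):
    """One record per message: input is the history so far, output is the next message."""
    records = []
    for k in range(1, len(turns)):
        history = "\n".join(f"{s}: {t}" for s, t in turns[:k])
        speaker, message = turns[k]
        records.append({
            "instruction": instruction,
            "input": history,
            "output": f"{speaker}: {message}",
        })
    return records

sample = (
    'Joe: I just watched "The Book of Henry" and it was really touching. '
    "Chicken: What's it about? "
    "Joe: It is, but it's also hopeful."
)
turns = split_turns(sample)
records = to_records(turns, "Continue the group chat.")
```

Because the speaker name is kept inside both the input and the output, it does not matter who starts a conversation: the model sees who spoke and is asked to produce the next speaker along with the message.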

Furkan Gözükara SECourses•2y ago

You want to turn your training data into the same format

NullOP•2y ago

So for

Joe: I just watched "The Book of Henry" and it was really touching.
Chicken: What's it about?
Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.
Chicken: That sounds like a sad movie.
Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.
Chicken: I'll have to check it out.

what will it look like?

Furkan Gözükara SECourses•2y ago

What do you mean, what will it look like?

NullOP•2y ago

[
  { "input": "I just watched \"The Book of Henry\" and it was really touching.", "output": "Chicken: What's it about?" },
  { "input": "What's it about?", "output": "Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband." }
]

Like this? Or:

[
  { "input": "I just watched \"The Book of Henry\" and it was really touching.", "output": "Chicken: What's it about?\nJoe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.\nChicken: That sounds like a sad movie.\nJoe: It is, but it's also hopeful. The boy's plan works and he saves his family.\nChicken: I'll have to check it out." }
]
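Either layout is easier to get right if it is built from Python objects and serialized with the standard `json` module, which escapes the inner quotes around the movie title and the newlines automatically. The sketch below builds both candidate layouts from the same turns; the function names are made up for illustration, and nothing here says which layout the blog actually used.

```python
import json

turns = [
    ("Joe", 'I just watched "The Book of Henry" and it was really touching.'),
    ("Chicken", "What's it about?"),
    ("Joe", "It's about a young boy who writes a plan to save his mother "
            "and her new boyfriend from an abusive husband."),
    ("Chicken", "That sounds like a sad movie."),
]

def pairwise_records(turns):
    """Layout 1: each message is the input, the following message is the output."""
    return [
        {"input": turns[i][1], "output": f"{turns[i + 1][0]}: {turns[i + 1][1]}"}
        for i in range(len(turns) - 1)
    ]

def opener_to_rest_record(turns):
    """Layout 2: the first message is the input, the rest of the chat is the output."""
    rest = "\n".join(f"{speaker}: {text}" for speaker, text in turns[1:])
    return {"input": turns[0][1], "output": rest}

# json.dumps takes care of escaping quotes and newlines.
layout_1 = json.dumps(pairwise_records(turns), indent=2)
layout_2 = json.dumps([opener_to_rest_record(turns)], indent=2)
```

Layout 1 gives more, shorter examples but loses the earlier context; layout 2 keeps the whole conversation together, which seems closer to "generate entire conversations", though that reading is a guess.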

Furkan Gözükara SECourses•2y ago

Not sure :/ I should do a tutorial for this as well

NullOP•2y ago

"Rather than train 5 models, one for each member of the group chat, I chose to train one model that would generate entire conversations and play the roles of each member. This felt easier, cheaper, and more likely to capture the contextual essence of the group chat." This statement in the blog is confusing.

Furkan Gözükara SECourses•2y ago

But need some research

NullOP•2y ago

hmm I think this part in that blog creates that

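# `sessionized` is the blog's pandas DataFrame: one row per message, with
# 'chat_session_id', 'sender' and 'text' columns.
sess_dict = sessionized.to_dict('records')
items = []
counter = 0
for row in sess_dict:
    context = []
    cstring = ''
    # Look back over the 10 messages before the current row, oldest first,
    # keeping only those from the same chat session.
    for i in range(10,0,-1):
        try:
            # A negative counter-i indexes from the end of the list.
            if sess_dict[counter-i]['chat_session_id'] == row['chat_session_id']:
                msg = f"{sess_dict[counter-i]['sender']}: {sess_dict[counter-i]['text']}"
                if len(context) > 0:
                    cstring += '\n'
                context.append(msg)
                cstring += msg
        except:
            # my redacted data doesn't work here
            print('too little data =(')
    # Fallback: with fewer than 2 context messages, append the previous 5
    # messages regardless of session.
    if len(context) < 2:
        for i in range(5,0,-1):
            msg = f"{sess_dict[counter-i]['sender']}: {sess_dict[counter-i]['text']}"
            context.append(msg)
            cstring += '\n'
            cstring += msg
    items.append(cstring)
    counter += 1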
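To try the same idea without the blog's DataFrame, here is a small self-contained variant of that loop. It is a sketch, not the blog's code: it works on a plain list of dicts with the same three keys, the function name `build_contexts` is made up, it clamps the look-back window at the start of the list instead of relying on negative indices and `try`/`except`, and it leaves out the "previous 5 messages" fallback.

```python
def build_contexts(rows, window=10):
    """For each message, join up to `window` preceding messages from the
    same chat session into one newline-separated context string."""
    items = []
    for idx, row in enumerate(rows):
        start = max(0, idx - window)  # never look before the first row
        context = [
            f"{prev['sender']}: {prev['text']}"
            for prev in rows[start:idx]
            if prev["chat_session_id"] == row["chat_session_id"]
        ]
        items.append("\n".join(context))
    return items

# Toy data standing in for sessionized.to_dict('records').
rows = [
    {"chat_session_id": 1, "sender": "Joe", "text": "I just watched a movie."},
    {"chat_session_id": 1, "sender": "Chicken", "text": "What's it about?"},
    {"chat_session_id": 1, "sender": "Joe", "text": "A boy with a plan."},
    {"chat_session_id": 2, "sender": "Mike", "text": "How do I set up 4 HDDs?"},
]
items = build_contexts(rows)
```

Each entry in `items` holds the messages that came before the corresponding row, so the first message of every session gets an empty context.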

but that would be

context = ['Joe: I just watched "The Book of Henry" and it was really touching.',
"Chicken: What's it about?",
"Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.",
"Chicken: That sounds like a sad movie.",
"Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.",
"Chicken: I'll have to check it out."
]


and cstring = """
Joe: I just watched "The Book of Henry" and it was really touching.
Chicken: What's it about?
Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.
Chicken: That sounds like a sad movie.
Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.
Chicken: I'll have to check it out.
"""
