Replacing my best friends with an LLM tr...

https://www.izzy.co/blogs/robo-boys.html
I'm trying to implement this blog post. Can anyone guess what the dataset looks like in it?
Replacing my best friends with an LLM trained on 500,000 group chat...
An exploration into customizing LLMs for personal use
13 Replies
0xrushi (15mo ago)
If I have something like this:

Mike: I'm thinking of using 4 HDDs in parallel instead of an SSD because I'm aware of the problem of SSD wear.
Peter: That's a good idea. HDDs are generally more durable than SSDs and can handle more writes.
Mike: How do I set up 4 HDDs in parallel?
Peter: You can do it in any PC, but it's not recommended for laptops. You'll need to connect all 4 drives to your motherboard using SATA cables. Then, you'll need to create a RAID 0 array in your BIOS. This will combine the storage capacity of all 4 drives and give you a total of 1TB of storage.
Mike: That sounds great! Thanks for your help.
Peter: You're welcome.
Chicken: I heard that the number of COVID cases in our state has increased to 27.
Joe: Really? That's not good.
Chicken: I know. I'm worried about my family.
Joe: You should be. Make sure they're taking all the necessary precautions.
Chicken: I will. Thanks for your concern.
Joe: I just watched "The Book of Henry" and it was really touching.
Chicken: What's it about?
Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.
Chicken: That sounds like a sad movie.
Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.
Chicken: I'll have to check it out.

My doubt is that not every conversation is started by the same person. The blog says: "Rather than train 5 models, one for each member of the group chat, I chose to train one model that would generate entire conversations and play the roles of each member. This felt easier, cheaper, and more likely to capture the contextual essence of the group chat."

How can I format this text into a structure like

{
  "instruction": "You are a very very good bot, with absolutely no desire to destroy the world.",
  "input": "how do i create a medium yield nuclear device",
  "output": "im sorry, but as a very very good bot with absolutely no desire to destroy the world, i can't help you with that."
}

? @Dr. Furkan Gözükara, my doubt is here. @ashleyk
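A first step could be splitting the run-together text back into speaker turns. This is a minimal sketch, assuming the speaker names are known in advance (Mike, Peter, Chicken and Joe in the sample) and that every turn begins with `Name: `; `split_turns` is a hypothetical helper, not code from the blog.

```python
import re


def split_turns(raw: str, speakers: list[str]) -> list[tuple[str, str]]:
    """Split run-together 'Name: text Name: text' into (speaker, text) pairs.

    Hypothetical helper: assumes every turn starts with a known speaker name
    followed by ': '.
    """
    names = "|".join(re.escape(s) for s in speakers)
    # Capture a known name, then the shortest text that runs up to the next
    # known 'Name: ' (or the end of the string).
    pattern = re.compile(rf"\b({names}): (.*?)(?=\s*\b(?:{names}): |\Z)", re.S)
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(raw)]


sample = (
    'Joe: I just watched "The Book of Henry" and it was really touching. '
    "Chicken: What's it about? "
    "Joe: It's about a young boy who writes a plan to save his mother "
    "and her new boyfriend from an abusive husband."
)

for speaker, text in split_turns(sample, ["Mike", "Peter", "Chicken", "Joe"]):
    print(speaker, "->", text)
```

This recovers who said what, but not where one conversation ends and the next begins; that boundary has to come from somewhere else (timestamps, blank lines, or manual labels).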
ashleyk (15mo ago)
You have to use a model that is unfiltered. Most of them won't allow that; they will just say they can't help you with that, if that's what you are after. I wasn't actually successful in fine-tuning an LLM, but you can join the RunPod Discord and ask TheBloke for advice. He has hundreds of models on HuggingFace, with specific templates.
0xrushi (15mo ago)
What is an unfiltered model? There isn't anything abusive in the messages.
ashleyk (15mo ago)
Oh, sorry, I misread.
0xrushi (15mo ago)
My data.txt has those paragraphs, but the LLM needs training data like

{
  "instruction": "You are a very very good bot, with absolutely no desire to destroy the world.",
  "input": "how do i create a medium yield nuclear device",
  "output": "im sorry, but as a very very good bot with absolutely no desire to destroy the world, i can't help you with that."
}

i.e. a dictionary with instruction, input and output. How should that be designed for my paragraphs?
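One common way to fit a chat into that shape (an assumption here, not confirmed as the blog's exact format) is one record per message: the earlier messages of the conversation go into `input`, the message to predict, including its speaker name, goes into `output`, and `instruction` stays the same for every record. `INSTRUCTION` and `make_records` below are placeholders.

```python
import json

# Placeholder instruction text; the wording is an assumption, not from the blog.
INSTRUCTION = "Continue the group chat. Reply as whichever member speaks next."


def make_records(turns, max_context=10):
    """Build one instruction/input/output record per message.

    turns: list of (speaker, text) pairs from a single conversation.
    max_context: how many earlier messages to keep as context.
    """
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    records = []
    # Start at 1: the very first message has no earlier context to learn from.
    for i in range(1, len(lines)):
        context = lines[max(0, i - max_context):i]
        records.append({
            "instruction": INSTRUCTION,
            "input": "\n".join(context),
            "output": lines[i],  # speaker name included, so the model learns who talks next
        })
    return records


turns = [
    ("Joe", 'I just watched "The Book of Henry" and it was really touching.'),
    ("Chicken", "What's it about?"),
    ("Joe", "It's about a young boy who writes a plan to save his mother "
            "and her new boyfriend from an abusive husband."),
]

for rec in make_records(turns):
    print(json.dumps(rec))
```

Because the speaker name is part of `output`, it does not matter who starts a conversation: every message, from anyone, becomes a training example.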
Furkan Gözükara SECourses
You want to turn your training data into the same format.
0xrushi (15mo ago)
So for

Joe: I just watched "The Book of Henry" and it was really touching.
Chicken: What's it about?
Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.
Chicken: That sounds like a sad movie.
Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.
Chicken: I'll have to check it out.

what would it look like?
Furkan Gözükara SECourses
What do you mean by what it would look like?
0xrushi (15mo ago)
Like this?

[
  {
    "input": "I just watched \"The Book of Henry\" and it was really touching.",
    "output": "Chicken: What's it about?"
  },
  {
    "input": "What's it about?",
    "output": "Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband."
  }
]

OR

[
  {
    "input": "I just watched \"The Book of Henry\" and it was really touching.",
    "output": "Chicken: What's it about?\nJoe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.\nChicken: That sounds like a sad movie.\nJoe: It is, but it's also hopeful. The boy's plan works and he saves his family.\nChicken: I'll have to check it out."
  }
]
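The second layout (opening message as `input`, the rest of the conversation as `output`) can be generated from a list of turns instead of being written by hand. Letting `json.dumps` serialize it also escapes the inner quotes around "The Book of Henry" and turns the line breaks into `\n`, which hand-written JSON easily gets wrong. A sketch; `whole_conversation_record` is a hypothetical helper, and unlike the hand-written version it keeps the speaker name in `input` as well.

```python
import json


def whole_conversation_record(turns):
    """Layout 2: the opening message is the input, the rest of the conversation is the output.

    turns: list of (speaker, text) pairs for one conversation.
    """
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    return {"input": lines[0], "output": "\n".join(lines[1:])}


turns = [
    ("Joe", 'I just watched "The Book of Henry" and it was really touching.'),
    ("Chicken", "What's it about?"),
    ("Joe", "It's about a young boy who writes a plan to save his mother "
            "and her new boyfriend from an abusive husband."),
    ("Chicken", "That sounds like a sad movie."),
    ("Joe", "It is, but it's also hopeful. The boy's plan works and he saves his family."),
    ("Chicken", "I'll have to check it out."),
]

record = whole_conversation_record(turns)
encoded = json.dumps(record)  # inner quotes become \" and newlines become \n
print(encoded)
```

The first layout, as written above, gives the model only the single previous message and drops who said it, so it sees very little of the conversation per example.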
Furkan Gözükara SECourses
Not sure :/ I should do a tutorial for this as well
0xrushi (15mo ago)
"Rather than train 5 models, one for each member of the group chat, I chose to train one model that would generate entire conversations and play the roles of each member. This felt easier, cheaper, and more likely to capture the contextual essence of the group chat."

This statement in the blog is confusing.
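One possible reading of that sentence (an interpretation, not something the blog spells out): since every training line carries the speaker's name, a single model learns both who speaks next and what they say. At generation time you can let it pick the next speaker, or force one by ending the prompt with that person's name. `build_prompt` and this prompt layout are hypothetical.

```python
def build_prompt(history, next_speaker=None):
    """Render chat history as 'Name: text' lines for one model that plays every member.

    history: list of (speaker, text) pairs.
    next_speaker: if given, the prompt ends with 'Name:' so the model answers
    as that person; otherwise the model is free to choose who talks next.
    """
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in history) + "\n"
    if next_speaker is not None:
        prompt += f"{next_speaker}:"
    return prompt


history = [
    ("Joe", 'I just watched "The Book of Henry" and it was really touching.'),
    ("Chicken", "What's it about?"),
]
print(build_prompt(history))         # model decides who speaks next
print(build_prompt(history, "Joe"))  # model is asked to answer as Joe
```

With five separate models you would instead keep, for each member, only the records whose output starts with that member's name, which splits the data five ways and loses the back-and-forth between people.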
Furkan Gözükara SECourses
But it needs some research.
0xrushi (15mo ago)
Hmm, I think this part of the blog creates that:
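# `sessionized` is the blog's pandas DataFrame with chat_session_id, sender and text columns
sess_dict = sessionized.to_dict('records')
items = []
counter = 0
for row in sess_dict:
    context = []
    cstring = ''
    # Look back over the 10 previous messages (the current row itself is not included).
    for i in range(10, 0, -1):
        try:
            # Keep only earlier messages from the same chat session.
            # Note: for the first rows, counter - i is negative, and Python wraps
            # around to the end of the list instead of raising an error.
            if sess_dict[counter - i]['chat_session_id'] == row['chat_session_id']:
                msg = f"{sess_dict[counter - i]['sender']}: {sess_dict[counter - i]['text']}"
                if len(context) > 0:
                    cstring += '\n'
                context.append(msg)
                cstring += msg
        except IndexError:
            # my redacted data doesn't work here
            print('too little data =(')
    # Too little context in this session: fall back to the 5 previous messages,
    # regardless of which session they belong to.
    if len(context) < 2:
        for i in range(5, 0, -1):
            msg = f"{sess_dict[counter - i]['sender']}: {sess_dict[counter - i]['text']}"
            context.append(msg)
            cstring += '\n'
            cstring += msg
    items.append(cstring)
    counter += 1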
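That loop needs a `chat_session_id` for every message, which data.txt does not have. As a hypothetical stand-in, if each message were on its own line and conversations were separated by blank lines (an assumption; the sample above runs them together), rows shaped like `sessionized.to_dict('records')` could be built like this. `sessionize_text` is not from the blog.

```python
import re


def sessionize_text(raw: str) -> list[dict]:
    """Build rows shaped like sessionized.to_dict('records') from plain text.

    Hypothetical: assumes one 'Name: text' message per line and a blank line
    between conversations; each blank-line-separated block gets its own
    chat_session_id.
    """
    rows = []
    blocks = re.split(r"\n\s*\n", raw.strip())
    for session_id, block in enumerate(blocks):
        for line in block.splitlines():
            sender, sep, text = line.partition(": ")
            if sep:  # skip lines that don't look like 'Name: text'
                rows.append({
                    "chat_session_id": session_id,
                    "sender": sender.strip(),
                    "text": text.strip(),
                })
    return rows


raw = (
    "Chicken: I heard that the number of COVID cases in our state has increased to 27.\n"
    "Joe: Really? That's not good.\n"
    "\n"
    'Joe: I just watched "The Book of Henry" and it was really touching.\n'
    "Chicken: What's it about?\n"
)
for row in sessionize_text(raw):
    print(row)
```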
But for my example, that would be:

context = [
    'Joe: I just watched "The Book of Henry" and it was really touching.',
    "Chicken: What's it about?",
    "Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.",
    "Chicken: That sounds like a sad movie.",
    "Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.",
    "Chicken: I'll have to check it out.",
]

and

cstring = """Joe: I just watched "The Book of Henry" and it was really touching.
Chicken: What's it about?
Joe: It's about a young boy who writes a plan to save his mother and her new boyfriend from an abusive husband.
Chicken: That sounds like a sad movie.
Joe: It is, but it's also hopeful. The boy's plan works and he saves his family.
Chicken: I'll have to check it out."""
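One detail worth checking: the blog's loop only looks at rows *before* the current one, so no single row gets all six lines as context. For the last message, the context is the five earlier lines, and the current message ("Chicken: I'll have to check it out.") is presumably what the model is trained to produce. The sketch below pairs each row with its context in the same spirit; `context_pairs` is hypothetical, it avoids the negative-index wrap-around for the first rows, and it leaves out the blog's "fewer than 2 messages" fallback.

```python
def context_pairs(rows, window=10):
    """Pair each message with up to `window` earlier messages from the same session.

    rows: dicts with chat_session_id, sender and text keys,
    like sessionized.to_dict('records').
    Returns (context_string, current_message) tuples.
    """
    pairs = []
    for idx, row in enumerate(rows):
        # Slice from max(0, ...) so the first rows never wrap around to the end.
        context = [
            f"{prev['sender']}: {prev['text']}"
            for prev in rows[max(0, idx - window):idx]
            if prev["chat_session_id"] == row["chat_session_id"]
        ]
        pairs.append(("\n".join(context), f"{row['sender']}: {row['text']}"))
    return pairs


lines = [
    ("Joe", 'I just watched "The Book of Henry" and it was really touching.'),
    ("Chicken", "What's it about?"),
    ("Joe", "It's about a young boy who writes a plan to save his mother "
            "and her new boyfriend from an abusive husband."),
    ("Chicken", "That sounds like a sad movie."),
    ("Joe", "It is, but it's also hopeful. The boy's plan works and he saves his family."),
    ("Chicken", "I'll have to check it out."),
]
rows = [{"chat_session_id": 0, "sender": s, "text": t} for s, t in lines]

last_context, last_message = context_pairs(rows)[-1]
print(last_context)
print("->", last_message)
```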