RunPod•9mo ago
mfeldstein67

Problems with larger models

I'm having trouble consistently running models larger than 70b parameters in webui. They only work maybe one in ten times. Even when I do get one to work, if I keep the pod, put it to sleep, and spin it up again later, I get error messages. Here's an example of the errors I get from trying to load a model that I have successfully loaded before, using the exact same configuration:

Traceback (most recent call last):
  File "/workspace/text-generation-webui/modules/ui_model_menu.py", line 214, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "/workspace/text-generation-webui/modules/models.py", line 90, in load_model
    output = load_func_map[loader](model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 399, in ExLlama_HF_loader
    return ExllamaHF.from_pretrained(model_name)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 174, in from_pretrained
    return ExllamaHF(config)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 31, in __init__
    self.ex_model = ExLlama(self.ex_config)
  File "/usr/local/lib/python3.10/dist-packages/exllama/model.py", line 852, in __init__
    self.embed_tokens.weight = nn.Parameter(tensors["model.embed_tokens.weight"])
KeyError: 'model.embed_tokens.weight'
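(For context: ExLlama raises this KeyError when it opens the checkpoint file and can't find the embedding tensor, which often points to an incomplete or corrupted model download, or to a checkpoint in a format the loader doesn't expect, rather than a bug in the loader itself. A minimal diagnostic sketch follows; the model path is illustrative, not taken from the thread.)

```python
# Sketch: list the tensor names stored in a GPTQ checkpoint to verify
# that it is intact. The path below is an example; point it at the
# actual model folder under text-generation-webui/models/.
from safetensors import safe_open

path = ("/workspace/text-generation-webui/models/"
        "TheBloke_rogue-rose-103b-v0.2-GPTQ/model.safetensors")

with safe_open(path, framework="pt") as f:
    keys = set(f.keys())

print(f"{len(keys)} tensors in file")
# ExLlama looks tensors up by name; if this prints False, the download
# is likely truncated or the wrong artifact was fetched.
print("model.embed_tokens.weight" in keys)
```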
49 Replies
ashleyk
ashleyk•9mo ago
This sounds like an issue with oobabooga, not with RunPod. I suggest logging an issue on the oobabooga GitHub repo.
mfeldstein67
mfeldstein67•9mo ago
Unfortunately, the ooba channel is populated mostly with hobbyists who run models locally and are uninterested in helping (or unable to help) non-technical folks who are focused on learning about the inferencing output rather than tinkering with command lines. Even so, if this is an ooba problem rather than a RunPod problem, at least I know more than I did. But nobody on that board seems to recognize the problem, which makes me think that either they just tinker in ways that I can't and don't think about it as a problem, or there's some sort of ooba/RunPod interaction issue.
ashleyk
ashleyk•9mo ago
There isn't an ooba/RunPod issue; hundreds of people use ooba to interface with LLMs on RunPod successfully. So you probably either used the wrong model loader, used a GPU that doesn't have sufficient VRAM to load such a large model, or maybe even ran out of disk space. But you have given basically zero context, so it's impossible to determine what the issue is.
"I'm having trouble consistently running models larger than 70b parameters in webui" is completely and utterly useless to anyone. Providing the actual model you are using would help; without it, this is like trying to find a needle in a haystack. You also don't mention the GPU type you are running on or anything else useful. People can't read your mind. Be specific, and maybe you'll have better luck getting someone to actually assist you.
mfeldstein67
mfeldstein67•9mo ago
Um. I just provided an error log. Do you actually work for RunPod? I need to decide whether to complain to the company or just block you. This is a reproducible problem on identical configurations.
ashleyk
ashleyk•9mo ago
I don't work for RunPod so block me due to your own stupidity 👍
mfeldstein67
mfeldstein67•9mo ago
I've fired a lot of guys like you.
ashleyk
ashleyk•9mo ago
It's NOT possible to reproduce with ZERO INFORMATION. Sure, you're probably a 13-year-old teenager
mfeldstein67
mfeldstein67•9mo ago
No, but I used to teach 13-year-old teenagers, and you sure sound like one. Blocking now. Buh-bye.
haris
haris•9mo ago
Hi @mfeldstein67, so sorry for the trouble, I'll see if I can have a member of our team look at this support thread in hopes they can properly assist you!
Madiator2011
Madiator2011•9mo ago
Without more information I can't help much either
mfeldstein67
mfeldstein67•9mo ago
@Polar Thanks. I'm a non-engineer trying to learn about the inferencing capabilities of mid-sized models, so I don't know what information you need. If you or @Madiator2011 or anybody else needs more information, please tell me what you need. I usually don't do this sort of thing without the help of an engineer, but I'm on my own at the moment.
Madiator2011
Madiator2011•9mo ago
what kind of model are you trying to load?
mfeldstein67
mfeldstein67•9mo ago
@Madiator2011 There aren't many models in the 100b–120b range that look legit. The two I've spent the most time trying to get working are Goliath and Rogue-Rose 103b. In the latter case, I've tried both GPTQ and AWQ formats. I was able to get Rogue-Rose to load once. But when I put the machine to sleep and woke it back up again—no other changes—it threw errors again. Which sounds like a DevOps problem to me; it's hard for user error to interfere with pressing the "stop" and "play" buttons.
@Madiator2011 I always use TheBloke's quantizations because he has such a good reputation.
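(One plausible explanation for the loads-once, fails-after-restart pattern: when a RunPod pod is stopped, its container disk is reset and only the /workspace volume persists, so anything the webui wrote outside /workspace is gone on restart. A hedged way to check whether the checkpoint itself survives a stop/start is to snapshot file sizes and compare them after the pod comes back; the folder and manifest names below are examples.)

```python
# Record the size of every file in the model folder so it can be
# compared after a pod stop/start. Run once before stopping the pod,
# then again after it restarts. Paths are illustrative.
import json
from pathlib import Path

model_dir = Path("/workspace/text-generation-webui/models/"
                 "TheBloke_rogue-rose-103b-v0.2-GPTQ")
sizes = {p.name: p.stat().st_size for p in model_dir.iterdir() if p.is_file()}

manifest = Path("/workspace/model_manifest.json")
if manifest.exists():
    before = json.loads(manifest.read_text())
    changed = {k for k in before if sizes.get(k) != before[k]}
    print("changed or missing since last run:", changed or "none")
else:
    manifest.write_text(json.dumps(sizes))
    print(f"recorded {len(sizes)} files; run again after restarting the pod")
```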
Madiator2011
Madiator2011•9mo ago
Like I said, without the full error message I can't help
mfeldstein67
mfeldstein67•9mo ago
Actually, that's not what you said; you didn't specify what information you need. But never mind. If you read from the top of the thread, you'll see I posted the error message from trying to load Rogue-Rose 103b GPTQ.