Problems with larger models
I'm having trouble consistently running models larger than 70b parameters in webui. They only work maybe one in ten times. When I do get them to work, even if I keep the pod, put it to sleep, and spin it up again later, I get error messages. Here's an example of the error messages I'm getting from trying to load a model that I have successfully loaded before, using the exact same configuration:
Traceback (most recent call last):
  File "/workspace/text-generation-webui/modules/ui_model_menu.py", line 214, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "/workspace/text-generation-webui/modules/models.py", line 90, in load_model
    output = load_func_map[loader](model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 399, in ExLlama_HF_loader
    return ExllamaHF.from_pretrained(model_name)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 174, in from_pretrained
    return ExllamaHF(config)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 31, in __init__
    self.ex_model = ExLlama(self.ex_config)
  File "/usr/local/lib/python3.10/dist-packages/exllama/model.py", line 852, in __init__
    self.embed_tokens.weight = nn.Parameter(tensors["model.embed_tokens.weight"])
KeyError: 'model.embed_tokens.weight'
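[Editor's note: for anyone landing on this thread with the same KeyError, it means the loader could not find that tensor in the weight files it read, which in practice often points to an incomplete or corrupted download rather than a loader bug. A minimal check, assuming the model was fetched as .safetensors files into the webui models directory; the path below is hypothetical:
```python
import glob
from safetensors import safe_open

# Hypothetical model directory; adjust to where webui downloaded the files.
model_dir = "/workspace/text-generation-webui/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"

# Collect every tensor name across all shards on disk.
keys = set()
for shard in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        keys.update(f.keys())

# False here means the files on disk are incomplete or corrupt: re-download.
print("model.embed_tokens.weight" in keys)
```
If the key is missing from every shard, re-downloading the model is the usual fix.]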
This sounds like an issue with oobabooga not with RunPod, I suggest logging an issue on the oobabooga Github repo.
Unfortunately, the ooba channel is populated mostly with hobbyists who are running models locally and are uninterested in helping (or unable to help) non-technical folks who are focused on learning about the inferencing output rather than tinkering with command lines.
Even so, if this is an ooba problem rather than a RunPod problem, at least I know more than I did. But nobody on that board seems to recognize the problem. Which makes me think that either they just tinker in ways that I can't and don't think of it as a problem, or there's some sort of ooba/RunPod interaction issue.
There isn't an ooba/RunPod issue; there are hundreds of people who use ooba to interface with LLMs on RunPod successfully. So you probably either used the wrong model loader, used a GPU that doesn't have sufficient VRAM to load such a large model, or maybe even ran out of disk space, but you have given basically zero context so it's impossible to determine what the issue is.
"I'm having trouble consistently running models larger than 70b parameters in webui." is completely and utterly useless to anyone. If you can provide the actual model that you are using, it would help but this information is like trying to find a needle in a haystack it is so completely useless. You also don't mention the GPU type you are running or anything useful at all. People can't read your mind. Be specific, then maybe you will have better luck in someone actually being able to assist you.
"I'm having trouble consistently running models larger than 70b parameters in webui." is completely and utterly useless to anyone. If you can provide the actual model that you are using, it would help but this information is like trying to find a needle in a haystack it is so completely useless. You also don't mention the GPU type you are running or anything useful at all. People can't read your mind. Be specific, then maybe you will have better luck in someone actually being able to assist you.
Um. I just provided an error log. Do you actually work for RunPod? I need to decide whether to complain to the company or just block you.
This is a reproducible problem on identical configurations.
I don't work for RunPod so block me due to your own stupidity 👍
I've fired a lot of guys like you.
It's NOT possible to reproduce with ZERO INFORMATION.
Sure, you are probably a 13-year-old teenager
No, but I used to teach 13 year old teenagers, and you sure sound like one. Blocking now. Buh-bye.
Hi @mfeldstein67, so sorry for the trouble, I'll see if I can have a member of our team look at this support thread in hopes they can properly assist you!
Without more information I can't help much either
@Polar Thanks. I'm a non-engineer trying to learn about the inferencing capabilities of mid-sized models, so I don't know what information you need. If you or @Madiator2011 or anybody else needs more information, please tell me what you need. I usually don't do this sort of thing without the help of an engineer, but I'm on my own at the moment.
What kind of model are you trying to load?
@Madiator2011 There aren't many models in the 100b - 120b size that look legit. The two I've spent the most time trying to get working are Goliath and Rogue-Rose 103b. In the latter case, I've tried both GPTQ and AWQ formats. I was able to get Rogue-Rose to load once. But when I put the machine to sleep and woke it back up again—no other changes—it threw errors again. Which sounds like a DevOps problem to me. It's hard for user error to interfere with pressing the "stop" and "play" buttons.
@Madiator2011 I always use TheBloke's quantizations because he has such a good reputation.
Like I said, without the full error message I can't help
Actually, that's not what you said; you didn't specify what information you need. But never mind. If you read from the top of the thread, you'll see I posted the error message from trying to load Rogue-Rose 103b GPTQ.
in logs you posted there is no error
That's what the logs gave me. They did not give me the usual message saying the model successfully loaded. And when I tried to run the chat, it hung. It simply didn't reply. So I got an abnormal log message followed by a model that didn't work. This is all in webui/ooba.
@Madiator2011 ^^^^
Provide this info:
1. What template do you use?
2. What is the full model name?
3. What GPU do you use?
4. Check that your pod is not running out of storage.
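[Editor's note: for item 4, a minimal way to check free space from inside the pod, assuming the default layout where models live under /workspace (adjust the path if your pod differs):
```python
import shutil

# Report free space on the volume where models are stored.
# "/workspace" is the usual mount point on RunPod pods.
total, used, free = shutil.disk_usage("/workspace")
print(f"free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
```
A 103B GPTQ download alone is roughly 50 GB, so low disk space can fail a model load partway through.]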
Model is TheBloke/Rogue-Rose-103b-v0.2-GPTQ. GPU was an H100 with the standard configuration. My pod is not out of storage. (I use a 2TB storage container and always check the pod dashboard, since I don't know enough to be sure whether I have enough RAM, GPU, disk space, etc.)
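[Editor's note: as a rough sanity check on the numbers in this thread: a 103B-parameter model quantized to 4 bits needs about 103e9 × 0.5 bytes ≈ 52 GB for the weights alone, plus KV cache and activations on top, so a single 80 GB H100 (or A100) is plausible while a 48 GB card is not. A back-of-the-envelope sketch, where the 20% overhead factor is an assumption rather than a measurement:
```python
# Rough VRAM estimate for a quantized model: weights plus a fudge factor
# for KV cache and activations. Not a substitute for measuring.
def vram_estimate_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * (bits / 8) bytes = GB
    return weight_gb * overhead

print(f"{vram_estimate_gb(103, 4):.0f} GB")  # ~62 GB -> fits on one 80 GB H100
```
]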
BTW, as I write this, I was able to get Goliath to run from my storage for the first time. Didn't do anything differently and I'm not putting money on it running if I put the pod to sleep or try to run it in a new pod.
@Madiator2011 ^^^^
Everybody wants to go right to user error, ignoring the whole thing about being able to get a model running successfully and not having the model restart after pausing and unpausing the pod. It sure as hell smells like a DevOps problem to me.
from what I see your error is about the ExLlama loader
That is the loader that Ooba defaults to and it's the only one that has successfully loaded Rogue Rose. I've tried the others except for the really old ones. But OK, I'll play along. What would you suggest as an alternative?
BTW, Goliath just stopped responding mid-chat. Ooba throws an error in the chat window with the detail field reading "None." Now, it could be that the model's unstable. But I've not read anything about that. And given all the problems I've been having, my money is still on DevOps.
@Madiator2011
Ask on their Discord server
@Madiator2011 Do you work for RunPod?
Yes I do
@Madiator2011 Can you please bring in your supervisor? You have yet to even respond to the pause/unpause problem. You offer these templates, including Ooba, as a service. It's not just a thing you do. It's a differentiator. I can rent GPUs from a dozen different places. More every day. But you treat my problem like it has nothing to do with your company. Either you support your offering or you don't.
@Madiator2011 I've been on the Ooba Discord server. It's open-source. Nobody over there cares about replying to a noob. So I'm asking the company that I'm actually paying to run it for me.
Though keep in mind we are a GPU rental company and we can't provide support for all the third-party tools. Users are responsible for the tools they use.
YOU PROVIDE THE THIRD-PARTY TOOLS. IT'S PART OF YOUR SERVICE. YOU ADVERTISE IT. Can I please speak to a supervisor?
We are providing GPU for users to run their software. That does not mean we are tech support for all tools existing in the world.
No. Just the ones you preconfigure for your users as part of the service you advertise and I pay for. Are you going to put me in touch with a supervisor, or am I going to have to chase this through another channel?
Have you tried submitting your issue on their GitHub? https://github.com/oobabooga/text-generation-webui
I have a model that runs. I press pause on YOUR RunPod dashboard. I press play. The model no longer runs. You are a DevOps company. That's what you do. You run the lower part of the stack. Yet you will not consider for one moment that you have a DevOps problem with your software. You have ignored that problem repeatedly when I bring it up. I'll ask one last time: Can I speak to a supervisor?
At RunPod, while our primary focus is on providing hardware support and infrastructure-related assistance, we do try to help with third-party tools where we can, as an act of goodwill. However, our expertise in these areas is limited, mostly because many of these tools get updated very often, which can cause them to break.
Additionally, to effectively assist you, even within our capabilities, we need specific information about the issues you're encountering (which you refused to provide). Without these details, our ability to provide even basic guidance or suggestions is significantly restricted.
For in-depth support with oobabooga, I recommend reaching out to their dedicated community or support channels. They have the specific expertise required for their software and are better equipped to address such specialized queries.
OK. I'll try to get to a manager through other channels. You sound like a bad customer service chatbot now.
How do you expect to get any support without providing any of the requested information? I'm sorry that I'm not a mind reader.
@mfeldstein67 So I have tested that model and it works fine on a single A100, though you were using the wrong model loader
Here are my settings
Used Ashleyk's template: Text Generation Web UI and APIs
cheers guys
@Madiator2011 Well, that's awesome for you. I tried that loader before. It was the next logical choice. It failed. If two pods with identical configurations show different behaviors on your system, that suggests...? If software in a pod is working, the pod is paused, then restarted, and the software no longer works, that suggests...?
Now I will give you that the OSS software has been improving. So behaviors may change. I've had better luck loading models more recently. But the start/stop behavior on your pod, which I've brought up multiple times now and which you've failed to respond to even once? That's not third-party software. That's RunPod software. You are trying so hard to make this my fault. Not once have I heard words like, "I'm sorry you're experiencing trouble, Mr. Customer. Let me see if I can help you solve your problem."
oobabooga web UI is not created by RunPod. Also, just for your info: if you stop the pod, all data in container storage gets wiped.
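[Editor's note: this distinction matters for the symptoms above. On RunPod, the container disk is ephemeral across a stop/start, while a network volume (typically mounted at /workspace) persists. A minimal sketch of moving a downloaded model onto the persistent volume so it survives a stop; the paths here are assumptions, adjust them to your pod:
```python
import os
import shutil

# Hypothetical paths: ephemeral container disk vs. persistent volume mount.
container_dir = "/root/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"
volume_dir = "/workspace/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"

# Move the model onto the volume once, then symlink the old location back
# so tools that expect the original path keep working after a restart.
if os.path.isdir(container_dir) and not os.path.islink(container_dir):
    os.makedirs(os.path.dirname(volume_dir), exist_ok=True)
    shutil.move(container_dir, volume_dir)
    os.symlink(volume_dir, container_dir)
```
]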
Also, you've failed to respond even once, when I requested that you provide basic information 🙂
Actually, I'm pulling the pod from your storage unit. So the data should not and does not get wiped. It's there. It just doesn't run. Which you didn't ask me about. Could you kindly point me to one single specific piece of information you've asked me about that I didn't give you?
Also, you're not supposed to get pissy back at a pissed off customer. Didn't anybody teach you anything at all about customer service? Honestly, I don't know why I'm bothering with this. I'll be trying to reach somebody in charge tomorrow.
Unless @nerdylive is a senior RunPod person? Can you help me, by any chance, @nerdylive ?
No, I'm just a regular customer
Well, if you're looking for help, you apparently won't find it here from RunPod.
@Madiator2011 Repeat after me: "I'm sorry for your difficulty, valued customer. Using the information you provided me, I was able to get the model you are trying to run working with a different loader. Could you please try it and let me know if it works for you as well? If that works, I'll have some questions for you about pausing the pod. While we don't make webui or the model ourselves, I'll do my best to make sure the problem isn't on our end and provide you with what help I can with the templates we provide as part of our service."
Why, thank you, @Madiator2011. Even though ExLlamav2_HF did not load for me when I tried it several times a few weeks back, it is working for me at the moment. I will test it and let you know if it loads consistently, which would indicate an improvement somewhere in the software, or if it only works inconsistently, which would suggest some sort of a DevOps problem. Or I would do that if I were talking to a competent customer service agent who actually cared. Instead, I'll do my best to reach your manager and tell him or her as part of our conversation.
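[Editor's note: for anyone trying to isolate whether a failure like this lives in the web UI or in the backend, the exllamav2 library can be driven directly from Python, bypassing ooba entirely. A minimal sketch following the library's documented quickstart pattern from that era; the model path is hypothetical and the API may have shifted in later releases:
```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Hypothetical path to a downloaded GPTQ model directory.
model_dir = "/workspace/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # spread layers across available GPU memory

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

print(generator.generate_simple("Hello, my name is", settings, 64))
```
If this loads and generates but the web UI still fails, the problem is in the UI layer or its configuration rather than the model files.]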
@mfeldstein67, I've tried to assist you, but without the necessary information you've repeatedly failed to provide, my ability to help is limited. Due to your recent disrespectful behavior towards me, I've decided to no longer provide you with support. This matter is now being escalated to our moderation team for further review. They will address the situation accordingly.
@Madiator2011 Thank you. I've been asking you to escalate for some time now. As part of that review, perhaps the moderation team can identify the specific information you claim you asked for that I haven't given you, since you seem unwilling or unable to tell me. But mostly I would just like somebody to actually engage with me as a customer and help me troubleshoot my problems with loading large models into RunPod-provided and preloaded templates, as well as the inconsistent behaviors when pausing and restarting pods using the RunPod-created and RunPod-managed pod management dashboard.
1. https://discord.com/channels/912829806415085598/1189918642809352234/1189921418108993628
2. https://discord.com/channels/912829806415085598/1189918642809352234/1189922279824568472
3. https://discord.com/channels/912829806415085598/1189918642809352234/1190619969491308555
4. https://discord.com/channels/912829806415085598/1189918642809352234/1190624415214485574
You asked me about the hardware specs. I gave them to you. You asked me about the template. I told you I was using Ooba. You asked for the exact name of the models. I gave them to you. You asked for the log information. I gave it to you. When none of that enabled you to successfully diagnose the problem, you blamed me for not providing you with enough information instead of just saying, "Huh. I'm sorry, but I'm not sure why this isn't working." Even that would have been acceptable.
Those links in your last post are more obfuscation. I even volunteered relevant information that you didn't ask for, should have asked for, and didn't respond to, like the fact that I'm using your storage unit. For the sake of the auditors you are asking to review this thread, please provide the information you asked for that I refused to provide. Here. Now. Directly. In plain text. In your reply.
If I hadn't given you sufficient information, you wouldn't have been able to come back to me with a working instance of the exact model I was struggling with, using the exact hardware configuration and exact RunPod configuration.
1. You did not provide information on what type of GPU you are using.
2. There are tons of templates with the Ooba web UI.
3. You did not even tell me whether you are using volume storage or network storage.
@ashleyk is one person who knows much more than me when it comes to oobabooga (he made one of the templates)
And you disrespected him by calling him a 13-year-old teenager
Then you started disrespecting me by treating me like your personal slave even though I was trying to help you.
If you do not respect other people, do not expect others to respect you.
So if you are going to be mean, do not expect anyone to be willing to help you.
First, as a professional customer service representative, you should know when to stay calm and when to wait for the supervisor that you have called in to respond on your behalf. Second, my exchange with the user you mention was not on your official customer service board, had nothing to do with you, and followed a string of disrespectful comments from him that I have no need to rehash here on RunPod's official customer service channel. The user you mention, unlike you, is not a RunPod employee charged with answering RunPod customer service questions and was not interacting on a RunPod-run and RunPod-sanctioned customer service discussion board. If you're really telling me, as a paid RunPod employee charged with providing customer support to paying customers, that you refuse to provide me with customer support because I was "mean" to your friend on another board that has nothing to do with RunPod, then I very much hope that the moderators and your supervisors do read this thread.
For your info, the Discord server is for community-type support
So RunPod doesn't care, as a company, how a RunPod employee chooses to interact with customers on the official RunPod Discord server? THAT I would very much like to hear from a supervisor.
I was not treating you like a slave. I was treating you like a representative of your employer providing customer service I pay for in a company-branded support channel. If you do not think you are supposed to respond as such in this channel, then I'd like to hear from your employer whether that is consistent with company policy.
I have saved a copy of this entire thread for reference in my conversations with your management. I look forward to learning more about RunPod's official support policies as well as its HR policies about how employees are expected to treat customers in official RunPod-branded and RunPod-promoted support channels.
Tip for the moderators reading this: If this is NOT intended to be an official company support channel monitored by official customer support representatives, I suggest you post that prominently, along with a referral pointing customers to official support channels. Particularly if you are not setting and enforcing policies about how your employees may or may not interact with paying customers in this channel.
Hey there - I am the Customer Success Lead for RunPod. I am going to DM you so we can talk about what happened here 🙂
Sent a friend request since DMs are off - let me know