Problems with larger models
I'm having trouble consistently running models larger than 70b parameters in webui. They only work maybe one in ten times. When I do get them to work, even if I keep the pod, put it to sleep, and spin it up again later, I get error messages. Here's an example of the error messages I'm getting from trying to load a model that I have successfully loaded before, using the exact same configuration:
Traceback (most recent call last):
  File "/workspace/text-generation-webui/modules/ui_model_menu.py", line 214, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "/workspace/text-generation-webui/modules/models.py", line 90, in load_model
    output = load_func_map[loader](model_name)
  File "/workspace/text-generation-webui/modules/models.py", line 399, in ExLlama_HF_loader
    return ExllamaHF.from_pretrained(model_name)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 174, in from_pretrained
    return ExllamaHF(config)
  File "/workspace/text-generation-webui/modules/exllama_hf.py", line 31, in __init__
    self.ex_model = ExLlama(self.ex_config)
  File "/usr/local/lib/python3.10/dist-packages/exllama/model.py", line 852, in __init__
    self.embed_tokens.weight = nn.Parameter(tensors["model.embed_tokens.weight"])
KeyError: 'model.embed_tokens.weight'
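[Editor's note: for anyone landing on this thread with the same KeyError, it means the loader could not find that tensor in the weight files it read, which in practice often points to an incomplete or corrupted download rather than a loader bug. A minimal check, assuming the model was fetched as .safetensors files into the webui models directory; the path below is hypothetical:
```python
import glob
from safetensors import safe_open

# Hypothetical model directory; adjust to where webui downloaded the files.
model_dir = "/workspace/text-generation-webui/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"

# Collect every tensor name across all shards on disk.
keys = set()
for shard in glob.glob(f"{model_dir}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        keys.update(f.keys())

# False here means the files on disk are incomplete or corrupt: re-download.
print("model.embed_tokens.weight" in keys)
```
If the key is missing from every shard, re-downloading the model is the usual fix.]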
This sounds like an issue with oobabooga not with RunPod, I suggest logging an issue on the oobabooga Github repo.
Unfortunately, the ooba channel is populated mostly with hobbyists who are running models locally and are uninterested in helping (or unable to help) non-technical folks who are focused on learning about the inferencing output rather than tinkering with command lines.
Even so, if this is an ooba problem rather than a RunPod problem, at least I know more than I did. But nobody on that board seems to recognize the problem. Which makes me think that either they just tinker in ways that I can't and don't think of it as a problem, or there's some sort of ooba/RunPod interaction issue.
There isn't an ooba/RunPod issue; there are hundreds of people who use ooba to interface with LLMs on RunPod successfully. So you probably either used the wrong model loader, used a GPU that doesn't have sufficient VRAM to load such a large model, or maybe even ran out of disk space, but you have given basically zero context so it's impossible to determine what the issue is.
"I'm having trouble consistently running models larger than 70b parameters in webui." is completely and utterly useless to anyone. If you can provide the actual model that you are using, it would help but this information is like trying to find a needle in a haystack it is so completely useless. You also don't mention the GPU type you are running or anything useful at all. People can't read your mind. Be specific, then maybe you will have better luck in someone actually being able to assist you.
"I'm having trouble consistently running models larger than 70b parameters in webui." is completely and utterly useless to anyone. If you can provide the actual model that you are using, it would help but this information is like trying to find a needle in a haystack it is so completely useless. You also don't mention the GPU type you are running or anything useful at all. People can't read your mind. Be specific, then maybe you will have better luck in someone actually being able to assist you.
Um. I just provided an error log. Do you actually work for RunPod? I need to decide whether to complain to the company or just block you.
This is a reproducible problem on identical configurations.
I don't work for RunPod so block me due to your own stupidity 👍
I've fired a lot of guys like you.
It's NOT possible to reproduce with ZERO INFORMATION.
Sure, you are probably a 13-year-old teenager
No, but I used to teach 13 year old teenagers, and you sure sound like one. Blocking now. Buh-bye.
Hi @mfeldstein67, so sorry for the trouble, I'll see if I can have a member of our team look at this support thread in hopes they can properly assist you!
Without more information I can't help much either
@Polar Thanks. I'm a non-engineer trying to learn about the inferencing capabilities of mid-sized models, so I don't know what information you need. If you or @Madiator2011 or anybody else needs more information, please tell me what you need. I usually don't do this sort of thing without the help of an engineer, but I'm on my own at the moment.
What kind of model are you trying to load?
@Madiator2011 There aren't many models in the 100b - 120b size that look legit. The two I've spent the most time trying to get working are Goliath and Rogue-Rose 103b. In the latter case, I've tried both GPTQ and AWQ formats. I was able to get Rogue-Rose to load once. But when I put the machine to sleep and woke it back up again—no other changes—it threw errors again. Which sounds like a DevOps problem to me. It's hard for user error to interfere with pressing the "stop" and "play" buttons.
@Madiator2011 I always use TheBloke's quantizations because he has such a good reputation.
Like I said, without the full error message I can't help
Actually, that's not what you said; you didn't specify what information you need. But never mind. If you read from the top of the thread, you'll see I posted the error message from trying to load Rogue-Rose 103b GPTQ.
in logs you posted there is no error
That's what the logs gave me. They did not give me the usual message saying the model successfully loaded. And when I tried to run the chat, it hung. It simply didn't reply. So I got an abnormal log message followed by a model that didn't work. This is all in webui/ooba.
@Madiator2011 ^^^^
Provide this info:
1. What template do you use?
2. What is the full model name?
3. What GPU do you use?
4. Check that your pod is not running out of storage.
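[Editor's note: for item 4, a minimal way to check free space from inside the pod, assuming the default layout where models live under /workspace (adjust the path if your pod differs):
```python
import shutil

# Report free space on the volume where models are stored.
# "/workspace" is the usual mount point on RunPod pods.
total, used, free = shutil.disk_usage("/workspace")
print(f"free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB")
```
A 103B GPTQ download alone is roughly 50 GB, so low disk space can fail a model load partway through.]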
Model is TheBloke/Rogue-Rose-103b-v0.2-GPTQ. GPU was an H100 with the standard configuration. My pod is not out of storage. (I use a 2TB storage container and always check the pod dashboard, since I don't know enough to be sure whether I have enough RAM, GPU, disk space, etc.)
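[Editor's note: as a rough sanity check on the numbers in this thread: a 103B-parameter model quantized to 4 bits needs about 103e9 × 0.5 bytes ≈ 52 GB for the weights alone, plus KV cache and activations on top, so a single 80 GB H100 (or A100) is plausible while a 48 GB card is not. A back-of-the-envelope sketch, where the 20% overhead factor is an assumption rather than a measurement:
```python
# Rough VRAM estimate for a quantized model: weights plus a fudge factor
# for KV cache and activations. Not a substitute for measuring.
def vram_estimate_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * (bits / 8) bytes = GB
    return weight_gb * overhead

print(f"{vram_estimate_gb(103, 4):.0f} GB")  # ~62 GB -> fits on one 80 GB H100
```
]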
BTW, as I write this, I was able to get Goliath to run from my storage for the first time. Didn't do anything differently and I'm not putting money on it running if I put the pod to sleep or try to run it in a new pod.
@Madiator2011 ^^^^
Everybody wants to go right to user error, ignoring the whole thing about being able to get a model running successfully and not having the model restart after pausing and unpausing the pod. It sure as hell smells like a DevOps problem to me.
from what I see your error is about the ExLlama loader
That is the loader that Ooba defaults to and it's the only one that has successfully loaded Rogue Rose. I've tried the others except for the really old ones. But OK, I'll play along. What would you suggest as an alternative?
BTW, Goliath just stopped responding mid-chat. Ooba throws an error in the chat window with the detail field reading "None." Now, it could be that the model's unstable. But I've not read anything about that. And given all the problems I've been having, my money is still on DevOps.
@Madiator2011
Ask on their Discord server
@Madiator2011 Do you work for RunPod?
Yes I do
@Madiator2011 Can you please bring in your supervisor? You have yet to even respond to the pause/unpause problem. You offer these templates, including Ooba, as a service. It's not just a thing you do. It's a differentiator. I can rent GPUs from a dozen different places. More every day. But you treat my problem like it has nothing to do with your company. Either you support your offering or you don't.
@Madiator2011 I've been on the Ooba Discord server. It's open-source. Nobody over there cares about replying to a noob. So I'm asking the company that I'm actually paying to run it for me.
Though keep in mind we are a GPU rental company and we can't provide support for all the third-party tools. Users are responsible for the tools they use.
YOU PROVIDE THE THIRD-PARTY TOOLS. IT'S PART OF YOUR SERVICE. YOU ADVERTISE IT. Can I please speak to a supervisor?
We are providing GPU for users to run their software. That does not mean we are tech support for all tools existing in the world.
No. Just the ones you preconfigure for your users as part of the service you advertise and I pay for. Are you going to put me in touch with a supervisor, or am I going to have to chase this through another channel?
Have you tried submitting your issue on their GitHub? https://github.com/oobabooga/text-generation-webui
I have a model that runs. I press pause on YOUR RunPod dashboard. I press play. The model no longer runs. You are a DevOps company. That's what you do. You run the lower part of the stack. Yet you will not consider for one moment that you have a DevOps problem with your software. You have ignored that problem repeatedly when I bring it up. I'll ask one last time: Can I speak to a supervisor?
At RunPod, while our primary focus is on providing hardware support and infrastructure-related assistance, we do try to help with third-party tools where we can, as an act of goodwill. However, our expertise in these areas is limited, mostly because many of these tools get updated very often, which can cause them to break.
Additionally, to effectively assist you, even within our capabilities, we need specific information about the issues you're encountering (which you refused to provide). Without these details, our ability to provide even basic guidance or suggestions is significantly restricted.
For in-depth support with oobabooga, I recommend reaching out to their dedicated community or support channels. They have the specific expertise required for their software and are better equipped to address such specialized queries.
OK. I'll try to get to a manager through other channels. You sound like a bad customer service chatbot now.
How do you expect to get any support without providing any of the requested information? I'm sorry that I'm not a mind reader.
@mfeldstein67 So I have tested that model and it works fine on a single A100, though you were using the wrong model loader
Here are my settings
Used Ashleyk's template: Text Generation Web UI and APIs
cheers guys
@Madiator2011 Well, that's awesome for you. I tried that loader before. It was the next logical choice. It failed. If two pods with identical configurations show different behaviors on your system, that suggests...? If software in a pod is working, the pod is paused, then restarted, and the software no longer works, that suggests...?
Now I will give you that the OSS software has been improving. So behaviors may change. I've had better luck loading models more recently. But the start/stop behavior on your pod, which I've brought up multiple times now and which you've failed to respond to even once? That's not third-party software. That's RunPod software. You are trying so hard to make this my fault. Not once have I heard words like, "I'm sorry you're experiencing trouble, Mr. Customer. Let me see if I can help you solve your problem."
oobabooga web UI is not created by RunPod. Also, just for your info: if you stop the pod, all data in container storage gets wiped.
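[Editor's note: this distinction matters for the symptoms above. On RunPod, the container disk is ephemeral across a stop/start, while a network volume (typically mounted at /workspace) persists. A minimal sketch of moving a downloaded model onto the persistent volume so it survives a stop; the paths here are assumptions, adjust them to your pod:
```python
import os
import shutil

# Hypothetical paths: ephemeral container disk vs. persistent volume mount.
container_dir = "/root/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"
volume_dir = "/workspace/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"

# Move the model onto the volume once, then symlink the old location back
# so tools that expect the original path keep working after a restart.
if os.path.isdir(container_dir) and not os.path.islink(container_dir):
    os.makedirs(os.path.dirname(volume_dir), exist_ok=True)
    shutil.move(container_dir, volume_dir)
    os.symlink(volume_dir, container_dir)
```
]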
Also, you've failed to respond even once, when I requested that you provide basic information 🙂
Actually, I'm pulling the pod from your storage unit. So the data should not and does not get wiped. It's there. It just doesn't run. Which you didn't ask me about. Could you kindly point me to one single specific piece of information you've asked me about that I didn't give you?
Also, you're not supposed to get pissy back at a pissed off customer. Didn't anybody teach you anything at all about customer service? Honestly, I don't know why I'm bothering with this. I'll be trying to reach somebody in charge tomorrow.
Unless @nerdylive is a senior RunPod person? Can you help me, by any chance, @nerdylive ?
No, I'm just a regular customer
Well, if you're looking for help, you apparently won't find it here from RunPod.
@Madiator2011 Repeat after me: "I'm sorry for your difficulty, valued customer. Using the information you provided me, I was able to get the model you are trying to run working with a different loader. Could you please try it and let me know if it works for you as well? If that works, I'll have some questions for you about pausing the pod. While we don't make webui or the model ourselves, I'll do my best to make sure the problem isn't on our end and provide you with what help I can with the templates we provide as part of our service."
Why, thank you, @Madiator2011. Even though ExLlamav2_HF did not load for me when I tried it several times a few weeks back, it is working for me at the moment. I will test it and let you know if it loads consistently, which would indicate an improvement somewhere in the software, or if it only works inconsistently, which would suggest some sort of a DevOps problem. Or I would do that if I were talking to a competent customer service agent who actually cared. Instead, I'll do my best to reach your manager and tell him or her as part of our conversation.
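[Editor's note: for anyone trying to isolate whether a failure like this lives in the web UI or in the backend, the exllamav2 library can be driven directly from Python, bypassing ooba entirely. A minimal sketch following the library's documented quickstart pattern from that era; the model path is hypothetical and the API may have shifted in later releases:
```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Hypothetical path to a downloaded GPTQ model directory.
model_dir = "/workspace/models/TheBloke_Rogue-Rose-103b-v0.2-GPTQ"

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # spread layers across available GPU memory

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

print(generator.generate_simple("Hello, my name is", settings, 64))
```
If this loads and generates but the web UI still fails, the problem is in the UI layer or its configuration rather than the model files.]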
@mfeldstein67, I've tried to assist you, but without the necessary information you've repeatedly failed to provide, my ability to help is limited. Due to your recent disrespectful behavior towards me, I've decided to no longer provide you with support. This matter is now being escalated to our moderation team for further review. They will address the situation accordingly.
@Madiator2011 Thank you. I've been asking you to escalate for some time now. As part of that review, perhaps the moderation team can identify the specific information you claim you asked for that I haven't given you, since you seem unwilling or unable to tell me. But mostly I would just like somebody to actually engage with me as a customer and help me troubleshoot my problems with loading large models into RunPod-provided and preloaded templates, as well as the inconsistent behaviors when pausing and restarting pods using the RunPod-created and RunPod-managed pod management dashboard.
1. https://discord.com/channels/912829806415085598/1189918642809352234/1189921418108993628
2. https://discord.com/channels/912829806415085598/1189918642809352234/1189922279824568472
3. https://discord.com/channels/912829806415085598/1189918642809352234/1190619969491308555
4. https://discord.com/channels/912829806415085598/1189918642809352234/1190624415214485574
You asked me about the hardware specs. I gave them to you. You asked me about the template. I told you I was using Ooba. You asked for the exact name of the models. I gave them to you. You asked for the log information. I gave it to you. When none of that enabled you to successfully diagnose the problem, you blamed me for not providing you with enough information instead of just saying, "Huh. I'm sorry, but I'm not sure why this isn't working." Even that would have been acceptable.
Those links in your last post are more obfuscation. I even volunteered relevant information that you didn't ask for, should have asked for, and didn't respond to, like the fact that I'm using your storage unit. For the sake of the auditors you are asking to review this thread, please provide the information you asked for that I refused to provide. Here. Now. Directly. In plain text. In your reply.
If I hadn't given you sufficient information, you wouldn't have been able to come back to me with a working instance of the exact model I was struggling with, using the exact hardware configuration and exact RunPod configuration.
1. You did not provide information on what type of GPU you are using.
2. There are tons of templates with the Ooba web UI.
3. You did not even tell me whether you are using volume storage or network storage.
@ashleyk is one person who knows much more than me when it comes to oobabooga (he made one of the templates)
And you disrespected him by calling him a 13-year-old teenager
Then you started disrespecting me by treating me like your personal slave even though I was trying to help you.
If you do not respect other people, do not expect others to respect you.
So if you are going to be mean, do not expect anyone to be willing to help you.
First, as a professional customer service representative, you should know when to stay calm and when to wait for the supervisor that you have called in to respond on your behalf. Second, my exchange with the user you mention was not on your official customer service board, had nothing to do with you, and followed a string of disrespectful comments from him that I have no need to rehash here on RunPod's official customer service channel. The user you mention, unlike you, is not a RunPod employee charged with answering RunPod customer service questions and was not interacting on a RunPod-run and RunPod-sanctioned customer service discussion board. If you're really telling me, as a paid RunPod employee charged with providing customer support to paying customers, that you refuse to provide me with customer support because I was "mean" to your friend on another board that has nothing to do with RunPod, then I very much hope that the moderators and your supervisors do read this thread.
For your info, the Discord server is for community-type support
So RunPod doesn't care, as a company, how a RunPod employee chooses to interact with customers on the official RunPod Discord server? THAT I would very much like to hear from a supervisor.
I was not treating you like a slave. I was treating you like a representative of your employer providing customer service I pay for in a company-branded support channel. If you do not think you are supposed to respond as such in this channel, then I'd like to hear from your employer whether that is consistent with company policy.
I have saved a copy of this entire thread for reference in my conversations with your management. I look forward to learning more about RunPod's official support policies as well as its HR policies about how employees are expected to treat customers in official RunPod-branded and RunPod-promoted support channels.
Tip for the moderators reading this: If this is NOT intended to be an official company support channel monitored by official customer support representatives, I suggest you post that prominently, along with a referral pointing customers to official support channels. Particularly if you are not setting and enforcing policies about how your employees may or may not interact with paying customers in this channel.
Hey there - I am the Customer Success Lead for RunPod. I am going to DM you so we can talk about what happened here 🙂
Sent a friend request since DMs are off - let me know