TheBloke template and LLMs not working

I am brand new to RunPod. I just opened an account yesterday and put $25 on it. I spun up an 80 GB A100, installed the TheBloke template, and then tried Falcon 40B and Mixtral. I have already wasted $10 trying to get this to work. Every time I try to load a model, it takes 5+ minutes and fails, or it simply doesn't even try to load. In the day and a half I've been messing with this, I only got Mixtral to load and actually work one time. Please help me find a solution to this.
6 Replies
justin · 12mo ago
What are you trying to do? What is your objective? Community cloud or secure cloud? If all you're looking for is to mess around with these models, I HIGHLY recommend Ollama.
justin · 12mo ago
GitHub - jmorganca/ollama: Get up and running with Llama 2 and other large language models locally
justin · 12mo ago
This is a way simpler solution 🤷 if you don't have a particular technical reason to use the others. Just use a PyTorch template from RunPod and go to the terminal to execute the commands, and you'll be good. I think it's just: download and run the install script, run `ollama serve`, then you can do `ollama run` for whatever model you want (see the sketch below). They have both Mixtral and Falcon. Though I'll warn you, Mixtral is inherently slow and eats up a lot of VRAM / disk space, so probably give yourself 50 GB each for container / volume, depending on how many models you plan to download and where. That should be a good starting point, along with something that has enough VRAM, so a 4090 or above, I think.
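A minimal sketch of those steps, assuming a standard RunPod PyTorch pod with a terminal; the install one-liner is the one from Ollama's README, and the model names are whatever the Ollama library calls them:

```bash
# On the pod: download and run Ollama's install script.
curl -fsSL https://ollama.com/install.sh | sh

# Start the Ollama server (leave it running; & backgrounds it in this shell).
ollama serve &

# Pull and chat with a model from the Ollama library.
ollama run mixtral   # or: ollama run falcon
```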
scampbell70 (OP) · 12mo ago
I was using secure cloud. I had 10 GB for the container and 1,000 GB for the volume. Could that have been part of the problem? I would like to find a model I can use to create fantasy stories for a D&D world I'm building and want to flesh out: build character profiles and backstories, and generate 2,000-3,000 word stories based on this world. ChatGPT just didn't do what I wanted, so I'm looking for a good uncensored model. I was leaning toward Falcon, Mistral, or Goliath 120B. I can't run it locally, my PC just doesn't have the power, so I'm looking for a good online solution. I don't mind paying for something that works. I got Mixtral running today, but it took almost 20 minutes to download and another 20 minutes or more to actually get the model to load. Even on the 80 GB A100 it was so slow.
justin · 12mo ago
How did you download it? Again, maybe Ollama will serve you better. It depends: your 1,000 GB volume is mounted at /workspace, while the 10 GB container disk is everything outside /workspace. I imagine there are temp folders being created outside your /workspace, so I recommend using larger container / volume storage and making sure the big files land on the volume (see the sketch below). You could also be getting bottlenecked because those models are huge; it also depends on how you are downloading them. Again, I recommend Ollama. I've recommended it to others on the server and they had a much easier experience setting it up.
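A sketch of keeping the model downloads on the volume, assuming Ollama's documented `OLLAMA_MODELS` environment variable; the path under /workspace is just an example:

```bash
# Store model blobs on the persistent 1,000 GB volume instead of the
# 10 GB container disk (by default they land outside /workspace).
export OLLAMA_MODELS=/workspace/ollama-models
mkdir -p "$OLLAMA_MODELS"

# Restart the server so it picks up the new model directory,
# then pull the model onto the volume.
ollama serve &
ollama pull mixtral
```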
scampbell70 (OP) · 12mo ago
I will give your method a try as soon as I am able and let you know how it goes.