Issue with KoboldCPP - official template
I tried two models (103B Midnight Miqu v1.0 and 123B Behemoth v1.1) in Q4 GGUF on a pod with the https://www.runpod.io/console/explore/2peen7lpau template. In both cases the model downloads successfully (two files each).
When launching KoboldCPP, I get the following error:
Something possibly went wrong, stalling for 3 minutes before exiting so you can check for errors.
The full logs are included.
- The pod had 2x A40 48GB GPUs with the default 125GB temporary container disk, and the default environment variables except for the model address.
The default KCPP args should allow 2 GPUs, if I understand correctly: --usecublas mmq --gpulayers 999 --contextsize 4096 --multiuser 20 --flashattention --ignoremissing
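For reference, my understanding is the template ends up launching something roughly equivalent to this by hand (the model path here is just a placeholder, not the actual path on the pod):
```
python koboldcpp.py /workspace/model.gguf \
  --usecublas mmq --gpulayers 999 --contextsize 4096 \
  --multiuser 20 --flashattention --ignoremissing
```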
Thanks a lot!
https://discord.com/channels/912829806415085598/1118945694863065230/1300828220907851806 There was a similar post, but with no reply other than the guide, which I had already followed in the first place.
@Henky!! relevant to your template?
Thank you, I tried again following instructions I found on another Discord. The KCPP_MODEL env variable is written a bit differently: https://huggingface.co/bartowski/Behemoth-123B-v1.1-GGUF/resolve/main/Behemoth-123B-v1.1-Q4_K_M/Behemoth-123B-v1.1-Q4_K_M-00001-of-00002.gguf?download=true,https://huggingface.co/bartowski/Behemoth-123B-v1.1-GGUF/resolve/main/Behemoth-123B-v1.1-Q4_K_M/Behemoth-123B-v1.1-Q4_K_M-00002-of-00002.gguf?download=true with no space after the comma and "?download=true" at the end of both links, which I did not use the first time. This time it worked. Not sure what the issue was the first time. Is it the formatting of the variable?
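In other words, the whole value goes on one line, comma-separated with no spaces (env var name as used by the template, copied from what worked for me):
```
KCPP_MODEL=https://huggingface.co/bartowski/Behemoth-123B-v1.1-GGUF/resolve/main/Behemoth-123B-v1.1-Q4_K_M/Behemoth-123B-v1.1-Q4_K_M-00001-of-00002.gguf?download=true,https://huggingface.co/bartowski/Behemoth-123B-v1.1-GGUF/resolve/main/Behemoth-123B-v1.1-Q4_K_M/Behemoth-123B-v1.1-Q4_K_M-00002-of-00002.gguf?download=true
```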
Ah, it looks like we already assisted in our Discord, but I can help here too
The issue is that people try to fit models that don't fit
Or use context that doesn't fit
I have successfully tested the Q4_K_S of that model on an A100
But people who try it on 2x 48GB have been reporting it doesn't fit, especially if they use Q4_K_M
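As a rough sanity check (assuming Q4_K_M averages around 4.85 bits per weight, which is my estimate, not a figure from this thread): 123B x 4.85 / 8 is roughly 75 GB for the weights alone, before KV cache, per-GPU CUDA overhead, and the bundled image model, so an even split across 2x 48GB is tight.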
Although that specific upload may also be broken
This one launches successfully for me on 1x A100:
If you do go for a split-GPU setup, deleting the image gen model after the fact can help, since it adds a couple of gigabytes to the first GPU. RunPod does not allow deleting it before making the pod due to a RunPod bug.
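If you want to double-check how the memory actually ends up split across the two cards, plain nvidia-smi inside the pod works (standard tooling, nothing template-specific):
```
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv
```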
Updated the error message to give that hint in the future
@nerdylive Is there a way for me to know in hindsight how much RAM that instance had? I wonder if it's being task killed
I can't reproduce it anymore, so I suspect it was regular RAM related; my latest change should make system RAM irrelevant
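For anyone checking this later, a quick way to see whether the kernel OOM-killed the process while the pod is still up (standard Linux tooling, not part of the template, and dmesg may need sufficient container privileges):
```
free -h                                              # how much system RAM the instance actually has
dmesg -T | grep -iE "killed process|out of memory"   # any OOM-killer activity
```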
I don't know, maybe try using that same specific GPU setup (2x A40) in whichever cloud or DC they are using
The odd part is that all of them were listed as 100GB RAM for me, so I'd expect that to fit even without the new optimization
Or maybe ask them to confirm what they rented
Thanks a lot for your help, it did work on the last try, where I used that same way of writing it
I think this might be where I made a mistake the first time
I may work in IT myself, but in the end, even for us, the issue is most of the time located between the chair and the keyboard
The air~
Nice that you got it working. If you want to hang out with the other KoboldCPP users: https://koboldai.org/discord
Thanks a lot for your kind help, both of you!