Henky!!
deepseek-r has been loading into VRAM for over an hour.
A 6xA100 Q4_K_S GGUF setup is possible on https://koboldai.org/runpodcpp (if you adjust the container storage to 500GB). It won't be sglang, but it does have OpenAI API support, so it should be easy to integrate with.
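Since KoboldCpp exposes an OpenAI-compatible API, integrating is mostly a matter of pointing a standard chat-completions request at the pod. A minimal sketch using only the standard library; the pod URL is a placeholder, and the `model` field is an assumption (single-model servers typically don't care what you send there):

```python
import json
import urllib.request

# Hypothetical RunPod proxy address; replace with your actual pod URL.
BASE_URL = "https://your-pod-id-5001.proxy.runpod.net"

def build_chat_payload(prompt, max_tokens=256, temperature=0.7):
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": "koboldcpp",  # assumption: name is not checked by a single-model server
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(prompt):
    # POST to the OpenAI-compatible chat completions endpoint.
    req = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Anything that already speaks the OpenAI API (client libraries, frontends) should work the same way by overriding its base URL.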
16 replies
Kobold.cpp - Remote tunnel loads before the model, causing confusion (possible off-product issue)
If you want KoboldAI support, I also recommend https://koboldai.org/discord, where we hang out; I see messages there much sooner.
But moving the tunnels to async is exactly what this is: we added a feature that allows remote switching between model configs (not exposed yet on RunPod), and for that we need to keep the tunnels in their own process.
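The design described above, where the tunnel lives independently of the model so a config switch never drops the remote URL, can be sketched roughly like this. The names are illustrative, not KoboldCpp's actual code, and a thread stands in for the separate process just to keep the sketch portable:

```python
import threading
import time

def tunnel_worker(stop_event):
    # Stand-in for the remote tunnel loop. In the design described above it
    # runs in its own process, so reloading the model never tears it down.
    while not stop_event.is_set():
        time.sleep(0.05)

def load_model(name):
    # Stand-in for a (slow) model load/reload; in reality this can take minutes.
    time.sleep(0.1)
    return name

stop = threading.Event()
tunnel = threading.Thread(target=tunnel_worker, args=(stop,))
tunnel.start()                   # the tunnel comes up first, before any model

model = load_model("config-a")
model = load_model("config-b")   # remote config switch: tunnel stays up

assert tunnel.is_alive()         # the tunnel survived both loads
stop.set()
tunnel.join()
```

This also explains the confusion in the thread title: because the tunnel is deliberately independent, its URL appears before the model has finished loading.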
No device found for buffer type CPU for async uploads
I don't yet see anything out of the ordinary; it should move past that, though. It's possible the context didn't fit and it crashed without RunPod showing that part in the file. I recommend testing first at 4K context and upscaling if that works. If it doesn't work, I can help better once I am home.
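A sketch of what that test might look like on the command line; `--model` and `--contextsize` follow KoboldCpp's CLI flags, and the model path is a placeholder:

```shell
# Start at 4K context to rule out an out-of-memory crash during load;
# if this works, raise the value (8192, 16384, ...) until it fails.
python koboldcpp.py --model /workspace/model.gguf --contextsize 4096
```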
6 replies
RunPod
•Created by Liringlas on 11/3/2024 in #⚡|serverless
Issue with KoboldCPP - official template
Nice that you got it working. If you want to hang out with the other KoboldCpp users: https://koboldai.org/discord
24 replies
The odd part is that all of them were listed as 100GB RAM for me, so I'd expect that to fit even without the new optimization.