Why are secure cloud pods so slow?
I'm pretty sure I just wasted a few hours trying to find a decent pod that isn't being bottlenecked by its other hardware. I only managed to find one pod a few days ago that was giving me 3 it/s while training a model, and it was a community pod.
8 Replies
What kind of model are you training? Are you using Kohya_ss or something else? What kind of GPU are you using and which region of secure cloud are you using?
Been trying to train a LoRA for SDXL. I've tried 3090s, 4090s, and an A100.
Haven't been picking a region, any suggestions?
Do you know which region you're getting when you're auto assigned one?
says CZ
most of the time
Okay, I'll run some tests. Not sure whether it's a slow disk causing the slowdown.
someone suggested in another thread it could be old CPUs
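If anyone wants to check the disk themselves, here's a rough sketch of a sequential write/read test in Python. The function name and the sizes are made up for illustration, and the directory you point it at is up to you (use whichever path your training data actually lives on). It's only a ballpark number: the read pass can be served from the OS page cache, so treat the write figure as the more honest one.

```python
import os
import tempfile
import time


def measure_disk_throughput(directory=".", size_mb=64, chunk_mb=4):
    """Return rough sequential write/read speeds in MB/s for `directory`.

    Hypothetical helper for illustration; not part of any pod tooling.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = max(1, size_mb // chunk_mb)
    total_mb = n_chunks * chunk_mb

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Timed write: fsync so we measure the disk, not just the write buffer.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        write_s = time.perf_counter() - start

        # Timed read: may be served from the page cache, so this can look
        # faster than the disk really is.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)

    return {"write_mb_s": total_mb / write_s, "read_mb_s": total_mb / read_s}


if __name__ == "__main__":
    print(measure_disk_throughput("."))
```

Running it on a fast pod and a slow pod with the same settings should show whether the disk is the part that differs.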
but they get away with it because they only show vCPU count and nothing else
Yeah, CPU can have some impact, but once everything is loaded and the training starts, it should be mostly using the GPU, not the CPU. If I check the CPU usage while training is in progress, it's very low, while GPU utilization is basically maxed out.
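For anyone who wants to do the same check on their own pod, a minimal sketch, assuming `nvidia-smi` is on the PATH and the pod is Linux. The helper names are hypothetical. One caveat: inside a container the load average and core count may reflect the host rather than your vCPU allotment, so read the CPU number as a hint only.

```python
import os
import subprocess


def parse_gpu_stats(csv_text):
    """Parse 'utilization, memory used' lines from nvidia-smi CSV output
    produced with --format=csv,noheader,nounits."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem_used = (field.strip() for field in line.split(","))
        stats.append({"gpu_util_pct": int(util), "mem_used_mib": int(mem_used)})
    return stats


def query_gpu_stats():
    """Ask nvidia-smi for per-GPU utilization and memory use."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu,memory.used",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_stats(out)


def cpu_load_per_core():
    """1-minute load average divided by visible core count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


if __name__ == "__main__":
    print("GPU:", query_gpu_stats())
    print("CPU load per core:", round(cpu_load_per_core(), 2))
```

Run it a few times while training is in progress. High GPU utilization with low CPU load matches what's described above; low GPU utilization would point at something upstream (data loading, disk, CPU) starving the GPU.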
one thing I noticed about the fast pod I received once was the beginning wheel for Kohya loaded FAST