RTX 6000 Ada performance much worse than expected
Going by the NVIDIA specs, I would expect its performance to be on the order of 10-20% slower than the L40S. In my current training run (FP16 mixed precision), however, I am finding it closer to 2x slower, or worse. That is pretty bad considering the price. Could there be some other issue in how the pods or nodes are set up that is worth looking into?
4 replies
Better solution for 0 GPU stranded volumes
Since on-demand GPUs can get taken, it would be great to have some better escape valves for getting our data off the volume. Right now, the 0.5 vCPU / 512 MB RAM pod you give keeps killing my upload task. I would happily pay for more resources to speed up getting my data out. It would also be nice to be able to attach a network volume to a pod after creation, or to have cross-region network volumes. A network volume that only works within the same region is of limited value, because a big reason for moving data around is that there are no GPUs available in the region!
21 replies
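In the meantime, a possible workaround on the small pod, assuming the upload task is being killed for memory (the post does not say why it dies, so this is a guess): copy the data in fixed-size chunks and resume from wherever the last attempt stopped. The sketch below uses only the Python standard library. The function name `resumable_copy` and the 4 MiB chunk size are hypothetical choices for illustration, and it assumes the destination is a path you can append to (for example a mounted remote filesystem), not an object-store API.

```python
import os

# Hypothetical default: 4 MiB per read keeps peak memory far below 512 MB.
CHUNK_SIZE = 4 * 1024 * 1024


def resumable_copy(src_path, dst_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dst_path in bounded-size chunks, resuming if possible.

    Assumes any existing dst_path is an exact prefix of src_path left over
    from an earlier, interrupted run. Returns (resumed_from, bytes_copied).
    """
    # Resume from the number of bytes already present at the destination.
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)
        while True:
            chunk = src.read(chunk_size)  # never holds more than chunk_size bytes
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return offset, copied
```

If the process gets killed partway through, calling `resumable_copy` again with the same two paths picks up at the existing destination size instead of starting over. It does not verify that the partial file really matches the source, so comparing checksums after the final run would be a sensible extra step.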