GENGHIS
VRAM stuck at 77% usage
VRAM usage is stuck at 77% on 1 of my 4 GPUs. I've already restarted, done a hard stop and start, and reset. I don't want to switch pods because I have hundreds of GB of data on the volume that would take a long time to set up again. Is there anything else I can do?
Tried reset. Still stuck.
ID: ox02c3pvm058j3
No more full SSH? Cannot connect VS Code / Cursor
Hi, I used to be able to connect VS Code to my pods over SSH using the "full SSH" option (supports SCP & SFTP). That option doesn't seem to be available anymore. I've connected to multiple A40 pods, and now an H100, over the last couple of days, and there's no full SSH option. Is it going to come back? Is some new configuration needed?
RTX 6000 Ada performance much worse than expected
From the NVIDIA specs, I would expect its performance to be on the order of 10-20% slower than the L40S. However, in my current training run (FP16 mixed precision), I'm finding it closer to 2X slower or worse. That's pretty bad considering the price. Perhaps there's some other issue in how the pods or nodes are set up that would be worth looking into?
Better solution for 0-GPU stranded volumes
Since on-demand GPUs can get taken, it would be great to have better escape valves for getting our data off the volume. Right now, the 0.5 vCPU / 512 MB RAM pod you provide keeps killing my upload task. I would happily pay for more resources to speed up getting my data out. It would also be nice to be able to attach a network volume to a pod after creation, or to have cross-region network volumes. A network volume that only works in the same region is of limited value, because a big reason for moving data around is that there are no GPUs available in the region!