GENGHIS
RunPod
Created by GENGHIS on 2/8/2025 in #⛅|pods
VRAM stuck at 77% usage
VRAM usage is stuck at 77% on 1 of my 4 GPUs. I've already tried a restart, a hard stop and start, and a reset, and it's still stuck. I don't want to have to switch pods because I have hundreds of GB of data on the volume that would take a long time to set up again. Is there anything else I can do? ID: ox02c3pvm058j3
7 replies
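For the stuck-VRAM report above, a minimal sketch of how per-GPU memory usage could be read from inside the pod, assuming `nvidia-smi` is available there. The helper names (`parse_gpu_memory`, `gpus_above`, `query_gpu_memory`) and the 50% threshold are hypothetical and only for illustration; this shows which GPU is affected, it does not free the memory.

```python
import subprocess

# Real nvidia-smi query flags; prints one "index, used, total" line per GPU (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Return {gpu_index: percent_of_memory_used} from the CSV text above."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = 100.0 * float(used) / float(total)
    return usage


def gpus_above(usage, threshold=50.0):
    """List GPU indices whose memory usage is at or above threshold percent."""
    return sorted(i for i, pct in usage.items() if pct >= threshold)


def query_gpu_memory():
    """Run nvidia-smi and parse its output (needs a machine with NVIDIA GPUs)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

On a pod with no jobs running, `gpus_above(query_gpu_memory())` would list the GPUs still holding memory. Running `nvidia-smi` with no arguments also lists the processes that own that memory, if any are visible from inside the container.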
RunPod
Created by GENGHIS on 2/6/2025 in #⛅|pods
no more full ssh? cannot connect vs code / cursor
Hi, I used to be able to connect VS Code to my pods over SSH by using the 'full ssh' option (supports scp & sftp). That option doesn't seem to be around any more. I have connected to multiple A40 pods and now an H100 over the last couple of days, and there's no full SSH option. Is this going to come back? Is there some new configuration needed?
14 replies
RunPod
Created by GENGHIS on 6/10/2024 in #⛅|pods
Networking on my pod has been shit for the last 3 days. Please fix. US region. RTX 6000 Ada
Going to try transferring my data to a new pod. Would be great if you could fix the networking. I keep losing connection.
9 replies
RunPod
Created by GENGHIS on 5/20/2024 in #⛅|pods
RTX 6000 Ada performance much worse than expected
From the NVIDIA specs, I would expect its performance to be on the order of 10-20% slower than the L40S. However, in my current training (FP16 mixed precision), I am finding it closer to 2x slower or worse. That's pretty bad considering the price. Perhaps there is some other issue in how the pods or nodes are set up that could be worth looking into?
4 replies
RunPod
Created by GENGHIS on 5/17/2024 in #⛅|pods
Better solution for 0 GPU stranded volumes
Since on-demand GPUs can get taken, it would be great to have some better escape valves for getting our data off the volume. Right now, the 0.5 vCPU, 512 MB RAM pod you give keeps killing my upload task. I would happily pay for more resources to speed up getting my data out. It would also be nice to be able to attach a network volume to a pod after creation, or to have cross-region network volumes. A network volume that only works in the same region is of limited value, because a big reason for moving data around is that there are no GPUs in the region!
21 replies
RunPod
Created by GENGHIS on 2/21/2024 in #⛅|pods
`runpodctl stop pod $RUNPOD_POD_ID` failing with 401
I used to end my long-running jobs with this command. It has failed the last several times with 401. `runpodctl stop pod $RUNPOD_POD_ID` gives `Error: statuscode 401`.
1 reply
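For the job-ending pattern described above, a minimal sketch of a wrapper that runs the same `runpodctl stop pod` command and raises loudly when it fails, so a long job does not silently leave the pod running. The function names and the injectable `run` parameter are hypothetical. It assumes a failure shows up either as a non-zero exit code or as output starting with `Error:` (as in the post), which is not confirmed here, and it does not address the cause of the 401 itself.

```python
import os
import subprocess


def stop_command(pod_id):
    """Build the same command used in the post."""
    return ["runpodctl", "stop", "pod", pod_id]


def stop_pod(pod_id=None, run=subprocess.run):
    """Stop the pod and raise if the CLI reports a failure.

    `run` is injectable so the logic can be exercised without runpodctl.
    """
    pod_id = pod_id or os.environ.get("RUNPOD_POD_ID")
    if not pod_id:
        raise RuntimeError("RUNPOD_POD_ID is not set")
    result = run(stop_command(pod_id), capture_output=True, text=True)
    output = ((result.stdout or "") + (result.stderr or "")).strip()
    # Assumption: failures give a non-zero exit code or an "Error:" line.
    if result.returncode != 0 or output.startswith("Error:"):
        raise RuntimeError(f"runpodctl stop failed: {output}")
    return output
```

Calling `stop_pod()` as the last step of a training script would then turn a 401 into a visible exception that can be logged or retried, rather than a line lost in the job output.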