GENGHIS Posts - Answer Overflow

GENGHIS

•Created by GENGHIS on 2/8/2025 in #⛅｜pods-clusters

VRAM stuck at 77% usage

VRAM usage is stuck at 77% on 1 of my 4 GPUs. I've already tried a restart, a hard stop and start, and a reset, and it's still stuck. I don't want to have to switch pods because I have hundreds of GB of data on the volume that would take a long time to set up again. Is there anything else I can do? ID: ox02c3pvm058j3
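For anyone trying to confirm which GPU is holding memory, here is a minimal sketch in Python. It assumes `nvidia-smi` is on the PATH inside the pod and that it reports numeric memory values. The helper names (`parse_gpu_memory`, `busy_gpus`, `read_gpu_memory`) and the 50% threshold are made up for illustration and are not part of any RunPod tooling.

```python
import subprocess

# nvidia-smi query that prints one CSV row per GPU: index, MiB used, MiB total.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV output into (index, used_mib, total_mib, pct) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        used_mib, total_mib = int(used), int(total)
        rows.append((int(index), used_mib, total_mib, 100.0 * used_mib / total_mib))
    return rows


def busy_gpus(rows, threshold_pct=50.0):
    """Return the indices of GPUs whose memory use is at or above the threshold."""
    return [index for index, _used, _total, pct in rows if pct >= threshold_pct]


def read_gpu_memory():
    """Run nvidia-smi and parse its output (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `busy_gpus(read_gpu_memory())` while no job is running lists the GPUs that still hold memory. If `nvidia-smi --query-compute-apps=pid,used_memory --format=csv` shows no process for that GPU from inside the pod, the memory is probably held by something outside the container, which would need the host to be looked at.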

7 replies

RunPod

•Created by GENGHIS on 2/6/2025 in #⛅｜pods-clusters

No more full SSH? Cannot connect VS Code / Cursor

Hi, I used to be able to connect VS Code to my pods over SSH by using the 'full SSH' option (supports SCP & SFTP). That option doesn't seem to be around any more. I have connected to multiple A40 pods and now an H100 over the last couple of days, and there's no full SSH option. Is this going to come back? Is there some new configuration needed?

14 replies

RunPod

•Created by GENGHIS on 6/10/2024 in #⛅｜pods-clusters

Networking on my pod has been shit for the last 3 days. Please fix. US region. RTX 6000 Ada

Going to try transferring my data to a new pod. Would be great if you could fix the networking. I keep losing connection.

9 replies

RunPod

•Created by GENGHIS on 5/20/2024 in #⛅｜pods-clusters

RTX 6000 Ada performance much worse than expected

From the NVIDIA specs, I would expect its performance to be on the order of 10-20% slower than the L40S. However, in my current training (FP16 mixed precision), I am finding it closer to 2X slower or worse. Pretty bad considering the price. Perhaps there is some other issue in how the pods or nodes are set up that could be worth looking into?

4 replies

RunPod

•Created by GENGHIS on 5/17/2024 in #⛅｜pods-clusters

Better solution for 0 GPU stranded volumes

Since on-demand GPUs can get taken, it would be great to have some better escape valves for getting our data off the volume. Right now, the 0.5 vCPU, 512 MB RAM pod you give keeps killing my upload task. I would happily pay for more resources to speed up getting my data out. It would also be nice to be able to attach a network volume to a pod after creation, or to have cross-region network volumes. A network volume that only works in the same region is of limited value, because a big reason for moving data around is that there are no GPUs in the region!

21 replies

RunPod

•Created by GENGHIS on 2/21/2024 in #⛅｜pods-clusters

`runpodctl stop pod $RUNPOD_POD_ID` failing with 401

I used to end my long-running jobs with this command. It has failed the last several times with a 401: `runpodctl stop pod $RUNPOD_POD_ID` Error: statuscode 401

1 reply
