zkreutzjanz
RunPod
•Created by zkreutzjanz on 10/11/2024 in #⛅|pods-clusters
Deploy pod without scheduled downtime

6 replies
RunPod
•Created by zkreutzjanz on 7/25/2024 in #⛅|pods-clusters
3 pods inaccessible after network outage
There was a network outage in EU NO and the pods are up, but cannot start:
This is the second time an incident like this has occurred. I have >2 TB of storage I cannot access.
Am I being billed for these pods? No response from support.
5 replies
RunPod
•Created by zkreutzjanz on 7/12/2024 in #⛅|pods-clusters
Multi Node training with torchrun/slurm
Has anyone here ever tried multi-node training on RunPod? I am thinking of setting this up, but if people have encountered prohibitive network speeds I do not see a reason to.
8 replies
RunPod
•Created by zkreutzjanz on 6/21/2024 in #⚡|serverless
Slow IO speeds on serverless
An always-active A6000 worker takes twice as long to run my code as a normal A6000. I think it is IO speed. How can I see IO speeds?
10 replies
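For anyone comparing IO between a serverless worker and a regular pod, a rough sketch like the one below could be run on both and the numbers compared. It writes a temporary file to a directory and reads it back while timing each step. The function name `measure_throughput` and its parameters are made up for illustration, and the read figure may be inflated by the OS page cache, so treat the result as a ballpark and not a proper benchmark.

```python
import os
import tempfile
import time


def measure_throughput(directory, total_mib=256, block_kib=1024):
    """Return (write_mib_per_s, read_mib_per_s) for a temp file in `directory`.

    Hypothetical helper for illustration only. The read pass may be served
    from the page cache, so the read number is an upper bound at best.
    """
    block_bytes = block_kib * 1024
    block = os.urandom(block_bytes)  # random data, so compression cannot help
    n_blocks = max(1, (total_mib * 1024) // block_kib)

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Timed sequential write, flushed and fsynced so data reaches the device.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_seconds = max(time.perf_counter() - start, 1e-9)

        # Timed sequential read of the same file.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_bytes):
                pass
        read_seconds = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)

    written_mib = n_blocks * block_kib / 1024
    return written_mib / write_seconds, written_mib / read_seconds
```

Calling `measure_throughput("/workspace")` would test the volume mount, assuming the volume is mounted at `/workspace` (adjust the path to wherever your code actually reads and writes).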
RunPod
•Created by zkreutzjanz on 6/16/2024 in #⛅|pods-clusters
Pod Maintenance update days after

17 replies
RunPod
•Created by zkreutzjanz on 5/27/2024 in #⚡|serverless
Clone endpoint failing in UI
User input, sensitive information removed:
29 replies
RunPod
•Created by zkreutzjanz on 5/3/2024 in #⛅|pods-clusters
How to tell how much storage is being used in a pod? (including network drive)
I tried df -h, but it seems to report on the whole filesystem.
13 replies
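One possible way to get a per-directory number instead of the filesystem-wide figure that `df -h` gives is to walk the directory and add up file sizes, roughly what `du` does. The sketch below uses only the Python standard library. The helper names `directory_size_bytes` and `human_readable` are made up for illustration, and it sums apparent file sizes, so hard-linked files are counted more than once and the result can differ from on-disk block usage.

```python
import os
import stat


def directory_size_bytes(root):
    """Sum apparent sizes of regular files under `root` (symlinks not followed)."""
    total = 0
    # Unreadable directories are skipped silently instead of raising.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is not accessible
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


def human_readable(num_bytes):
    """Format a byte count using binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

For example, `human_readable(directory_size_bytes("/workspace"))` would report usage under the volume mount, assuming the volume is mounted at `/workspace`. On a large network drive the walk can take a while.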
RunPod
•Created by zkreutzjanz on 5/1/2024 in #⛅|pods-clusters
How to get a general idea for max volume size on secure cloud?
I have been able to deploy 2 TB drives, but what is the standard here? How much storage is there generally per server, so I can estimate what I should expect to be able to get?
37 replies
RunPod
•Created by zkreutzjanz on 3/17/2024 in #⚡|serverless
A6000 serverless worker is failing for an unknown reason.
In the last week, a few of our serverless workers have been failing on all requests. I am trying to narrow down a common denominator right now; it seems to just be an A6000 issue.
1 reply
RunPod
•Created by zkreutzjanz on 3/14/2024 in #⛅|pods-clusters
GPU usage when pod initialized. Not able to clear.
Tried nvidia-smi -r, restarting, and resetting. There is still usage on one GPU in the pod.
6 replies
RunPod
•Created by zkreutzjanz on 2/25/2024 in #⛅|pods-clusters
Pod running but inaccessible

1 reply