Dj
RunPod
•Created by bghira on 4/11/2025 in #⛅|pods-clusters
ROCm 6.3
This came up recently, maybe we can update the ROCm servers soon. Just a maybe. Can you open an issue on https://github.com/runpod/containers so I can track it?
6 replies
RunPod
•Created by Yiannis on 4/21/2025 in #⛅|pods-clusters
runpod-torch-v280 & RTX 4090 unsatisfied condition: cuda>=12.8
Oh sorry, I originally misread the message. None of the 4090 servers have CUDA 12.8 on the server; you can get 12.8 support on the 3090 and 5090 devices!
7 replies
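For anyone hitting the "unsatisfied condition: cuda>=12.8" message from the thread above: it generally means the image asks for a newer CUDA version than the host driver reports. Below is a minimal Python sketch of that comparison, assuming you have the header line that nvidia-smi prints. The function names and the sample header string are made up for illustration and are not RunPod tooling.

```python
import re
from typing import Optional, Tuple

def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Pull the 'CUDA Version: X.Y' value out of nvidia-smi header text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def meets_requirement(smi_output: str, required: Tuple[int, int] = (12, 8)) -> bool:
    """True if the reported CUDA version is at least the required (major, minor)."""
    found = parse_cuda_version(smi_output)
    return found is not None and found >= required

# Hypothetical header line, written by hand for this example (not real output).
SAMPLE_HEADER = "| NVIDIA-SMI 550.54.14   Driver Version: 550.54.14   CUDA Version: 12.4 |"

if __name__ == "__main__":
    print(parse_cuda_version(SAMPLE_HEADER))
    print(meets_requirement(SAMPLE_HEADER))
```

Versions are compared as (major, minor) tuples rather than strings, so a value like 12.10 would sort after 12.8.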
RunPod
•Created by abush on 4/21/2025 in #⚡|serverless
Issue with Websocket latency over serverless http proxy since runpod outage
The timing of the outage and this specific bug (assuming you're seeing the issue I think it is) are unrelated. We deliver traffic to/from your pod through the RunPod Proxy which is about 6 or so servers deployed in the US and EU. We know the actual IP of your host, and tunnel that traffic through whichever server would be the fastest.
It's interesting that I only started seeing this issue about the last time we had an outage affecting serverless and more users are affected after another serverless outage. Those events may be related, but since I'm not certain I won't confirm that yet. Do you also see the issue with the proxy when testing locally? If so, can you help me by grabbing an mtr to the URL you have in that screenshot as "WebSocket URL"? You don't have to share the output of the mtr here - you can DM me.
4 replies
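A note for readers who have not run mtr before: it takes a hostname, not a full URL, so the host has to be pulled out of the WebSocket URL first. Here is a minimal Python sketch of that step, assuming mtr is installed locally. The URL is a placeholder and the helper name is hypothetical; the report flags are standard mtr options.

```python
from typing import List
from urllib.parse import urlparse

def mtr_command(websocket_url: str, cycles: int = 10) -> List[str]:
    """Build an mtr report-mode command for the host in a ws:// or wss:// URL."""
    host = urlparse(websocket_url).hostname
    if host is None:
        raise ValueError("no hostname found in URL: %r" % websocket_url)
    # --report prints a summary and exits instead of opening the live view;
    # --report-wide keeps long hostnames from being truncated.
    return ["mtr", "--report", "--report-wide",
            "--report-cycles", str(cycles), host]

# Placeholder URL for illustration; substitute your own "WebSocket URL".
EXAMPLE_URL = "wss://example.com/ws"

if __name__ == "__main__":
    # Print the command so it can be copied into a terminal. To run it from
    # Python instead: subprocess.run(mtr_command(EXAMPLE_URL), check=True)
    print(" ".join(mtr_command(EXAMPLE_URL)))
```

Report mode is used here because it produces plain text that is easy to paste into a DM.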
RunPod
•Created by Yiannis on 4/21/2025 in #⛅|pods-clusters
runpod-torch-v280 & RTX 4090 unsatisfied condition: cuda>=12.8
Yes, there is a CUDA 12.8 template you can choose when you deploy your pod. If you need a link to it I can get you one.
7 replies
RunPod
•Created by Jackie on 4/21/2025 in #⚡|serverless
Runpod down?
Confirmed, please hold for a follow up
20 replies
RunPod
•Created by Jack on 9/24/2024 in #⚡|serverless
AWS ECR Registry Authentication
It's been taken off the roadmap for now but I can probably have that changed, our capacity for stuff like this is a lot better than it was when this ticket was made. I'll follow up with my team Monday 😅
13 replies
RunPod
•Created by 41see on 4/11/2025 in #⛅|pods-clusters
CUDA device uncorrectable ECC error
Okay, yeah with the context of knowing you were one of the first reports this is absolutely a misfire from support. The machine in question is the problematic machine that keeps getting relisted by its host.
84 replies
RunPod
•Created by 41see on 4/11/2025 in #⛅|pods-clusters
CUDA device uncorrectable ECC error
Logs are forever (14 days), I'll find it :)
84 replies
RunPod
•Created by earl_shiro on 4/19/2025 in #⚡|serverless
Ai Malware detection
It's chill, I am not particularly sure. Hashes and the like of course, but I'd email support for a response?
5 replies
RunPod
•Created by 41see on 4/11/2025 in #⛅|pods-clusters
CUDA device uncorrectable ECC error
@bghira
84 replies
RunPod
•Created by 41see on 4/11/2025 in #⛅|pods-clusters
CUDA device uncorrectable ECC error
I'm not particularly weekend support, so I'm not home right now but I'll find your ticket based on the response you got and look into this and follow up with you later tonight.
84 replies
RunPod
•Created by 41see on 4/11/2025 in #⛅|pods-clusters
CUDA device uncorrectable ECC error
It's very likely you were placed onto the physical server this thread was created for, and it's something I am personally actively working on with our hardware partner. They're sending this server back online without doing the proper work. We delist it, they say "it's fixed :)" and send it back, and the same server has a problem reported. Naturally this is absolutely not the experience we want you to have.
84 replies
RunPod
•Created by 41see on 4/11/2025 in #⛅|pods-clusters
CUDA device uncorrectable ECC error
Can you share your pod ID? It is technically a fault of our platform that we distributed you onto this hardware while it is in this state.
84 replies
RunPod
•Created by Crowmish on 4/18/2025 in #⚡|serverless
Experiencing massive growth of startup/execution time from ~ 22:00 UTC April 17th
Can you share your endpoint ID? I'd love to take a look at this.
2 replies
RunPod
•Created by Anmol Sharma on 4/16/2025 in #⛅|pods-clusters
error creating container
If you can't track it down, let me know and I'll go find it in your account history.
95 replies
RunPod
•Created by Anmol Sharma on 4/16/2025 in #⛅|pods-clusters
error creating container
I can grab it too, it's just easier if you already know the ID :p
95 replies
RunPod
•Created by トトロ1号 on 4/8/2025 in #⛅|pods-clusters
Need Pytorch 2.5 and 2.6 offical Docker Image
I've alsooo sent you a DM with some credits for being patient with me while I work on our automation here, I appreciate it a lot.
15 replies
RunPod
•Created by トトロ1号 on 4/8/2025 in #⛅|pods-clusters
Need Pytorch 2.5 and 2.6 offical Docker Image
@トトロ1号 Sorry for how long this is taking, I want to make our containers flow better and it's delaying me being able to release an official RunPod 2.6.0 image. One of our employees has this template with the version you'll need built into it -> https://www.runpod.io/console/deploy?template=mm3gw8nlro&type=gpu
15 replies