RunPod

⛅|pods

Training AI with a RunPod GPU

Hi, I'm pretty new to all this AI stuff and cloud GPUs, and I'm currently trying to create an AI. I'm trying to train a yolov8xl model with a dataset of about 100k images and 31 classes, and because it's a big project, my GPU either cannot handle it or will be really slow. So I wanted to use an Nvidia A6000 to train my model, but I really don't understand how it works. I even asked ChatGPT, which told me that I needed to import my dataset into RunPod, but I don't see anything to impo...

RTX 4090 Instances Not Starting Up

Hello, RTX 4090 instances in Secure Cloud do not seem to be starting up properly. Attached is a screenshot of what I see when I try to start up 7 RTX 4090s. Thanks...

hardware graphics acceleration

Hello, I am trying to create an instance on RunPod that provides hardware graphics acceleration (using an NVIDIA GPU) along with a fully functional remote desktop environment. To achieve this, I have tried using the following images: ...

GraphQL secretCreate Mutation

I have a RunPod teams account and want to programmatically add secrets to the account (https://docs.runpod.io/sdks/graphql/manage-pod-templates#create-a-secret). I'll then be using these secrets in my Pod. This API works when I use my personal key; however, when I use the team API key, it gives me the following error: Not authorized. Missing required scope(s): TEAM_ADMIN, TEAM_DEV. My API key already has full access, and running the "myself" query shows that I do have the "admin" role in "teamScopes". What exactly am I missing?...

EUR-IS-1 severely low latency

It seems like EUR-IS-1 has an ongoing transfer speed issue (since yesterday?). It can take 2-3 minutes for me to download a 5 MB PNG image. This affects both GPU and CPU pods. I have a CPU pod currently backing up my network drive to Dropbox. As you can see from the image, the transfer rate is abysmal; it's even worse when I try to download something from my pod straight to my PC....

Global Networking

I am trying to use Global Networking. I have 1 master and 2 worker GPUs, all on different pods but in the same data centre. It seems that the ports are not open between the pods; only port 22 is. I tried to specify a TCP port to expose when starting up the Pods too, but it does not work. I need to allow communications between the Pods for torch.dist
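One way to narrow down a problem like this is to test raw TCP reachability between the pods before involving torch at all. The snippet below is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical, and it assumes the master address and port are supplied through the `MASTER_ADDR` and `MASTER_PORT` environment variables that torch.distributed conventionally reads (29500 is assumed here as the fallback port). It says nothing about how RunPod Global Networking assigns addresses.

```python
import os
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: these variables are set the same way as for the training job.
    host = os.environ.get("MASTER_ADDR", "127.0.0.1")
    port = int(os.environ.get("MASTER_PORT", "29500"))
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

Run from a worker pod while something is listening on the master's port, this separates "the port is blocked between pods" from "the training job is misconfigured".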

Need Help with Auto GPU Shutdown & Startup

Hey everyone! I'm new to RunPod and exploring it for my AI workloads. I’d like to optimize GPU usage by setting up a system where the GPU server automatically shuts down when idle and starts up again only when a job or request needs it. Is there a clean way to achieve this using RunPod's features—like APIs, webhooks, or serverless functions? ...

Model upload to huggingface is so slow it costs more than training

Model upload is always slow, which makes RunPod more expensive than it needs to be. But sometimes it's super slow. Is there a specific datacenter which has a better internet connection, or any other way for me to avoid starting pods that have a very slow upload speed?

ports section not popping up?

hcediyq1lb9amz pod id above...

template

I'm trying to start a pod in Secure Cloud. When I select an appropriate server, I click Runpod Pytorch 2.8.0 from Change Template and it immediately resets it to 2.1 (which has CUDA 11.8, which won't work for my application.) Why is it doing this and how do I fix it?

I can't use any of the pods at all. Every one I deploy ends up having a Not Ready Status for HTTP

I have tried so many pods and deployed a bunch of them, but every single one of them had this issue. Not even the terminal works properly when I click start. It just goes back to the stopped grey status. Some do have the ready status, but it just leads to a 404 error or a Cloudflare error.

SwarmUI with Network Storage. Missing modules on new pod connecting to it?

I'm using network storage to store my SwarmUI installation and my models. When I spin up a new pod to connect to it, the ComfyUI backend crashes because it's missing modules (transformers). I'm using nerdylive/stableswarm:v0.0.7. Am I missing something? Is the expectation to have to install SwarmUI every time I start a pod?...

Just found a security issue on runpod

Hi guys, I've just found a security issue on RunPod. Where can I report it? Does RunPod have a bug bounty program?

Is it possible to get pod logs from REST API or GraphQL?

Is it possible to get pod logs from the REST API or GraphQL?

Why am I unable to connect to a http server?

For some reason this is randomly occurring now when I try to connect. It was working fine before in a previous pod. I can't even click start because the start button doesn't work either. It tries to connect, then just goes back to the green start button. This is happening on every pod I go to.

How do I install a model with kobold ai?

I am having trouble. It has been stuck at this for a while as I've been using my pod. It downloaded a 64 GB model without issues, but once it started loading model tensors it stopped at 330/664. What can I do to fix this? I am lost. The loading bar is still showing....

L40 Thermal throttling

We noticed we are having an occasional big slowdown when running our models: from a 10-15 second calculation to 90-120 seconds.
Test run on pod: 8hh03rby46hd8s
- when power draw goes to ~300W and SM usage to ~100%, GPU clock drops from 2490 MHz to 1650 MHz
- as soon as power draw drops to a base of ~80-90W, GPU clock goes back to full speed
We're getting 65% of the performance of the desired GPU ...
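As a rough check on the figures quoted above, the clock ratio can be computed directly. This is a minimal sketch; the function name `clock_ratio` is hypothetical, and it assumes throughput scales roughly linearly with SM clock, which is only an approximation for real workloads.

```python
def clock_ratio(throttled_mhz: float, full_mhz: float) -> float:
    """Fraction of the full clock speed that remains while throttled."""
    return throttled_mhz / full_mhz


# Clock speeds reported in the post: 1650 MHz throttled, 2490 MHz full.
ratio = clock_ratio(1650, 2490)
print(f"{ratio:.1%}")  # prints 66.3%
```

A ratio of about 66% is in line with the roughly 65% of expected performance reported in the post.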

New to runpod. Have never even coded in my life

Hey guys, I just signed up for RunPod, and when I hit the Explore button it only shows me "official templates". No community templates pop up. I want to create portraits in Flux.

question about price of gpu pods

Hey, I had a question about the pricing of the pods. I tried another hosting service, but they charged me even when the server was down and had only just been created, and I was wondering if RunPod will do the same?

Runpod occasionally fails to pull from ECR

Every now and again I have issues starting a pod as it fails to pull from AWS ECR. Nothing in my setup changes.
```
error pulling image: Error response from daemon: Head "https://<AWS_ACCOUNT>.dkr.ecr.<region>.amazonaws.com/v2/<repo>/manifests/<container>": no basic auth credentials
error creating container: container: create: container create: Error response from daemon: No such image: <AWS_ACCOUNT>.dkr.ecr.<region>.amazonaws.com/<repo>:<container>
create container <aws_account>.dkr.ecr.<region>.amazonaws.com/<repo>:<container>...
```