RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅｜pods-clusters

Unable to back up volume data to Google Cloud storage bucket

Hi, I've been trying for a while now to sync my Google Cloud Storage bucket to RunPod so that I can back up my volume data. I followed the instructions in the documentation, but I just can't seem to initiate the transfer; the dialog just keeps refreshing and drops me back at the options tab where I select whether to upload to or download from Google Cloud Storage. I created my service account JSON key, and I provide the bucket name and directory path, but it doesn't seem to work. I ensured that the buck...
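
While the built-in Cloud Sync dialog is misbehaving, one workaround is to push /workspace to the bucket manually from inside the Pod using the same service-account key. A minimal sketch with the google-cloud-storage client; the bucket name, key path, and destination prefix below are placeholders:

```python
# Manual fallback: upload everything under /workspace to a GCS bucket from
# inside the Pod. Bucket name, key path, and prefix are placeholders.
import os
from google.cloud import storage  # pip install google-cloud-storage

KEY_FILE = "/workspace/gcs-key.json"   # the service-account JSON key
BUCKET = "my-backups-bucket"           # your bucket name
PREFIX = "runpod-backup"               # destination "directory" in the bucket
SRC = "/workspace"

client = storage.Client.from_service_account_json(KEY_FILE)
bucket = client.bucket(BUCKET)

for root, _dirs, files in os.walk(SRC):
    for name in files:
        local_path = os.path.join(root, name)
        rel_path = os.path.relpath(local_path, SRC)
        bucket.blob(f"{PREFIX}/{rel_path}").upload_from_filename(local_path)
        print("uploaded", rel_path)
```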

Can't send using runpodctl and can't resend

I've installed runpodctl on the receiving PC, but I'm still unable to receive the .zip file. I got an "approve access" notification on the receiving PC, and although I approved it, nothing was received. On the sending Pod, I also can't generate a new one-time send code because it says the .zip file already exists....
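
For reference, the documented runpodctl flow is a single send on the Pod and a single receive on the PC; the sketch below wraps those two commands in Python only to keep the examples in one language, and the file name and one-time code are placeholders. One thing worth checking (an assumption, not confirmed in the thread) is whether a file with the same name already sits in the directory where the command is run.

```python
# Minimal sketch of the documented runpodctl transfer flow.
# The two functions run on different machines, never in the same script.
import subprocess

def send_from_pod(path: str) -> None:
    """Run on the sending Pod; runpodctl prints a one-time code to stdout."""
    subprocess.run(["runpodctl", "send", path], check=True)

def receive_on_pc(code: str) -> None:
    """Run on the receiving PC, ideally in an empty directory, with the code above."""
    subprocess.run(["runpodctl", "receive", code], check=True)

# send_from_pod("outputs.zip")          # on the Pod (placeholder file name)
# receive_on_pc("1234-word-word-word")  # on the PC (placeholder code)
```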

Not working

Why did it stop at 18% when I try to download on port 3000 in the CivitAI Detail Tweaker XL tab? This error pops up.

runpod-torch-v280 & RTX 4090 unsatisfied condition: cuda>=12.8

Hello, start container for runpod/pytorch:2.8.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04: begin error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.8, please update your driver to a newer version, or use an earlier cuda container: unknown ...
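
That error means the host's NVIDIA driver is older than what the cuda12.8.1 image requires, so the container runtime refuses to start it; picking a host with a newer driver (or an image built on an older CUDA) avoids it. A minimal sketch that reads the driver version reported by nvidia-smi; the R570-series threshold for CUDA 12.8 is an assumption to verify against NVIDIA's release notes:

```python
# Check the host driver before pulling a CUDA 12.8 image. The ">= 570"
# threshold is an assumption; drivers below it trigger the
# "unsatisfied condition: cuda>=12.8" error shown above.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    text=True,
)
driver = out.strip().splitlines()[0]
major = int(driver.split(".")[0])
verdict = "ok for CUDA 12.8" if major >= 570 else "too old, use an older CUDA image"
print(f"driver {driver}: {verdict}")
```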

What will happen if I resume a pod on a fully occupied physical machine via the API?

Hello, the docs say: "Most of our machines have between 4 and 8 GPUs per physical machine. When you start a Pod, it is locked to a specific physical machine. If you keep it running (On-Demand), then that GPU cannot be taken from you. However, if you stop your Pod, it becomes available for a different user to rent. When you want to start your Pod again, your specific machine may be wholly occupied! In this case, we give you the option to spin up your Pod with zero GPUs so you can retain access to your data." What will happen if I try to resume a stopped Pod via the API and all GPUs in the physical machine are already occupied? What status code will be returned? Is there any detailed message in the response body? And what will the response be if there is no free GPU in the whole datacenter? ...
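
The quickest way to see the exact error shape the API returns is to probe it. A hedged sketch with the runpod Python SDK (function names are from runpod-python as I recall them; verify against the SDK docs), falling back to the zero-GPU start the quoted docs describe; the Pod ID and API key are placeholders:

```python
# Hedged sketch using the runpod Python SDK (pip install runpod).
# POD_ID and the API key are placeholders.
import runpod

runpod.api_key = "YOUR_API_KEY"
POD_ID = "xxxxxxxxxxxx"

try:
    pod = runpod.resume_pod(POD_ID, gpu_count=1)
    print("resumed with GPU:", pod)
except Exception as err:
    # If every GPU on the pinned machine is taken, this call fails; printing
    # the exception shows the actual message the API sends back.
    print("GPU resume failed:", err)
    # Resuming with zero GPUs may keep the volume reachable, per the docs
    # quoted above; worth testing whether the API accepts it.
    pod = runpod.resume_pod(POD_ID, gpu_count=0)
    print("resumed without GPU:", pod)
```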

upgrading storage makes pod inaccessible

I just tried to increase the storage volume on my pod (76flbo4zz6wawp), and immediately afterwards it says "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime." I absolutely need to get a backup of the data on this pod. Please help....

App keeps disconnecting?

I run a pod with app.py, an application I run from the terminal to create an online interface for an image generator. Everything works fine, but if I stop using the interface for about 1 or 2 minutes, the API returns an error, the terminal disconnects, and I have to rerun the program. Does anyone know why this is happening and how to solve it? Appreciate your help...
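
Assuming the app dies because the terminal session that launched it drops after idling (an assumption, since the thread doesn't say how app.py is started), detaching it from the terminal keeps it running across disconnects. A minimal Python sketch; the script name and log path are placeholders, and running the app under tmux or nohup achieves the same thing:

```python
# Launch app.py detached from the terminal session so a dropped terminal
# doesn't take it down. Assumes the disconnect is what kills the app.
import subprocess

with open("/workspace/app.log", "ab") as log:
    proc = subprocess.Popen(
        ["python", "app.py"],
        stdout=log,
        stderr=subprocess.STDOUT,
        start_new_session=True,   # detach from the controlling terminal
    )
print("started app.py with PID", proc.pid)
```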

Network volume access

Is there some way to access my network volume files without creating a pod? I mean something like FTP 🤔

runpod nginx networking

Hi all! Just started working with RunPod and I'm quite impressed with the affordability, but I'm having some trouble now that I'm building an application that uses nginx to send users down different ports ...

Pods deleted shown under admin account

I’m reaching out to clarify how pod deletions are logged in the event of an outage. If a pod is deleted due to an infrastructure issue or system-triggered outage, does the audit log or UI display a specific user (e.g., an admin or the original pod owner) as the one who deleted it? We’re trying to determine whether a deletion event attributed to an admin account was manually triggered or automatically initiated by the system. Appreciate any clarification you can provide....

The container doesn't start

Hello everyone! I apologize if this is a basic question, but I'm having trouble getting my pod to start. Could anyone kindly help me figure out how to resolve this issue? We are currently facing an issue where the container fails to start, and the "Connect" button remains inactive indefinitely....

How to use my custom Docker image from my Docker Hub account?

I have a custom Docker image ready on my Docker Hub account which has DeepSeek-VL2 Tiny fully downloaded along with a Python server, and I need to run it on the Community Cloud. I plan to just make HTTP/gRPC requests to the Python server in this Docker image so it can make requests to DeepSeek-VL2 Tiny within the same image. I see no option to use my own Docker image while creating the pod. Is it possible to do what I just described or not? Please help!...
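
Custom images from Docker Hub are supported: in the UI you can put the image name into a Pod template's container image field, and via the API you can pass it directly. A hedged sketch with the runpod Python SDK; parameter names are from the SDK's create_pod as I recall them, and the image name, GPU type, and port below are placeholders. The image must be public (or registry credentials added to your account) for the Pod to pull it.

```python
# Hedged sketch: deploy a Community Cloud pod from a custom Docker Hub image
# with the runpod Python SDK (pip install runpod). Image and GPU names are
# placeholders; verify parameter names against the SDK docs.
import runpod

runpod.api_key = "YOUR_API_KEY"

pod = runpod.create_pod(
    name="deepseek-vl2-tiny",
    image_name="yourdockerhubuser/deepseek-vl2-tiny:latest",
    gpu_type_id="NVIDIA GeForce RTX 3090",
    cloud_type="COMMUNITY",
    ports="8000/http",            # expose the Python server's port
    container_disk_in_gb=40,
)
print(pod)
```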

Pricing models

Hello, my company is planning to try some AI models for coding, and we are planning to get the GPUs from RunPod. But what we can't figure out is pricing: how does it work if we use the models from about 8:00 to 16:00 every weekday? Are we still going to be billed $2.47/hr 24/7?...
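
As I understand RunPod's On-Demand billing, GPU time is charged only while the Pod is actually running (plus a much smaller storage fee while it is stopped), so shutting the Pod down outside working hours avoids the 24/7 bill. Rough numbers at the $2.47/hr rate mentioned:

```python
# Back-of-the-envelope comparison at the $2.47/hr rate mentioned above.
# GPU compute is billed only while the Pod runs; the smaller storage fee
# charged while it is stopped is ignored here.
rate = 2.47                      # $/hr
always_on = rate * 24 * 30       # running 24/7 for a 30-day month
workdays = rate * 8 * 22         # 8:00-16:00 on ~22 weekdays

print(f"24/7:        ${always_on:,.2f}/month")   # ~$1,778
print(f"8h weekdays: ${workdays:,.2f}/month")    # ~$435
```

The trade-off, as the resume-Pod question above notes, is that a stopped Pod's GPU can be rented to someone else and may not be free when you want to start it again.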

A100 SXM with 87% GPU Memory Used at boot

I'm trying to boot up a 1 x A100 SXM pod on EU-RO-1; however, it boots up with 87% GPU memory used. I can't track the process that is using the memory, so I assume it's a bug?...
Solution:
It looks like there was an issue with the machine on Runpod and they've removed it!
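
For similar cases, the quickest check is whether any real process is actually holding the memory; if the per-process view from nvidia-smi is empty while memory shows as used, the host itself is likely at fault, as it was here. A minimal sketch:

```python
# List the compute processes holding GPU memory. An empty list with memory
# still "used" points at a host-side problem worth reporting.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
     "--format=csv,noheader"],
    text=True,
)
print(out.strip() or "no compute processes found; memory is held outside the container")
```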

error creating container

error creating container: nvidia-smi: parsing output of line 0: failed to parse (pcie.link.gen.max) into int: strconv.Atoi: parsing "": invalid syntax

When trying to launch a new CPU pod, the button to choose a template is gone

This only seems to affect CPU pods. I've been using this regularly (I have some CPU pods and GPU pods) and haven't had issues in the past; it seems like maybe just a UI error that deleted the button. I was able to launch pods with the API.

SGLang DeepSeek-V3-0324

I have been trying to run DeepSeek-V3-0324 using instant clusters with 2 x (8 x H100s) and have so far been unsuccessful. I am trying to get the model to run multi-node + multi-GPU. I have downloaded the model from Hugging Face onto a persistent volume and attach that volume to my instant cluster before launching. After launching, I then run the PyTorch demo script as presented in https://docs.runpod.io/instant-clusters/pytorch to make sure that the network is working (it does). I then follow the instructions to get DeepSeek-V3-0324 running according to: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3...
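
For reference, a hedged sketch of the two-node SGLang launch described in the linked deepseek_v3 instructions, wrapped in Python only to keep the examples in one language; the head-node address and model path are placeholders, and the flag spellings should be double-checked against that README:

```python
# Hedged sketch: one sglang.launch_server process per node, 16 GPUs total
# across 2 x 8xH100. Head-node address and model path are placeholders.
import subprocess

MODEL = "/workspace/DeepSeek-V3-0324"   # path on the attached network volume
HEAD = "10.65.0.2:5000"                 # head node IP:port reachable from both nodes

def launch(node_rank: int) -> None:
    subprocess.run([
        "python3", "-m", "sglang.launch_server",
        "--model-path", MODEL,
        "--tp", "16",
        "--dist-init-addr", HEAD,
        "--nnodes", "2",
        "--node-rank", str(node_rank),
        "--trust-remote-code",
    ], check=True)

# Run launch(0) on the head node and launch(1) on the second node.
```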

scp and ssh over exposed TCP don't work

Hello. My local machine is Windows, but I also tried with WSL. I read through the documentation for setting up SSH, tried five times, and I'm pretty sure I followed everything perfectly. I find it curious that the basic SSH connection (the upper option in the "Connection Options" window) does work. But I couldn't get SSH over the exposed TCP port to work, as it kept demanding a password from me; I also tried setting a password in the pod, but that didn't help either. In short, I only want to download the output from an AI video model to my local machine, but scp works with neither of the two options....
Solution:
Just to verify, are you certain you're adding your Public Key to your RunPod account before creating the Pod?
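
Assuming the public key is on the account (per the suggestion above) and the Pod shows a public IP with an external TCP port mapped to 22, the download needs that IP, that port, and the matching private key. A hedged Python sketch with paramiko; host, port, and paths are placeholders, and it is the equivalent of `scp -P <port> -i ~/.ssh/id_ed25519 root@<ip>:/workspace/out.mp4 .`:

```python
# Download a file over the Pod's exposed TCP SSH endpoint using the key pair
# attached to the RunPod account. Host, port, and paths are placeholders.
import os
import paramiko  # pip install paramiko

HOST = "203.0.113.10"                           # Pod public IP from the Connect menu
PORT = 12345                                    # external TCP port mapped to 22
KEY = os.path.expanduser("~/.ssh/id_ed25519")   # private key matching the account key

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, port=PORT, username="root", key_filename=KEY)

sftp = client.open_sftp()
sftp.get("/workspace/out.mp4", "out.mp4")   # remote path -> local path
sftp.close()
client.close()
```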

Problem paying by Mastercard

Hey guys. I can't pay by Mastercard or UnionPay. Is there anyone who can help? 😭

Can't get stuff in /workspace to persist

1. I have a storage volume
2. I deploy a pod from a volume
3. I check that the volume is mounted to /workspace
4. I add a workflow to this /workspace -- I check this to see it's there
5. I stop the pod, I terminate the pod...