RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

PyTorch 2.3: Lacking image on RunPod

Hi RunPod, please add a new PyTorch image for PyTorch 2.3.1.

Is there a way to see GPU utilization history?

Do I need to set up an additional monitoring app myself to keep track of the metric history?
Solution:
There is no API for collecting metrics from RunPod yet, but you can monitor GPU utilization by running the nvidia-smi command in your CLI.
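
Since the answer above only mentions running nvidia-smi by hand, here is a minimal sketch of one way to keep a history yourself: poll nvidia-smi on an interval and append the readings to a CSV file. The file name `gpu_history.csv`, the interval, and the helper names are assumptions made for illustration, not anything RunPod provides. It also assumes every queried field comes back as a number; some GPUs report `[N/A]` for certain fields, which this sketch does not handle.

```python
import csv
import subprocess
import time
from datetime import datetime, timezone

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total"]


def parse_smi_line(line):
    """Parse one CSV line of nvidia-smi output into a dict of ints."""
    values = [int(v.strip()) for v in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [parse_smi_line(line) for line in out.strip().splitlines()]


def log_history(path="gpu_history.csv", interval_s=5, samples=12):
    """Append timestamped readings to a CSV file (hypothetical path)."""
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for _ in range(samples):
            now = datetime.now(timezone.utc).isoformat()
            for index, gpu in enumerate(sample_gpus()):
                writer.writerow([now, index] + [gpu[k] for k in QUERY_FIELDS])
            f.flush()  # keep the file usable even if the pod stops mid-run
            time.sleep(interval_s)


if __name__ == "__main__":
    log_history()
```

Run it in the background on the pod and open the CSV later to see how utilization changed over time.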

Docker image pull error: too many requests

Attempting to start pods with a template based on debian results in the following error currently. ``` 2024-06-21T03:20:33Z create pod network 2024-06-21T03:20:33Z create container debian:bookworm-20240423...

Stop Pod from JupyterLab

Hi guys, I'm using RunPod to fine-tune models in a Jupyter notebook. Is there a way to stop a pod from within a notebook, such as an API call or SDK? It would be greatly appreciated. Thanks in advance!...
Solution:
Already answered you in #🎤|general, no need to post the same thing in multiple places.

What is the best way to upload a 7GB model to my network drive?

Please advise any solution that won't break halfway through. I would like to upload to the SD model folder in my workspace. Thanks

Snapshot from Pod

We need to create a snapshot of a pod instance so that we can run it in another cloud location supported by RunPod.

GPU pod's performance is inconsistent

I am using a pod (RTX 4090 with a 100GB network volume) to generate images. As expected, a task needs around 5-6s to finish, but sometimes performance drops to 30s/task. Can anyone explain what's going on? Thank you so much...

How to install NVIDIA driver on Ubuntu Server image?

Dumb question, I am aware, but I cannot install the NVIDIA driver on the Ubuntu image. I'm getting this error: ``` dpkg: error processing archive /var/cache/apt/archives/nvidia-compute-utils-550_550.90.07-0ubuntu0.24.04.1_amd64.deb (--unpack):...

Cloud Sync False "Something went wrong" and secrets fail

When using Cloud Sync with Backblaze, I'm having 2 problems. First: if using secrets, it gives no feedback when I click "Copy from Backblaze B2". I have tried this repeatedly on different pods and with re-created secrets. I'm calling the secrets like: {{ RUNPOD_SECRET_BB_app_id }} I would expect at least an error that the request was rejected or something so I can fix the problem....

NVLink for multi-gpu counts?

I'd like to run some tests on, e.g., 2x3090 interconnected with NVLink. Is there an option or any available On-Demand GPUs (apart from the DGX or H100/A100 SXM systems) to do this on RunPod?
Solution:
I don't think so. Try opening a support request.

Can we use systemctl with a pod?

I need systemctl to connect to our experiment infrastructure. When I run the command, I get the following error: (vh) root@2a8382770b55:~/mono# sudo systemctl daemon-reload System has not been booted with systemd...
Solution:
Use service instead of systemctl.

Issues with SD comfyui template

Issues with the notebook, including: 1) Can't install / update ComfyUI ...

slow secure cloud pod

I created a secure cloud pod in the Iceland region. The upload and download speeds are indicated in the screenshot. If I load data onto the pod from AWS S3, the real speed is 6 Mbps. I tried another pod in the community cloud and got ~100 Mbps. Why is the first pod so slow? Which European region has the fastest pods in terms of download speed?...

How do saving plans work?

I don't quite understand how to enable it. Wouldn't it be better if I selected a saving plan for a given period, you showed me how much I need to pay, and I paid it? It seems you expect me to know upfront how much I need to pay, credit the account with that amount, and THEN activate the saving plan.
Right now I would like to pay for a 3-month saving plan. How do I proceed? Thanks...

Terminate POD with SSH

Hello! I use the following command to stop and then terminate the pod over SSH. It stops, but it is just marked as "Exited" in the interface, so it seems that the second command does not work. `nohup bash -c "sleep 1h; runpodctl stop pod $RUNPOD_POD_ID && runpodctl remove pod $RUNPOD_POD_ID" &` I would like to fully terminate it over SSH so that it does not incur charges once I no longer need the pod, as the task is already finished. So, how can I adjust the command to fully terminate the pod?...
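
One possible explanation, which is an assumption and not confirmed in this thread: `runpodctl stop pod` stops the container that the shell itself is running in, so the chained `runpodctl remove pod` never gets a chance to execute. If that is the cause, skipping the stop and calling remove directly may be enough. Below is a minimal Python sketch of that idea; the helper names are hypothetical, and the only CLI call used is the `runpodctl remove pod` command already quoted in the question.

```python
import os
import subprocess
import time


def build_remove_command(pod_id):
    """Build the runpodctl command that terminates (removes) a pod."""
    return ["runpodctl", "remove", "pod", pod_id]


def terminate_after(delay_s, pod_id=None):
    """Wait delay_s seconds, then remove the pod without stopping it first.

    Assumption: removing directly avoids the case where 'stop' ends the
    container before the follow-up 'remove' command can run.
    """
    pod_id = pod_id or os.environ["RUNPOD_POD_ID"]
    time.sleep(delay_s)
    subprocess.run(build_remove_command(pod_id), check=True)


if __name__ == "__main__":
    terminate_after(3600)  # one hour, matching 'sleep 1h' in the question
```

Removing a pod deletes its container disk, so copy anything you need to a network volume or external storage before the timer expires.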

Pods issues

Hello, I haven't been able to access my pods for around 13 hours... Some of them show this warning: "We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime." But there are many more without any warning that I also cannot access. Please help

Maintenance scheduled: 5 days downtime and data loss. What does this mean?

My pod is showing this message: "Maintenance Scheduled. This machine is scheduled for kernel and driver update. Please transfer your data ahead of time, since there will be a dataloss. Start: 06/24/2024 15:01 Local Time...

RAM issue

Hello guys, I am running the setup in the attached picture. The model I am trying to pull is cognitivecomputations/dolphin-2.9.2-qwen2-7b from Hugging Face. Even though I have a lot of RAM, I am getting this error: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.25 GiB. GPU ...
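
Note that `torch.cuda.OutOfMemoryError` refers to GPU memory (VRAM), not system RAM, so having plenty of RAM does not help here. As a rough back-of-the-envelope sketch, the snippet below estimates how much VRAM the weights alone need at different precisions. The parameter count of 7e9 is an assumed round figure for a "7b" model, and the estimate ignores activations, KV cache, and framework overhead, so real usage will be higher.

```python
def weights_gib(num_params, bytes_per_param):
    """Memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    assumed_params = 7e9  # assumption: round figure for a "7b" model
    for name, size in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"{name}: {weights_gib(assumed_params, size):.1f} GiB")
```

If the weights at the chosen precision are close to or above the GPU's VRAM, loading in half precision or picking a GPU with more VRAM are the usual options to look at.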

Free credits

Hi, can I get 1 hour of free credit on a 24 GB GPU to test whether my script works? If so, I will buy credits.