RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

Unable to create template or pod with python sdk version 1.6.2

```python
import runpod
import os
...
```
Solution:
You called your script runpod.py, so it conflicts with the runpod module. Give your script a different name.
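A minimal sketch of why the name matters, assuming the script is run directly (for example `python runpod.py`): Python puts the script's own directory first on the import path, so `import runpod` resolves to the script itself rather than the installed SDK. The helper names `shadows_module` and `module_origin` are hypothetical and only for illustration.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def shadows_module(script_path: str, module_name: str = "runpod") -> bool:
    """Return True if the script's file name matches the module it wants to import."""
    return Path(script_path).stem == module_name


def module_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print(shadows_module("runpod.py"))      # file name clashes with the SDK module
    print(shadows_module("create_pod.py"))  # hypothetical safe name
    print(module_origin("runpod"))          # which file `import runpod` would pick up
```

If `module_origin("runpod")` points at your own script instead of the installed package, the import is being shadowed.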

Pod unable to read environment variables set in templates caused a loss

Hi, this issue has caused us to create over 70 pods that ran idle and did nothing.

n00b multi gpu question

Hello hello! I created a 4-GPU pod (screenshot), then asked PyTorch what devices it saw, and it just saw one. What's the dumb thing I'm missing? Thanks 🙂...
Solution:
Alright, so I restarted the pod (with the env var you suggested) and CUDA reported zero GPUs. Then I removed the env var, restarted, and CUDA now reports four GPUs, with no change from the previous code/config. Either:...
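A quick diagnostic along these lines can be sketched as follows, assuming PyTorch is installed in the pod. The thread does not name the env var involved; `CUDA_VISIBLE_DEVICES` appears here only as one common variable that limits which GPUs CUDA exposes, and `visible_gpu_count` is a hypothetical helper name.

```python
import os
from typing import Optional


def visible_gpu_count() -> Optional[int]:
    """Return the number of GPUs PyTorch can see, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.device_count()


if __name__ == "__main__":
    # Assumption: CUDA_VISIBLE_DEVICES is the variable of interest here.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
    print("GPUs visible to PyTorch:", visible_gpu_count())
```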

runpodctl not found on pod

I wanted to run some tests. This involves a pod stopping itself after executing a task. To do this, I execute some work and then call `runpodctl stop pod $RUNPOD_POD_ID` inside the container from a bash script. This works in my actual production container, but it doesn't work in my test environment. The pod says that runpodctl can't be found (`2024-06-11T13:56:58.504874269Z ./run.sh: line 11: runpodctl: not found`). Even after letting it run for a while, it never finds runpodctl. Any idea what I can do about this? Here's a very minimal Dockerfile:
```
FROM alpine
...
```
Solution:
runpodctl isn't installed on the alpine image by default.
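One way to make this failure explicit is to check for the binary before calling it. This is a minimal sketch, assuming the container can run Python. `build_stop_command` is a hypothetical helper, and the only runpodctl invocation used is the `runpodctl stop pod $RUNPOD_POD_ID` form quoted in the question.

```python
import os
import shutil
import subprocess
from typing import List, Optional


def build_stop_command(pod_id: str, binary: str = "runpodctl") -> Optional[List[str]]:
    """Return the argv for stopping a pod, or None if the binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    return [path, "stop", "pod", pod_id]


if __name__ == "__main__":
    pod_id = os.environ.get("RUNPOD_POD_ID", "")
    cmd = build_stop_command(pod_id) if pod_id else None
    if cmd is None:
        print("runpodctl or RUNPOD_POD_ID is missing; not stopping the pod")
    else:
        subprocess.run(cmd, check=True)
```

Failing early with a clear message is easier to spot in the pod logs than a shell "not found" error partway through the script.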

Cheapest GPU for volume.

I use a 300 GB volume for my models. However, the cheapest GPU is the RTX 4090. Does it depend on the region I choose, or are these the only GPUs that have a volume enabled?
Solution:
There are different levels of availability for different GPU types within different regions. You can view the availability preview on the page for creating a network volume before actually creating it.

Custom Container Start Command Not Working

Hello, I want to create a custom template which clones a repo and then runs a script from that repo. However, the pods I've launched with it have failed to clone the repo, much less execute the script. Here is my container start command...

remote-ssh broken

Remote-SSH in VS Code is broken on community cloud instances because there is no PTY support on the SSH client (because it's not a public IP?). This was asked before, but I'm not sure if it was fixed.
Solution:
You can try this:
pip install OhMyRunPod
OhMyRunPod --setup_ssh
...

Networking on my pod has been shit for last 3 days. please fix. US region. RTX 6000 Ada

Going to try transferring my data to a new pod. Would be great if you could fix the networking. I keep losing connection.

Backend error

I get this error when I try to start my SD with my custom image. How do I solve this error?
Solution:
It simply says that your option isn't available. Pick one of those.

Custom Template with Jupyter not working

I'm trying to create a custom template that uses Jupyter, so I'm referencing the way Jupyter is installed and started in the official Stable Diffusion RunPod image:
https://github.com/runpod/containers/blob/main/official-templates/stable-diffusion-webui/Dockerfile
https://github.com/runpod/containers/blob/main/container-template/start.sh
As a result, Jupyter did start, but when I click the Connect button and then the "Connect to HTTP port 8888" button, it leads me to the Jupyter notebook login page. I noticed that the "Connect to Jupyter" button for the official SD RunPod image leads to a Jupyter notebook link with the token as a parameter (i.e. xxxx-8888.proxy.runpod.net/lab?token=xxxxx), but on the pod that runs my custom image, the button only leads to xxxx-8888.proxy.runpod.net...
Solution:
Set --ServerApp.token="" if you don't want a password.
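A start command along these lines can be sketched as below. This is a minimal sketch, not the official template's start script: `jupyter_lab_argv` is a hypothetical helper, and the port matches the 8888 mentioned in the question. An empty token disables token authentication, so anyone with the proxy URL can reach the server.

```python
from typing import List


def jupyter_lab_argv(token: str = "", port: int = 8888) -> List[str]:
    """Build a `jupyter lab` command line; an empty token turns off token auth."""
    return [
        "jupyter", "lab",
        "--ip=0.0.0.0",   # listen on all interfaces so the proxy can reach it
        f"--port={port}",
        "--no-browser",
        "--allow-root",   # containers commonly run as root
        f"--ServerApp.token={token}",
    ]


if __name__ == "__main__":
    import shlex
    print(shlex.join(jupyter_lab_argv()))
```

Passing a non-empty `token` instead keeps authentication on, and the same value can then be appended to the URL as `?token=...`.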

Pod system error

I've been running this pod for over 6 months and suddenly it's having issues. Although it says the pod is "running", the system logs show this error repeatedly:
2024-06-10T12:28:12Z start container
2024-06-10T12:28:14Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: device error: GPU-xxxxxxx: unknown device: unknown...

Fast loading of large docker image

Hello, I am trying to use a large Docker image (>20GB) to start a pod. Is there a way to cache it in a network volume and then start a pod from it, so that the pod starts quickly (I already put my image as a .tar on a volume, but couldn't find how to start a pod from it)? Or is there a better solution? Thank you!...

0 gpu available

Currently I can't assign a GPU to my pod. Will I get the option to do so later when some free up, or is my pod doomed?

DATA LOSS IN EU-RO-1 - URGENT

Need someone to communicate ASAP; this is literally sev0. We have all of our data and work backed up on that network drive, and out of the blue the files just disappeared.

Clarify RAM available

Hey there! Was thinking of using 4090 pods, and I saw that the deployment only had 60GB of RAM, which seems really low for a machine with eight 4090s. However, in the filter section, it actually states that this is per GPU. It would be great to clarify that in the deployment section as well 🙂 - I'm sure others have been as confused as I was....
Solution:
This should already have been fixed. Are you still having issues? It seems to work fine on my end (https://karalite.kaj.rocks/chrome_vCVXLLL3aN.mp4)

Create CPU Pod through GraphQL

The API seems to expect a gpuTypeId even when you specify gpuCount: 0. Is there currently any way of creating a CPU-only pod with GraphQL or by any other programmatic means? Thanks
Solution:
Hmm, yeah, try to reverse engineer it from your browser for now.

How to add files?

I just set up a pod for the first time and I am trying to SSH into it to add some custom model files. I am getting this error and the connection closes. I don't really know much about SSH and this is my first pod.
```
-- RUNPOD.IO -- Enjoy your Pod #oxb2cpeousjz ^_^
...
```

Very Slow Mapping

Hello! I am trying to run dataset.map(), and it takes only a few minutes when I run it on Colab. However, when I run it on any machine on RunPod, it estimates several hours to finish. I reported this to Support, but there is no solution yet. I wonder if anyone has faced a similar issue, and how to solve it. The code below is for pre-processing an audio dataset for Whisper fine-tuning. Thanks!
```
def prepare_dataset(batch):
    audio = batch["audio"]
    ...
```

How to get a public URL?

I don't want localhost because I can't access it locally.