Cheapest GPU for volume.
I use a 300 GB network volume for my models. However, the cheapest GPU available is the RTX 4090. Does this depend on the region I choose, or are these the only GPUs that have volumes enabled?
Solution:
There are different levels of availability for different GPU types within different regions. You can view the availability preview on the page for creating a network volume before actually creating it.
Custom Container Start Command Not Working
Hello,
I want to create a custom template which clones a repo and then runs a script from that repo. However, the pods I've launched with it have failed to clone the repo, much less execute the script.
Here is my container start command...
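The start command itself is cut off above, so the failure can't be diagnosed from this post. As a minimal sketch, assuming the image ships git and Python and the repository is public, the clone-then-run sequence could be written as a small Python start script. REPO_URL, CLONE_DIR and ENTRY_SCRIPT are hypothetical placeholders, and /workspace is only an example location. Running each step with check=True makes a failed clone raise an error (which should then show up in the container logs) instead of failing silently.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Hypothetical placeholders: substitute your own repository and entry script.
REPO_URL = "https://github.com/your-org/your-repo.git"
CLONE_DIR = Path("/workspace/your-repo")
ENTRY_SCRIPT = "run.py"


def clone_command(repo_url: str, dest: Path) -> list[str]:
    # --depth 1 fetches only the latest commit, which keeps start-up short.
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def run_command(dest: Path, script: str) -> list[str]:
    # Reuse the interpreter running this script so the same environment is used.
    return [sys.executable, str(dest / script)]


def main() -> None:
    if not CLONE_DIR.exists():  # skip cloning on restarts if the repo is already there
        if shutil.which("git") is None:
            raise RuntimeError("git is not installed in this image")
        subprocess.run(clone_command(REPO_URL, CLONE_DIR), check=True)
    subprocess.run(run_command(CLONE_DIR, ENTRY_SCRIPT), check=True, cwd=CLONE_DIR)
```

To use it, bake the file into the image (for example as /start.py, a hypothetical path), add a main() call at the end, and point the template's start command at python /start.py.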
remote-ssh broken
Remote-SSH in VS Code is broken on Community Cloud instances because there is no PTY support on the SSH client (perhaps because it's not a public IP?). This was asked before, but I'm not sure whether it was fixed.
Solution:
You can try this:
...
pip install OhMyRunPod
OhMyRunPod --setup_ssh
Networking on my pod has been shit for the last 3 days. Please fix. US region, RTX 6000 Ada.
I'm going to try transferring my data to a new pod. It would be great if you could fix the networking; I keep losing connection.
Backend error
I get this error when I try to start SD with my custom image. How can I solve it?
Solution:
It simply says that the option you chose isn't available; pick one of the listed options instead.
Custom Template with Jupyter not working
I'm trying to create a custom template that uses Jupyter, so I'm following how Jupyter is installed and started in the official Stable Diffusion RunPod image:
https://github.com/runpod/containers/blob/main/official-templates/stable-diffusion-webui/Dockerfile
https://github.com/runpod/containers/blob/main/container-template/start.sh
The result is that Jupyter does start, but when I click the Connect button and then "Connect to HTTP port 8888", it takes me to the Jupyter Notebook login page.
I noticed that the "Connect to Jupyter" button for the official SD RunPod template leads to a Jupyter Notebook link with the token as a query parameter (i.e. xxxx-8888.proxy.runpod.net/lab?token=xxxxx), but on the pod running my custom image, the button only leads to xxxx-8888.proxy.runpod.net...
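For reference, starting Jupyter Lab with token authentication turned off comes down to passing an empty ServerApp.token. A minimal Python sketch of such a launch, assuming jupyter lab is on the PATH and 8888 is the port exposed in the template (the other flags are common container settings, not a copy of RunPod's start.sh):

```python
import subprocess


def jupyter_lab_args(port: int = 8888, token: str = "") -> list[str]:
    """Build a `jupyter lab` command line; an empty token disables token authentication."""
    return [
        "jupyter", "lab",
        "--allow-root",   # containers often run as root, which Jupyter refuses by default
        "--no-browser",   # there is no browser inside the container
        "--ip=0.0.0.0",   # listen on all interfaces, not just localhost
        f"--port={port}",
        f"--ServerApp.token={token}",
    ]


def start_jupyter(port: int = 8888, token: str = "") -> subprocess.Popen:
    # Popen returns immediately and leaves Jupyter running in the background.
    return subprocess.Popen(jupyter_lab_args(port, token))
```

With an empty token, anyone who can reach the proxied port can use the notebook. Keeping a token instead means the URL has to carry ?token=..., which is what the official template's button does.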
Solution:
Set --ServerApp.token="" if you don't want a password.
Pod system error
I've been running this pod for over 6 months and suddenly it's having issues. Although it says the pod is "running", the system logs show this error repeatedly:
2024-06-10T12:28:12Z start container
2024-06-10T12:28:14Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: GPU-xxxxxxx: unknown device: unknown...
Fast loading of large docker image
Hello, I am trying to use a large Docker image (>20 GB) to start a pod.
Is there a way to cache it on a network volume and then start a pod from it, so that the pod starts quickly? (I already saved my image as a .tar file on a volume, but couldn't find how to start a pod from it.) Or is there a better solution? Thank you!...
0 GPUs available
Currently I can't assign a GPU to my pod. Will I get the option later, when some free up, or is my pod doomed?
DATA LOSS IN EU-RO-1 - URGENT
I need someone to get back to us ASAP; this is literally a sev0. All of our data and work was backed up on that network drive, and out of the blue the files just disappeared.
Clarify RAM available
Hey there! I was thinking of using 4090 pods, and I saw that the deployment only had 60 GB of RAM, which seems really low for a machine with eight 4090s.
However, the filter section actually states that this figure is per GPU. It would be great to clarify that in the deployment section as well 🙂 I'm sure others have been as confused as I was...
Solution:
This should already have been fixed. Are you still having issues? It seems to work fine on my end (https://karalite.kaj.rocks/chrome_vCVXLLL3aN.mp4)
Create CPU Pod through GraphQL
The API seems to expect a gpuTypeId even when you specify gpuCount: 0. Is there currently any way to create a CPU-only pod with GraphQL, or any other programmatic way? Thanks
Solution:
Hmm, yeah, try to reverse-engineer it from your browser for now.
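One way to follow that advice: deploy a CPU pod in the web console with the browser's developer tools open, copy the GraphQL request it sends from the Network tab, and replay it from a script. The sketch below assumes the endpoint https://api.runpod.io/graphql with the API key passed as an api_key query parameter; check both against RunPod's API documentation. The mutation text is deliberately not filled in, since the exact operation the console uses isn't shown here.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: verify against RunPod's API documentation.
GRAPHQL_URL = "https://api.runpod.io/graphql"


def build_graphql_request(query: str, variables: dict, api_key: str) -> urllib.request.Request:
    """Wrap a GraphQL query or mutation (e.g. one copied from the Network tab) in a POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    url = f"{GRAPHQL_URL}?{urllib.parse.urlencode({'api_key': api_key})}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply; GraphQL errors appear under the "errors" key."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Usage, with hypothetical names for the captured text and the environment variable holding the key: send(build_graphql_request(captured_mutation, captured_variables, os.environ["RUNPOD_API_KEY"])).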
How to add files?
I just set up a pod for the first time, and I am trying to SSH into it to add some custom model files. I get this error and then the connection closes. I don't really know much about SSH, and this is my first pod.
```
-- RUNPOD.IO --
Enjoy your Pod #oxb2cpeousjz ^_^
...
```
Very Slow Mapping
Hello! I am trying to run dataset.map(). It takes only a few minutes on Colab, but on any machine on RunPod it reports that it needs several hours to finish. I reported this to Support, but there is no solution yet. I wonder if anyone has faced a similar issue and how they solved it. The code below pre-processes an audio dataset for Whisper fine-tuning. Thanks!
```
def prepare_dataset(batch):
    audio = batch["audio"]
    ...
```
GPU Pods in EU-SE-1 unexpectedly die after approximately 30 hours
We are experiencing many instances of GPU pods (mainly A6000) that stop working after about 30 hours, also losing their VRAM contents.
We have reported this repeatedly, but there is still no solution, and it keeps happening.
We have left a pod on (ID : cxquttq3m3kqvl) for you to debug, can you please help?
Thanks...
CPU instances don't work
2024-06-05T20:19:37Z create container runpod/base:0.5.1-cpu
2024-06-05T20:19:38Z 0.5.1-cpu Pulling from runpod/base
2024-06-05T20:19:38Z Digest: sha256:7530e77d6014bd6f3e1939b8d9003d8f7d2bd35a98395c4d297ac3b7a6d05b85
2024-06-05T20:19:38Z Status: Image is up to date for runpod/base:0.5.1-cpu
2024-06-05T20:20:38Z error creating container: container: create: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.43/containers/2fc1150401eeace7c2f58423e071f9686d6faaa89c28c7f50cf249b8b3f5ada4/start": context deadline exceeded...
Networking Multiple Pods Together
I'm looking to train a distributed model on RunPod. When configuring torch.distributed or jax.distributed, you provide a coordinator_address of the form ip:port. Right now I'm unable to confirm that two pods can communicate with one another. I start one pod, expose a 70000-level port, SSH into it, run ip route to get the local IP, then start a simple Python server with python -m http.server 70000. Then I SSH into the other pod and run curl <pod_1_local_ip>:<pod_1_70000_port>.
This consistently fails. My intuition is that the Docker containers don't belong to the same network; to my knowledge, we users don't have the privileges to set up such a network on the datacenter's machines, only to modify containers on a one-off basis.
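One thing to check in the test described above, if the literal port 70000 was used: TCP port numbers only go up to 65535, so python -m http.server 70000 cannot bind at all (Python raises an OverflowError), and the curl would fail regardless of how the pods are networked. With a valid port, the reachability check can be scripted with the standard library. This sketch only tests whether a TCP connection succeeds; it does not make pods share a network, and the function names are illustrative.

```python
import socket

MAX_TCP_PORT = 65535  # TCP port numbers are 16-bit


def is_valid_port(port: int) -> bool:
    """True if `port` is a usable TCP port number (1-65535)."""
    return 1 <= port <= MAX_TCP_PORT


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a TCP handshake with host:port; False on refusal, timeout or unreachable host."""
    if not is_valid_port(port):
        raise ValueError(f"port {port} is outside 1-{MAX_TCP_PORT}")
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts and unreachable hosts
        return False
```

For example, run python -m http.server 8000 on the first pod (8000 is just an arbitrary valid port), then call can_connect("<pod_1_local_ip>", 8000) on the second.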
Any guidance on enabling communication between pods would be greatly appreciated!...
Docker Image For RunPod Pytorch 2.0.1 Template
Hello,
I'm trying to create a custom template that just adds a daemon to the official RunPod PyTorch 2.0.1 template.
How can I find the Docker image that is deployed with this template?...