RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

same GPU, different machine -> different speed

The image shows two YOLO object detection runs with an identical setup (same batch size, image size, and number of epochs) on two different RunPod pods. In both cases the GPU was an RTX 4090. slow machine +---------------------------------------------------------------------------------------+...

Kohya port not working

I'm trying to launch Kohya. I've tried everything and nothing works; I even used the command tail -f /workspace/logs/kohya_ss.log...
Solution:
You only need to ask in one place, not multiple places. You have already been answered in #🎤|general

runpodctl -> get public IP + exposed ports

Let's say I create a new pod using runpodctl create pod --name 'Whatever' \ --imageName 'runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04' \ --gpuType 'NVIDIA GeForce RTX 3070' ...
Solution:

This pod suddenly came into my account ( i didnt create it )

vi9vaz7fu77b52 That's the pod ID; I already deleted it. I think it was because of vllm workers / template?...

pod has no public ip

A pod has no public IP even though I ticked the "public ip" checkbox.

can I deploy flask, celery, redis, postgreSQL on runpod?

Hi, as you know, a pod only persists data under the /workspace folder. For all Python-related packages I can use a venv to keep the data and configuration under /workspace. But if I need to install tools like Flask, Celery, Redis, and PostgreSQL, they are not Python installations, so their configuration files will be scattered across the filesystem. All of these files and configuration will disappear after a pod restart. ...
Solution:
You can install whatever you want, but I don't recommend installing databases etc. on RunPod. It's better to deploy those things to a CPU cloud provider and use RunPod serverless for offloading tasks that need to run on a GPU.

CudaToolkit >= 12.2

When selecting the pod to deploy, I can filter by the CUDA version the GPU supports, up to v12.4. I suppose this refers to the CUDA display driver, right? The RunPod base images, however, only provide up to "cuda 12.1.1", which is not the driver version but the CUDA toolkit version, correct?...
Solution:
There are two CUDA versions. The one shown by nvidia-smi is the maximum CUDA version supported by the host. The one shown by nvcc --version is the version bundled with the template ...
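
To see both versions side by side from inside a pod, something like the following sketch may help. It assumes that nvidia-smi prints a "CUDA Version: X.Y" field in its header and that nvcc --version prints a "release X.Y" string; the helper names are made up for this example.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def parse_driver_cuda(nvidia_smi_output: str) -> Optional[str]:
    """Max CUDA version supported by the host driver (nvidia-smi header)."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", nvidia_smi_output)
    return match.group(1) if match else None


def parse_toolkit_cuda(nvcc_output: str) -> Optional[str]:
    """CUDA toolkit version bundled with the image (nvcc --version)."""
    match = re.search(r"release\s+([\d.]+)", nvcc_output)
    return match.group(1) if match else None


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout, or '' if the binary is missing."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    print("host driver supports up to CUDA:", parse_driver_cuda(run(["nvidia-smi"])))
    print("toolkit bundled with the image:", parse_toolkit_cuda(run(["nvcc", "--version"])))
```

If the two numbers differ, that is expected: the first depends on the host machine, the second on the template image.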

why don't I have a stop option, only terminate option available

Solution:
I would use rclone rather than cloud sync. Cloud sync is built on top of rclone anyway.

are network volumes slower than "normal" volumes?

Hey everyone! I've been experimenting with network volumes because of their perk of not needing to reinstall everything whenever my pod 'loses' its GPU. However, I've noticed that the upload/download speeds are pretty slow every time I use them. Has anyone else experienced this? Do these volumes need a few hours or days to reach optimal performance, similar to AWS? I'd really appreciate any insights or experiences you might have!
Solution:
It's accessed over the network and not directly attached to the machine.
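
To put a rough number on the difference, a small sequential-write test can be run against both locations. This is only a sketch: it assumes the network volume is mounted at /workspace and that /tmp sits on the pod's local disk, and the function name is made up for this example. It measures writes only, not reads or small-file performance.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 64) -> float:
    """Write size_mb of random data into `directory` and return MB/s."""
    chunk = os.urandom(1024 * 1024)  # 1 MiB of incompressible data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually left the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return size_mb / elapsed


if __name__ == "__main__":
    # Both paths are assumptions about a typical pod layout.
    for directory in ("/workspace", "/tmp"):
        if os.path.isdir(directory):
            print(f"{directory}: {write_throughput_mb_s(directory):.1f} MB/s")
```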

cannot find my network volume in the running ubuntu pod.

Hi, I have created a pod with a network volume of 300GB. It is shown in the pod details, but when I log on to the pod and run the command "df -h", I cannot find the network volume attached to the running pod. Please help.
Solution:
It's the workspace one.
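
A quick way to confirm this from Python, as a sketch that assumes the volume is mounted at /workspace (the function name is made up for this example):

```python
import os
import shutil


def describe_mount(path: str) -> dict:
    """Report whether `path` is a mount point and how large its filesystem is."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "is_mount_point": os.path.ismount(path),
        "total_gb": round(usage.total / 1e9, 1),
        "free_gb": round(usage.free / 1e9, 1),
    }


if __name__ == "__main__":
    # /workspace is an assumption; adjust if your template mounts the volume elsewhere.
    if os.path.isdir("/workspace"):
        print(describe_mount("/workspace"))
```

If total_gb is close to the size chosen when creating the volume, that entry is the network volume.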

Apply for a fixed public IP and attach it to a running pod; attach a network volume to the same pod.

Hi, I am a new user of RunPod. I have one pod running, but I cannot find any place to apply for a fixed public IP and attach it to the running pod. I also need to put the data on persistent storage, which is why I created a network volume, but I cannot find anywhere to attach it to my pod. I think these are very basic requirements that the majority of users will have, so it must be somewhere in the documentation, but unfortunately I did not find the answer there either. Please help, thx...
Solution:
1. You cannot attach a public IP to an existing pod. In Secure Cloud all pods should have a public IP by default. In Community Cloud, you need to check the filter at the top of the page before deploying your pod.
2. You cannot attach a network volume to an existing pod. You either need to click the Deploy button from the network storage to deploy a new pod with it attached, or alternatively select it from the filter at the top of the page in Secure Cloud before deploying your pod.
Basically seems like you are not using any of the available filters....

graphql Unauthorized

When I perform the "myPods" query [https://graphql-spec.runpod.io/#query-myself looks similar] with the "machines" field, I receive a strange output: ``` { "errors": [ {...
Solution:
The solution was given on Slack I had to use this query not like
"query myPods {\n myself { pods {\n desiredStatus \n dockerId\n id\n imageName\n lastStatusChange\n locked\n machineId\n name\n machineType\n templateId\n uptimeSeconds\n }\n machines { id } }\n}"
"query myPods {\n myself { pods {\n desiredStatus \n dockerId\n id\n imageName\n lastStatusChange\n locked\n machineId\n name\n machineType\n templateId\n uptimeSeconds\n }\n machines { id } }\n}"
...
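
For reference, a minimal sketch of sending a pods query from Python. The endpoint URL, the api_key query parameter, and the RUNPOD_API_KEY environment variable are assumptions, not taken from the thread; the field names are a subset of those in the query quoted above. The sketch leaves out the machines field, but the thread excerpt does not make clear which part of the original query triggered the error.

```python
import json
import os
import urllib.request

GRAPHQL_URL = "https://api.runpod.io/graphql"  # assumed endpoint

# Field names copied from the query quoted in the thread (subset).
QUERY = """
query myPods {
  myself {
    pods {
      id
      name
      desiredStatus
      imageName
      machineId
      uptimeSeconds
    }
  }
}
"""


def build_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the GraphQL query."""
    body = json.dumps({"query": QUERY}).encode("utf-8")
    return urllib.request.Request(
        f"{GRAPHQL_URL}?api_key={api_key}",  # passing the key this way is an assumption
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("RUNPOD_API_KEY")  # hypothetical variable name
    if key:
        with urllib.request.urlopen(build_request(key)) as resp:
            print(json.dumps(json.load(resp), indent=2))
```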

Help needed with Docker Installation

Hey guys, how can I install Docker inside an Ubuntu container? I tried, but I am unable to run it.
Solution:
Not possible on RunPod, you will have to do it somewhere else, like AWS etc.

Update image runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04 ?

I have been using a conda environment with PyTorch for weeks, and it worked perfectly on the RunPod container until today. Now PyTorch doesn't work anymore. Was something changed in the image?

Questions About Resuming GPU Sessions

Hi, so I saw this message when I paused my session. It said, "You can start the pod later, but it's not guaranteed to be available." So, if I wanna start it up again and the GPU I was using isn't available, will it set me up with another 4090? Or do I have to wait until the one I was using is available again?
Solution:
Wait until it is available, or just start the pod with CPU only.

"The port is not up yet"

I'm having problems again. I created a new pod about 1 hour ago. It took me 1 hour to cloud sync, and now the pod will not run anything. I have tried restarting a couple of times, but I always get this error message...

Change disk volume

Apologies for the newbie question: is it possible to change the size of an existing persistent disk volume, or is it necessary to create a new one and transfer the data from the old one? Thank you.
Solution:
It is possible, no need to try it first.

There is no pod available

Hi! All GPU pods, whether Secure or Community, are unavailable, no matter what filter you use. What's going on? Edit: Now it seems to be working, but the page is taking a long time to load. Is there any maintenance work going on?...
Solution:
There is no maintenance; otherwise everyone would be affected and not just you. Sounds like an issue with your internet connection.

wget not working inside the terminal for stable diffusion webUI

When I try to run the wget command to get models from Civitai, it throws an error about a username and password. I've watched many videos about it, and I seem to be doing everything right, but I still can't get it to work.

RTX 6000 Ada performance much worse than expected

From the NVIDIA specs, I would expect its performance to be on the order of 10 - 20% slower than the L40S. However, in my current training (FP16 mixed precision), I am finding it closer to 2X slower or worse. That's pretty bad considering the price. Perhaps there is some other issue in how the pods or nodes are set up that could be worth looking into?