RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Unable to use model in Stable Diffusion

I tried to use a model I downloaded and received this error:

Need help with setting up Tensorboard for RVC!

Hello all, I need some help with RunPod. I am trying to get TensorBoard to work when using either a Secure Cloud or Community Cloud GPU pod, and I have no idea how to get it working. Do I need an SSH server? I was trying to follow this guide - https://blog.runpod.io/how-to-achieve-true-ssh-on-runpod/ - but I have no idea where to get a public key. If anyone here is experienced with TensorBoard and knows how to make it work on RunPod, I'd be grateful. When I do get it installed and running on a GPU pod, it gives me a localhost link, which is something I cannot use on a remote server. I will be playing around with it for just a few more minutes before I just give up...
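
One way this is commonly handled (a sketch, assuming the logs live under /workspace/logs, that TCP/HTTP port 6006 has been added to the pod's exposed ports, and that the proxy URL follows RunPod's standard pattern):

    # Inside the pod: bind TensorBoard to all interfaces instead of localhost.
    pip install tensorboard
    tensorboard --logdir /workspace/logs --host 0.0.0.0 --port 6006
    # Then open https://<pod-id>-6006.proxy.runpod.net from the local browser,
    # or tunnel the port over SSH instead (key, IP, and port are placeholders):
    ssh -i ~/.ssh/id_ed25519 -p <ssh-port> -L 6006:localhost:6006 root@<pod-ip>

As for the public key: ssh-keygen on the local machine produces a key pair, and the contents of the .pub file are what the guide expects to be pasted into RunPod's SSH public key setting.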

Storage pricing question

Can someone explain storage pricing for stopped GPU pods, please? When I stop a pod it says I'm going to be charged an hourly rate (something like .028?) for the storage unless I terminate the pod. The pricing page says it's .1/GB per month, though. Which is it? The hourly rate seems prohibitively expensive, and I can't imagine that I'm supposed to recreate my pod every time I want to test something and then delete everything.
Solution:
Terminate the pod completely and use a network storage volume in a popular region instead. Network volumes are essentially like external hard drives that any pod can persist data to, usually mounted under /workspace. For example, if you create a network volume in region A, you can launch two pods on that volume and they share the same drive. ...

Creating own template

Hi, is it possible to create my own ComfyUI template with a certain set of models, custom nodes, etc. and then simply launch it on a new GPU pod, so I don't have to manually find and download every resource when setting up a new pod?
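
Two common approaches are baking everything into a custom Docker image that your own template points at, or keeping a setup script on a network volume and re-running it on each fresh pod. A minimal sketch of the script approach (the custom-node choice and the model URL are placeholders):

    #!/bin/bash
    # One-time setup, safe to re-run on each new pod; everything lands on the /workspace volume.
    set -e
    cd /workspace
    [ -d ComfyUI ] || git clone https://github.com/comfyanonymous/ComfyUI.git
    [ -d ComfyUI/custom_nodes/ComfyUI-Manager ] || \
        git clone https://github.com/ltdrdata/ComfyUI-Manager.git ComfyUI/custom_nodes/ComfyUI-Manager
    # Example checkpoint download (replace with your own model URLs):
    wget -nc -P ComfyUI/models/checkpoints "https://example.com/my-model.safetensors"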

Error when installing requirements from git:

OSError: [Errno 23] Too many open files in system. How can I fix this? ...
Solution:
Try raising the open-file limit with ulimit -n <number>?
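
For example (a sketch; 65535 is an arbitrary value, and the hard limit configured on the host still caps what can be set):

    ulimit -n 65535                    # raise the soft limit for this shell
    ulimit -n                          # verify the new limit
    pip install -r requirements.txt    # retry the install in the same shell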

Container keeps restarting

Hello, after I start the GPU pod, the container keeps restarting: "2024-02-16T20:18:35Z start container" in an infinite loop. When I SSH in, I get: ...

Unable to upload models to Stable Diffusion.

Hi team, these days I am unable to upload models to Stable Diffusion using CivitAI or Google Drive. curl -O -J -L doesn't work, wget doesn't work, and gdown doesn't work either. ...
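
If the CivitAI downloads are the ones failing, one common cause is models that require a logged-in account, which means passing a CivitAI API token with the request. A hedged example (the token, model-version ID, and target directory are placeholders):

    CIVITAI_TOKEN="<your-api-token>"
    wget --content-disposition -P /workspace/stable-diffusion-webui/models/Stable-diffusion \
        "https://civitai.com/api/download/models/<model-version-id>?token=${CIVITAI_TOKEN}"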

How should I store/load my data for network storage?

Hi, I've been keeping my data in an SQL database, which is excruciatingly slow on RunPod with a network storage volume.
But I don't see any obvious alternative... ...
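
A common workaround (a sketch, not RunPod-specific; the paths are examples): keep the canonical copy on the network volume, but run the workload against the pod's local container disk, which is much faster than network storage:

    cp /workspace/data.db /root/data.db     # copy to the fast local container disk
    # ... run the workload against /root/data.db ...
    cp /root/data.db /workspace/data.db     # persist changes back before stopping the pod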

worker-vllm list of strings

Hey, I have a fine-tuned model that I want to deploy in serverless. I tried the vLLM prompts approach with a list of strings (as attached) on a T4 in Colab and it works really well - response in 0.5 secs. And here is my question - do I need to create my own worker to post the input as a list of strings, or do you handle this in your worker-vllm? -> https://github.com/runpod-workers/worker-vllm Thanks for your reply 😉 ...
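
For reference, a request against a serverless endpoint would look roughly like this (the endpoint ID and API key are placeholders, and whether worker-vllm accepts a list of strings under "prompt" is exactly the open question here):

    curl -s "https://api.runpod.ai/v2/<endpoint-id>/runsync" \
        -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
        -H "Content-Type: application/json" \
        -d '{"input": {"prompt": ["first prompt", "second prompt"]}}'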

How to enable Systemd or use VPN to connect the IP of the Run Pod?

Hi, greetings. I am facing a problem where I need to connect to the IP of the RunPod pod. I tried the VPN method, but it gives an error that the system is not using systemd. Is there some other way to achieve this, or how can I enable systemd? ...

best practice to terminate pods on job completion

I have a one-time job I want to run as a GPU pod. Currently the container gets restarted as soon as the job finishes. What's the best way to terminate the pod after completion?
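
One pattern that tends to work (a sketch, assuming runpodctl is available inside the pod, as it typically is on official RunPod images, and that the RUNPOD_POD_ID environment variable is set; the job script name is a placeholder):

    # Run the job, then terminate this pod as soon as it exits successfully.
    python run_job.py && runpodctl remove pod $RUNPOD_POD_ID
    # Use "runpodctl stop pod $RUNPOD_POD_ID" instead to stop rather than terminate
    # (the disk keeps accruing storage charges while the pod is stopped).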

Can I turn off a few vCPUs?

I'm doing an overload test, but I don't know how to limit the vCPUs from Python.
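
A minimal sketch: pin the test process to a subset of vCPUs with taskset (cores 0-3 here; the script name is a placeholder). From inside Python, os.sched_setaffinity(0, {0, 1, 2, 3}) achieves the same thing:

    taskset -c 0-3 python overload_test.py    # the process only runs on vCPUs 0-3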

Deploying H2O LLM Studio /w auth using Ngrok

I have been working most of the day to get this container deployed to RunPod. Here's the trick, though: I included nginx in the mix and am using it as a proxy_pass, so that I can add some sort of auth. Here is the nginx config: events { worker_connections 1024; } ...

Wrong GPUs being assigned

I'm paying for the "7 x H100 80GB SXM5", but when I run nvidia-smi it reports 7 NVIDIA RTX A4000s, a much inferior graphics card. What gives?

Network Volume suddenly empty in EU-RO-1

After a restart of a CPU pod that was attached to a network volume in the EU-RO-1 region, the /workspace directory was suddenly empty. No changes were made to the mount path prior to it becoming empty. Has anyone ever encountered this issue? Is there any way of getting back the data that was on that volume? Thank you! ...

Reserving pods on different machines

Hey there, 4 of my long-running pods have scheduled maintenance at the same time. I would like to spin up new pods beforehand to cover for that, but how can I make sure the new pods won't be on the same machines and also undergo maintenance before I start them?

Ollama API

Hello, I am trying to host LLMs on RunPod GPU Cloud using Ollama (https://ollama.com/download). I want to set it up as an endpoint so I can access it from my local laptop using Python libraries like LangChain. I'm having trouble setting up the API endpoint - has anyone worked with this before?
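
A hedged sketch of one way to do this: bind Ollama to all interfaces on the pod, add TCP/HTTP port 11434 to the pod's exposed ports, and call it from the laptop through RunPod's standard proxy URL (the pod ID and model name are placeholders). LangChain's Ollama integration can then be pointed at the same URL via its base_url parameter:

    # On the pod:
    OLLAMA_HOST=0.0.0.0 ollama serve &
    ollama pull llama2
    # From the local laptop:
    curl https://<pod-id>-11434.proxy.runpod.net/api/generate \
        -d '{"model": "llama2", "prompt": "Hello", "stream": false}'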

Is one physical CPU core assigned to vCPU?

I couldn't find this in the FAQ.
Solution:
A vCPU is normally 1/2 of a physical core - most CPUs have 2 threads per core, and each thread is exposed as one vCPU.

We have detected a critical error on this machine...!

What's causing this, and how can it be avoided?

Slow upload speeds with runpodctl?

Hey there. Since yesterday I have had to upload files to my instance twice, and it's super slow (at the moment it's 650 kB/s, which is not really feasible). It usually starts off way faster and slows down after a while. It doesn't seem to depend on my network, because uploads to other servers are significantly faster, and it doesn't really seem to matter from which network (work, home) I upload - the speeds are way lower than e.g. a month ago. What's going on with the uploads, and is there anything I can do about it?
Solution:
You can try using croc; then you don't need to go through the RunPod relays. https://github.com/schollz/croc ...
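
Minimal croc usage, assuming croc is installed on both the local machine and the pod (the file name is a placeholder):

    # On the sending machine:
    croc send my_model.safetensors      # prints a code phrase
    # On the receiving machine:
    croc <code-phrase>                  # paste the code phrase to receive the file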