RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

Cloud sync fails

Syncing to Dropbox fails; it always shows: "Something went wrong! some detail:..."

Cannot start Docker container

I use a custom Docker image. Here is the system log: 2024-08-15T04:53:11Z start container. Here is the container log: 2024-08-15T04:52:55.667454224Z /usr/local/bin/docker-entrypoint.sh: line 414: exec: docker: not found. SSHing to this pod responds "Container not running."...

libcudnn.so.9: cannot open shared object file: No such file or directory

Getting this error when using the CUDAExecutionProvider with onnxruntime-gpu. I'm building the container for CUDA 12 and installing onnxruntime-gpu 1.18 directly from Microsoft's package index to fully support CUDA 12. nvidia-smi works inside the container. Not sure why I'm getting this issue.
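
The loader error means the runtime is looking for cuDNN 9, which the image doesn't provide. One possible fix, sketched here as a Dockerfile fragment, is to install NVIDIA's cuDNN 9 wheel for CUDA 12 and put its library directory on the loader path. This is a sketch under assumptions: the site-packages path shown assumes Python 3.10 installed under /usr/local, so adjust it to your base image.

```dockerfile
# Hedged sketch: provide libcudnn.so.9 via NVIDIA's pip wheel and make it
# visible to the dynamic loader. Path assumes Python 3.10 -- adjust as needed.
RUN pip install --no-cache-dir "nvidia-cudnn-cu12>=9"
ENV LD_LIBRARY_PATH=/usr/local/lib/python3.10/site-packages/nvidia/cudnn/lib:${LD_LIBRARY_PATH}
```

Installing cuDNN 9 from NVIDIA's apt repository achieves the same thing; either way, the point is that ldconfig or LD_LIBRARY_PATH can resolve libcudnn.so.9 inside the container.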

Can't access pod

It's been down for over 16 hours; it would be great if this could be dealt with ASAP. It gets stuck on "Waiting for logs" if I try to turn it on.

Multiple containers on a single GPU instance?

Are there any plans to allow multiple Docker containers on a single GPU instance? I have workloads which do not utilize the full resources of a single GPU, and I'd like to organize them using multiple containers sharing one GPU. I don't believe there is a way to do this currently; the closest is to run multiple processes inside a single Docker container, but that is a Docker anti-pattern and not great for workload organization.
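
Until multi-container pods exist, the multi-process workaround mentioned above can at least be made manageable with a process supervisor instead of ad-hoc background jobs. A minimal sketch using supervisord inside one container; the program names and commands are placeholders for your own workloads:

```ini
; supervisord.conf -- illustrative only; worker-a.py / worker-b.py stand in
; for your own workloads sharing the single GPU.
[supervisord]
nodaemon=true

[program:worker-a]
command=python /app/worker-a.py
autorestart=true

[program:worker-b]
command=python /app/worker-b.py
autorestart=true
```

Both processes see the same GPU device; supervisord handles restarts and per-program logs, which recovers some of the organization you'd otherwise get from separate containers.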

Connecting Current Pod to Network Volume

Hello, is there a way to connect a current pod to a network volume, or would I have to transfer all the data into a network volume and set up a new pod? If that is the case, what's the fastest way to do it (I have a large dataset I would have to move)?
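
If a new pod is required, one way to move a large dataset between pods is runpodctl's peer-to-peer transfer; plain rsync over SSH is an alternative. A sketch under assumptions (paths and the one-time code are placeholders; /workspace is assumed to be where the network volume mounts on the new pod):

```shell
# On the old pod: pack the dataset and send it (runpodctl prints a one-time code)
tar czf dataset.tar.gz /workspace/dataset
runpodctl send dataset.tar.gz

# On the new pod, with the network volume attached:
runpodctl receive <code-printed-by-send>
tar xzf dataset.tar.gz -C /workspace
```

Packing into one archive first tends to be much faster than transferring many small files individually.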

Weird error when deploying LoRAX inference server

Hi guys, I'm trying to deploy the LoRAX inference server on a RunPod A100 PCIe pod and got a very weird error, attached in the image. Why is it weird? Because it only happens on some pods, not all. Do you know any reason for this?

Passwordless SSH doesn’t work half the time.

I'm using pods in the Secure Cloud. Half the time I can't SSH in: it asks for a password. My key is in authorized_keys and all the SSH server settings are right, but it won't accept my key, and debug logging gives no reason why. The template is a standard PyTorch 2.2 template from RunPod. The only workaround is to set a root password, allow password SSH, and enter my password every time, which is very annoying. It happens all the time, and then every now and then it doesn't and I can SSH in fine without a password. Nothing is different on my end: same template, same scripts doing the login. ...
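
One common cause of silent key rejection is sshd's StrictModes check: if the home directory, ~/.ssh, or authorized_keys is group- or world-writable (for example, recreated by a start script with a loose umask), sshd ignores the key and falls back to password auth without telling the client why. A sketch worth running at pod start, assuming the key is already in ~/.ssh/authorized_keys:

```shell
# Tighten the permissions sshd's StrictModes checks; loose modes on any of
# these make sshd silently skip the public key.
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod go-w ~
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```

Server-side debugging (e.g. running a second sshd with -d on a spare port and connecting to it) will also print the exact reason a key is skipped, which the client-side -v output never shows.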

Flux doesn't work in Stable Diffusion WebUI Forge on RunPod, although it seems to be possible

I've seen the tutorial on your blog for running Flux on RunPod, but it doesn't work for me; I get many errors that I can't solve (I'm not a programmer, sorry 😦). I would like to install Flux in Forge. Why doesn't it work in the version running on RunPod? Is it going to be possible...

vLLM doesn't seem to use the GPU

I'm using vLLM, and on the graph only CPU usage increases when I launch some requests. If I open a terminal and run nvidia-smi, I don't see any process either. Settings line...

Updated A1111 and now I can't connect to the WebUI port

I used git checkout master and git pull in the terminal to update, and now I can't connect to the port; I'm getting a 502. I already tried deleting the venv and waiting 30 minutes; no luck. I'm using the official RunPod A1111 template.

Pod resume failed: This machine does not have the resources to deploy your pod.

Hello! I'm getting this error: "Pod resume failed: This machine does not have the resources to deploy your pod. Please try a different machine." My pod is an RTX 3090 with a 10 GB container disk and a 60 GB volume disk. How can I prevent this from happening?...

Help! My port 3000 (A1111 WebUI) isn't starting up.

I'm using the ashleykza/a1111 template. It had been working fine until today, when I uploaded some new LoRAs.

Can't update custom nodes ComfyUI

New install. I update Comfy, then try to update ComfyUI Manager, and nothing happens. What am I doing wrong?

Pod with custom template has no TCP ports exposed

Hi, I just created my custom template and set the ports to be exposed in it, but after I deploy a pod, it has no ports exposed. Did I configure something wrong?...

IS disk slow

IS (IS-1, I think) disk speed is going at 658 MBps, while others like US-OR are going at 4000+ MBps.

Community RunPod template error (ComfyUI ashleykza)

I'm trying to deploy the ashleykza ComfyUI community RunPod template, but I'm getting this error. How can I proceed?
Solution:
So runpod/comfyui? I cannot find that one. But I found aitrepreneur/comfyui:2.3.5. Testing it now....

Syncing taking too long?

Hi everyone. I'm using the ULTIMATE Stable Diffusion Kohya ComfyUI InvokeAI pods. It worked well yesterday, but when I tried to create it again today, it got stuck on the A1111 sync (image attached). I've waited a while for this to go through, but no dice. I did this in the Secure Cloud; however, when I tried the Community Cloud, the syncing went fine. Does anyone know what's happening?...

How to store Model to Network Volume

I am saving my Hugging Face model with save_pretrained. Which base path do I pass so that the model is saved to the network volume instead of the container disk?...
Solution:
It is set in the template. The default mounts to /workspace. Often the best way to store models there is to create a symbolic link into /workspace...
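
A sketch of both options from the solution: save directly under the volume mount, or symlink the default Hugging Face cache onto it so downloads persist too. This assumes the network volume is mounted at /workspace (the template default); the hf-cache subdirectory name is illustrative, and WORKSPACE is overridable so the sketch also runs outside a pod.

```shell
# Assumes the network volume mounts at /workspace on a RunPod pod.
WORKSPACE="${WORKSPACE:-$HOME/workspace}"
mkdir -p "$WORKSPACE/hf-cache" "$HOME/.cache"
# Point the default HF cache at the volume so downloaded weights persist too.
[ -e "$HOME/.cache/huggingface" ] || ln -s "$WORKSPACE/hf-cache" "$HOME/.cache/huggingface"
```

With that in place, passing any path under the mount, e.g. model.save_pretrained("/workspace/models/my-model"), writes straight to the network volume rather than the container disk.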