RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⛅|pods

My pods are missing, but I'm still being charged every day

The management page is empty and I can NOT find my pods. Is this a bug?

Network issue?

[Errno -3] Temporary failure in name resolution
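A quick way to tell whether name resolution works at all inside the pod is to ask the system resolver directly. The sketch below assumes the message comes from Python's `socket.gaierror` (code -3 is `EAI_AGAIN` on Linux, a temporary resolver failure); `check_dns` is a hypothetical helper, not a RunPod tool.

```python
import socket
from typing import Tuple


def check_dns(hostname: str) -> Tuple[bool, str]:
    """Resolve `hostname` via the system resolver; return (ok, addresses or error)."""
    try:
        # getaddrinfo uses the same resolver path as most Python networking code
        # (typically /etc/hosts, then the DNS servers listed in /etc/resolv.conf).
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        # "[Errno -3] Temporary failure in name resolution" ends up here;
        # on Linux, -3 corresponds to socket.EAI_AGAIN.
        return False, f"[Errno {exc.errno}] {exc.strerror}"
    # info[4] is the sockaddr tuple; its first element is the IP address string.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)
```

If `check_dns("localhost")` succeeds but a public hostname such as `pypi.org` fails, the problem is more likely the pod's DNS or network path than your code; rerunning after a short wait helps distinguish a transient `EAI_AGAIN` from a persistent failure.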

Pod running but inaccessible

I have two separate pods connected to a network drive. Pod #1 is accessible and logs to a log1 file; Pod #2 logs to a log2 file. Pod #2 is not accessible via SSH and is stuck at the state shown in the screenshot. However, by reading Pod #2's log from Pod #1, I have confirmed that it is actually still running.
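The manual check described above (reading Pod #2's log from Pod #1) can be scripted. A minimal sketch, assuming the network volume is mounted at `/workspace` on both pods and that Pod #2 writes to `/workspace/log2` (a hypothetical path); modification times on a network filesystem may also lag slightly behind the writer.

```python
import os
import time
from collections import deque
from typing import List, Optional

# Hypothetical usage from Pod #1:
#   log_is_fresh("/workspace/log2", max_age_s=120); last_lines("/workspace/log2")


def log_is_fresh(path: str, max_age_s: float = 60.0, now: Optional[float] = None) -> bool:
    """True if `path` exists and was modified within the last `max_age_s` seconds."""
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return False
    current = time.time() if now is None else now
    return current - mtime <= max_age_s


def last_lines(path: str, n: int = 5) -> List[str]:
    """Return the last `n` lines of a text file, keeping only `n` lines in memory."""
    with open(path, "r", errors="replace") as fh:
        # A deque with maxlen discards older lines while iterating over the file.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]
```

Run from Pod #1, this shows whether the job on Pod #2 is still writing, independent of whether SSH to Pod #2 works.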

No instances available for A100 80GB

I'm trying to deploy an A100 80GB. I keep getting "there are no longer any instances available with enough disk space" no matter what container/volume disk sizes I set... Has anyone run into this? If so, how did you get past it?

https://www.runpod.io/console/pods keeps reordering servers

This is EXTREMELY infuriating. I keep accidentally deleting the wrong server because the page reorders them, either for no apparent reason or when I start/stop one, etc.

A1111 won't find my files

I'm trying to use the batch function in img2img. I've prepared all the necessary folders and files, but when I click "Run", nothing happens. It acts as if the path is wrong. I've tried all sorts of formatting and put the folders in many different places, but nothing worked....
Solution:
You used the correct directories in Colab but not on RunPod, which is why it works in Colab.
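A pre-flight check of the batch paths, run in a terminal on the pod, makes this kind of mismatch visible (a path copied from Colab, which typically lives under `/content`, will not exist on a RunPod pod). A sketch under assumptions: the extension list and the `/workspace/batch_in` and `/workspace/batch_out` paths are hypothetical, and the WebUI may treat a missing output folder differently from how this check reports it.

```python
from pathlib import Path
from typing import List

# Assumed set of image extensions to look for; adjust to what your files use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def check_batch_dirs(input_dir: str, output_dir: str) -> List[str]:
    """Return human-readable problems; an empty list means the paths look usable."""
    problems: List[str] = []
    for label, raw in (("input", input_dir), ("output", output_dir)):
        path = Path(raw)
        if not path.is_absolute():
            # Relative paths resolve against the WebUI process's working
            # directory, which is easy to get wrong on a remote pod.
            problems.append(f"{label} path is not absolute: {raw}")
        if not path.is_dir():
            problems.append(f"{label} directory does not exist: {path}")
    src = Path(input_dir)
    if src.is_dir():
        images = [p for p in src.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
        if not images:
            problems.append(f"no image files found in {src}")
    return problems


if __name__ == "__main__":
    # Hypothetical pod-side paths; replace them with the folders you actually created.
    for problem in check_batch_dirs("/workspace/batch_in", "/workspace/batch_out") or ["paths look OK"]:
        print(problem)
```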

NGC tritonserver container image not usable?

I tried to create a pod on a server with CUDA >= 12.2 using this image: nvcr.io/nvidia/tritonserver:24.01-trtllm-python-py3. It loads correctly, but the resulting server is not usable: I cannot connect via SSH (the window closes immediately after I type the passphrase). The same image works fine on servers from vast.ai. What's the issue?...

"Too many open files in system"

I am using many cpu3c-2-4 pods in the RO region, all working off the same volume, and I keep running into the "Too many open files" error. The error only happens in CPU pods, and only when many different pods are working with many different files, such as during large apt-get installs and large tar/gzip operations. I have tried setting ulimit -n [LARGE_NUMBER], but this does not fix the error. Any ideas?...
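One detail worth checking: on Linux, "Too many open files" (`EMFILE`) means a single process hit its own descriptor limit, which is what `ulimit -n` raises, while "Too many open files in system" (`ENFILE`, the wording in the title) means the kernel-wide file table is full. If the error really is `ENFILE`, raising `ulimit -n` would not be expected to help, and since that table belongs to the host kernel, it is probably not something a pod can raise on its own. The sketch below prints both limits so you can see which one you are near; it assumes a Linux pod with `/proc` mounted, and `report`/`parse_file_nr` are hypothetical helpers.

```python
import errno
import os
import resource
from typing import Tuple


def parse_file_nr(text: str) -> Tuple[int, int]:
    """Parse /proc/sys/fs/file-nr ("allocated unused max") into (allocated, max)."""
    allocated, _unused, maximum = (int(field) for field in text.split())
    return allocated, maximum


def report() -> None:
    # EMFILE: this process hit its own limit. ENFILE: the kernel-wide table is full.
    print(f"EMFILE: {os.strerror(errno.EMFILE)}")
    print(f"ENFILE: {os.strerror(errno.ENFILE)}")

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"per-process limit (what ulimit -n changes): soft={soft}, hard={hard}")

    try:
        with open("/proc/sys/fs/file-nr") as fh:
            allocated, maximum = parse_file_nr(fh.read())
    except FileNotFoundError:
        print("/proc/sys/fs/file-nr is not available (not Linux?)")
        return
    # As far as we know fs.file-nr is not namespaced, so inside a container this
    # typically reflects the whole host, including other tenants' processes.
    print(f"system-wide file handles: {allocated} allocated of {maximum} max")


if __name__ == "__main__":
    report()
```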

What the fuck is going on again with US - 1 x H100 80GB SXM5

"We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime." I have been using RunPod and every fucking day something is wrong!? ID: x1vidmyoiu3a06...

GPU RunPod critical error detected

"We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime." ID: pris741sxxrz2d...
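The message recommends backing up your data before moving to a new pod. A minimal sketch using Python's standard `tarfile` module, assuming your data lives in `/workspace`; `backup_dir` and the destination path are hypothetical, and the archive still has to be copied off the affected machine (for example with `scp`) to be useful.

```python
import tarfile
import time
from pathlib import Path

# Hypothetical usage: backup_dir("/workspace", "/tmp/backups")


def backup_dir(src: str, dest_dir: str) -> Path:
    """Write `src` into a timestamped .tar.gz under `dest_dir`; return the archive path."""
    src_path = Path(src).resolve()
    dest_path = Path(dest_dir).resolve()
    try:
        dest_path.relative_to(src_path)
    except ValueError:
        pass  # destination is outside the source tree, which is what we want
    else:
        # Writing the archive inside the tree being archived would make tar read its own output.
        raise ValueError(f"destination {dest_path} is inside {src_path}")
    dest_path.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_path / f"{src_path.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname stores paths relative to the source folder's name, not as absolute paths.
        tar.add(str(src_path), arcname=src_path.name)
    return archive
```

For example, `backup_dir("/workspace", "/tmp/backups")` writes an archive named like `workspace-20240101-120000.tar.gz`; keeping the destination outside the source folder is what the guard at the top enforces.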

Stable Diffusion: how do I view the active log?

When you launch Stable Diffusion locally, you get a DOS window with a log that shows information on every action taken. On RunPod, however, the terminal shows the initialization sequence and then stops recording after the first model is loaded. Why is that, and how can I see the current state? I currently need it to figure out how to open files on my network storage, but it's a useful tool in general that I know I'll need a lot later....
Solution:
Read the README for instructions on how to view the logs. You can't really use launch.py because the pod already starts it, and it looks like you also didn't activate the venv first. If you want to do that, you basically have to set the DISABLE_AUTOLAUNCH environment variable; again, see the README.
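Once the WebUI writes its output to a file, following it works like `tail -f`. The README is the authority on where that log lives; the sketch below only covers the following part, with `/workspace/logs/webui.log` as a purely hypothetical path.

```python
import os
import time
from typing import List, Tuple


def read_new_lines(path: str, offset: int) -> Tuple[List[str], int]:
    """Return complete lines appended to `path` since byte `offset`, plus the new offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # the file was truncated or rotated; start over from the beginning
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Consume only complete lines; a trailing partial line is picked up next time.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path: str, interval_s: float = 1.0) -> None:
    """Print the file, then keep printing new lines as they arrive (runs until Ctrl+C)."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(interval_s)
```

Calling `follow("/workspace/logs/webui.log")` (hypothetical path) keeps printing output until interrupted, which gives a live view similar to the local DOS window.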

Pod using CPU instead of GPU

As the title says: I'm trying to run Deforum on Ashley's ultimate template. Normal txt2img, however, works fine....

After trying the service for the first time, I'm out of funds because of a stale pod that kept running after I disconnected

Hello. As the title says: I'm a professional comic book artist working with Krita and trying to stay competitive in a difficult market. After finding out about Krita's AI plugin and its ability to assist with coloring and finishing sketches and drawings, I decided to try it in preparation for a big project. Lacking a powerful PC at the moment, I followed the Krita team's recommendations and tried your services. After a bit of hassle with the setup, I signed up with you, added 10 euros in...

Pod does not show public IP & ports

We have a template with a TCP port configured. When we deploy that template on Community Cloud with the public IP filter set to true, or on Secure Cloud, we do not get a pod with a public IP & port in either case; it just shows:...
Solution:
Okay, it turns out the pod was not running because no entrypoint/command was configured. It was not obvious that the pod wasn't actually running, since it did not show as exited. Solved by providing one...

Pod is unable to find/use GPU in Python

Hi, I'm trying to connect to this pod: RunPod Pytorch 2.2.10 ID: zgel6p985mjmmn...
Solution:
@Dhruv Mullick I don't think it has to do with the image... If you select it from the RunPod website, there is a filter button at the top and then a drop-down menu where you can select 12.2 under "Allowed CUDA Versions". As @ashleyk pointed out earlier, 'the machine is running CUDA 12.3 which is not production ready'. If I select 12.2, it works....
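When PyTorch cannot see the GPU on a GPU pod, comparing the CUDA version PyTorch was built against with what the machine provides is a reasonable first step, and it is consistent with the fix in this thread (filtering machines by allowed CUDA version). A small diagnostic sketch using only standard PyTorch calls; `gpu_report` is a hypothetical helper, and it degrades gracefully when PyTorch is not installed.

```python
from typing import Any, Dict


def gpu_report() -> Dict[str, Any]:
    """Collect the basics needed to debug 'PyTorch can't see the GPU'."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report: Dict[str, Any] = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # False if no usable GPU/driver is visible to this process.
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device_count"] = torch.cuda.device_count()
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

Compare `built_with_cuda` with the "CUDA Version" shown in the `nvidia-smi` header; a mismatch between the two is a common reason for `cuda_available` being `False`, in which case code that picks its device automatically ends up on the CPU.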

Pod is stuck in a loop and does not finish creating

Hi, I'm trying to start a 1 x V100 SXM2 32GB with additional disk space (40 GB). It worked fine until yesterday. Now, when I try to create it, it gets stuck in this loop: ```...

Runpodctl in container receiving 401

Over the past few days, I have sometimes been getting a 401 response when attempting to stop pods with runpodctl stop pod $RUNPOD_POD_ID at the end of my jobs. This is causing the container to restart on exit rather than stop. Do the credentials passed to the container expire?
Solution:
OK, so any pods created before the migration will fail when stopping via runpodctl.
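Per the answer, this failure is tied to pods created before the migration, so retrying will not fix it; what a job script can do is notice the failure instead of exiting as if the pod had stopped. A sketch wrapping the same `runpodctl stop pod $RUNPOD_POD_ID` call from the question. Assumption: `runpodctl` exits with a non-zero status when the API returns 401. `stop_current_pod` is a hypothetical helper, and `run` is injectable only so it can be tested without a pod.

```python
import os
import subprocess
import time
from typing import Callable


def stop_current_pod(
    retries: int = 3,
    delay_s: float = 5.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run `runpodctl stop pod $RUNPOD_POD_ID`; return True only if it reports success."""
    pod_id = os.environ.get("RUNPOD_POD_ID")
    if not pod_id:
        print("RUNPOD_POD_ID is not set; is this running inside a pod?")
        return False

    cmd = ["runpodctl", "stop", "pod", pod_id]
    for attempt in range(1, retries + 1):
        try:
            result = run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print("runpodctl is not on PATH")
            return False
        # Assumption: a 401 from the API surfaces as a non-zero exit status.
        # Retrying will not help for pods created before the migration.
        if result.returncode == 0:
            return True
        stderr = (result.stderr or "").strip()
        print(f"attempt {attempt}/{retries} failed (exit {result.returncode}): {stderr}")
        if attempt < retries:
            time.sleep(delay_s)
    return False
```

A job script could call `stop_current_pod()` at the end and log loudly when it returns `False`, so an unexpected container restart is easy to trace back to the failed stop call.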

Cannot establish connection for web terminal using Stable Diffusion pod

I'm able to connect to the WebUI HTTP client, I can connect via SSH from my local machine, AND I can connect to the Jupyter notebook with no problem. But when I start the web terminal and attempt to connect, it brings up the black screen and then immediately says "connection closed".

RunPod errors: all pods having the same issue this morning. Important operation

I got this error on all my pods today: "We have detected a critical error on this machine which may affect some pods. We are looking into the root cause and apologize for any inconvenience. We would recommend backing up your data and creating a new pod in the meantime."