RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

free credits

Hi, can I get 1 hour of free credit with a 24 GB GPU to test whether my script works? If yes, I will buy credits.

Can I run Docker Desktop on RunPod?

I have tried for a long time but I cannot run it.

Empty Root - No workspace on Exposed TCP Connection

I have just created a connection over exposed TCP for the first time and finally got to SSH into my machine. However, when I ls my actual installation, nothing is there. This is frustrating, as I am used to the "workspace" folder that is needed to save files between uses of the machine. Did I miss something in the setup, or is this how it is supposed to be?

Disk quota exceeded

I have about 19 GB of free disk space on the workspace in my pod, but I am still getting "disk quota exceeded". Any leads, please? Thanks in advance.

How to exclude servers in planned maintenance?

I'm preparing the production environment for our release this weekend. When I pick 4 x RTX 4000 Ada, I end up with a server that is flagged for maintenance in the coming days. Is there a way to exclude servers that are planned for maintenance? Thanks...

Run multiple finetuning on same GPU POD

I am using the image runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 with 1 x A40. While running QLoRA fine-tuning with 4-bit quantization, the GPU uses approx. 12 GB of its 48 GB of memory. How can I run multiple fine-tunings simultaneously (in parallel) on the same pod GPU?...
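
Since each run only needs around 12 GB of the A40's 48 GB, one approach (a rough sketch, not anything RunPod-specific) is to launch several training processes on the same device and let them share the card; `train.py` and `--config` below are hypothetical placeholders for your own fine-tuning entry point, and the only real constraint is that the combined VRAM of all runs stays under 48 GB.

```python
# Rough sketch: run several independent QLoRA fine-tuning jobs on one GPU.
# "train.py" and "--config" are hypothetical placeholders for your own script.
import os
import subprocess

env = dict(os.environ, CUDA_VISIBLE_DEVICES="0")  # all jobs share GPU 0
configs = ["configs/run_a.yaml", "configs/run_b.yaml", "configs/run_c.yaml"]

procs = [
    subprocess.Popen(["python", "train.py", "--config", cfg], env=env)
    for cfg in configs
]
for p in procs:
    p.wait()  # block until every fine-tuning run has finished
```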

Can I download audit logs?

Is there a way to fetch or download audit logs?

Problem connecting to ComfyUI

I'm running the Stable Diffusion Kohya_ss ComfyUI Ultimate template on an RTXA500, pod ID: uj3551nw4ul5l9. The pod seems to start fine and allows me to connect to all the ports (including the JupyterLab port, 8888) except for the ComfyUI port, 3020. I've attached screenshots of every relevant detail I could think of. Thank you!...
Solution:
Your volume is full, which might be causing the issue.

SD ComfyUI unable to POST due to 403: Forbidden

When I used ComfyUI locally there was no problem, but when I'm using my pod as a backend and trying to POST through Flask to https://[id]-3000.proxy.runpod.net, I always receive "ERROR in app: Error during placeholder: HTTP Error 403: Forbidden". Is that even possible? Is there another way of doing that? In my Flask app.py I'm trying to do this: ws = websocket.create_connection(f"wss://{server_address}/ws?clientId={client_id}") where server_address would be [id]-3000.proxy.runpod.net...
Solution:
OK, I fixed it. I just had to change the exposed port from HTTP to TCP and access it via the public IP plus the port.
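
For anyone hitting the same thing, a minimal sketch of what that fix looks like in code, assuming the ComfyUI port is now exposed as TCP; the IP and port below are placeholders for the values shown on the pod's Connect tab.

```python
# Minimal sketch of the fix above: connect to ComfyUI's websocket over the
# pod's public IP and exposed TCP port instead of the HTTP proxy URL.
# POD_PUBLIC_IP and EXPOSED_PORT are placeholders from the pod's Connect tab.
import uuid
import websocket  # pip install websocket-client

POD_PUBLIC_IP = "203.0.113.10"   # placeholder
EXPOSED_PORT = 12345             # placeholder external port mapped to ComfyUI

client_id = str(uuid.uuid4())
# Plain ws:// because the direct TCP mapping is not behind the proxy's TLS.
ws = websocket.create_connection(
    f"ws://{POD_PUBLIC_IP}:{EXPOSED_PORT}/ws?clientId={client_id}"
)
```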

What is the recommended GPU_MEMORY_UTILIZATION?

All LLM frameworks, such as Aphrodite or Oobabooga, take a parameter where you can specify how much of the GPU's memory should be allocated to the LLM.
1) What is the right value? By default, most frameworks are set to use 90% (0.9) or 95% (0.95) of the GPU memory. What is the reason for not using the entire 100%?
2) Is my assumption correct that increasing the memory allocation to 0.99 would enhance performance, but also poses a slight risk of an out-of-memory error? This seems paradoxical: if the model doesn't fit into GPU memory, it should throw an out-of-memory error at load time, yet I have noticed that it is possible to get an out-of-memory error even after the model has been loaded at 0.99. Could it be that memory usage can sometimes exceed this allocation, necessitating a bit of buffer room?...
Solution:
0.94 works
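
For reference, a hedged sketch of where that value goes in vLLM (Aphrodite, a vLLM fork, exposes a similar gpu-memory-utilization option); the model name is only an example. Values below 1.0 leave headroom for the CUDA context and temporary buffers that aren't counted when the weights are loaded, which is why 100% tends to OOM later.

```python
# Hedged sketch: setting the GPU memory fraction in a vLLM-style engine.
# The model name is only an example; 0.94 is the value reported to work above.
from vllm import LLM

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example model
    gpu_memory_utilization=0.94,
)
print(llm.generate("Hello, world!")[0].outputs[0].text)
```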

Install Docker on 20.04 LTS

Hello all, I'm trying to run containers with Docker on a pod with Ubuntu 20.04. After installing Docker and running the "hello world" Docker test, I get this error: docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?...
Solution:
Pods are already Docker containers; you cannot run Docker inside Docker.

Pod Network Issue Stuck

Pod id: ul5cbpu7iavded

Pod GPU assign issue

Recently I started noticing that new pods sometimes get stuck at this step while initialising. Sometimes it works, sometimes it won't. Anyone else facing this?
---------stdout------
Unable to determine the device handle for GPU0000:08:10.0: Unknown Error
---------stderr------...

Pod Unable to Start Docker Container

I've tested this Docker image on my local computer and on other servers; however, on RunPod it seems to be stuck in a loop displaying "start container". Is this an issue others have encountered before?

How can I install a Docker image on RunPod?

I had a chat with the maintainer of aphrodite-engine and he said I shouldn't use the existing RunPod image, as it's very old.
He said there is a Docker image I should use instead: https://github.com/PygmalionAI/aphrodite-engine?tab=readme-ov-file#docker And here is the docker compose file:...

CPU Only Pods, Through Runpodctl

Heyo! Is there a way to create CPU-only pods through runpodctl? I don't see a flag for CPU type, only for the number of vCPUs and GPUs.
Solution:
It is not currently supported.

Unable to create template or pod with python sdk version 1.6.2

```python
import runpod
import os
...
```
Solution:
You called your script runpod.py, so it shadows the runpod module. You can't do that; give your script a different name.
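
As an illustration of the fix (a sketch against the Python SDK, not an official snippet): once the script is renamed to anything other than runpod.py, `import runpod` resolves to the SDK again. The pod parameters below are only example values.

```python
# create_test_pod.py -- renamed from runpod.py so "import runpod" resolves
# to the SDK instead of this script. Parameters are example values; the
# exact create_pod signature may vary between SDK versions.
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

pod = runpod.create_pod(
    name="sdk-test-pod",
    image_name="runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04",
    gpu_type_id="NVIDIA RTX A4000",  # example GPU type
)
print(pod)
```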

Pod unable to read environment variables set in templates caused a loss

Hi, this issue has caused us to create over 70 pods that ran idle; the pods did nothing.

n00b multi gpu question

Hello hello! I created a 4-GPU pod (screenshot), then asked PyTorch what devices it saw, and it only saw one. What's the dumb thing I'm missing? Thanks 🙂...
Solution:
Alright, so I restarted the pod (with the env var you suggested) and CUDA reported zero GPUs. Then I removed the env var, restarted, and CUDA now reports four GPUs, with no change from the previous code/config. Either:...
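
For anyone debugging the same thing, a quick sanity check (plain PyTorch, nothing RunPod-specific) that shows both what CUDA_VISIBLE_DEVICES is set to and how many devices PyTorch actually sees:

```python
# Quick sanity check: is CUDA_VISIBLE_DEVICES masking GPUs, and how many
# devices does PyTorch actually see?
import os
import torch

print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
print("device_count =", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i}", torch.cuda.get_device_name(i))
```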

runpodctl not found on pod

I wanted to run some tests. This involves a pod stopping itself after executing a task. To do this, I execute some work and then call runpodctl stop pod $RUNPOD_POD_ID inside the container from a bash script. This works in my actual production container, but it doesn't work in my test environment. The pod says that runpodctl can't be found (2024-06-11T13:56:58.504874269Z ./run.sh: line 11: runpodctl: not found). Even after letting it run for a while, it can't ever find runpodctl. Any idea what I can do about this? Here's a very minimal Dockerfile:
```
FROM alpine
...
```
Solution:
runpodctl is not installed on the alpine image by default.