free credits
Hi, can I get 1 hour of free credit with a 24 GB GPU to test whether my script works? If yes, I will buy credits.
Empty Root - No workspace on Exposed TCP Connection
I have just created a connection over exposed TCP for the first time and finally got to SSH into my machine. However, when I ls my actual installation, nothing is there. It is frustrating, as I am used to the "workspace" folder that is needed to save files between uses of the machine. Did I miss something in the setup, or is this how it is supposed to be?
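In case it helps, here is a quick way to check from inside the pod whether a volume is actually mounted at /workspace; the path is an assumption based on RunPod's default template layout, so adjust it if your template uses a different mount path:

```python
# Minimal sketch: check whether /workspace exists and is a real mount point.
# The path is an assumption based on the default templates.
import os
import shutil

path = "/workspace"

if not os.path.isdir(path):
    print(f"{path} does not exist -- the template may not define a volume mount")
elif not os.path.ismount(path):
    print(f"{path} exists but is not a mount point -- files there live on the container disk")
else:
    usage = shutil.disk_usage(path)
    print(f"{path} is a mounted volume: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")
```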
Disk quota exceeded
I have about 19 GB of free disk space on the workspace in my pod, but I am still getting "disk quota exceeded". Any leads, please? Thanks in advance.
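One thing worth ruling out (an assumption, not confirmed in the thread): the container disk and the volume are separate allocations, so free space on /workspace doesn't tell you whether writes elsewhere (e.g. /root or /tmp) are hitting the limit. A minimal sketch to compare both, with the paths assumed from RunPod's default pod layout:

```python
# Minimal sketch: report free space on the container disk and on the volume
# separately. The paths are assumptions based on the default pod layout.
import shutil

for mount in ("/", "/workspace"):
    try:
        usage = shutil.disk_usage(mount)
        print(f"{mount}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")
    except FileNotFoundError:
        print(f"{mount}: not present")
```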
How to exclude servers in planned maintenance?
I'm preparing the production environment for our release this weekend. When I pick 4 x RTX 4000 Ada, I end up with a server that is flagged for maintenance in the coming days. Is there a way to exclude servers that have planned maintenance?
Thanks...
Run multiple finetuning on same GPU POD
I am using
- image: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
- GPU: 1 x A40
While running QLoRA finetuning with 4-bit quantization, the GPU uses approximately 12 GB of its 48 GB of memory. How can I run multiple finetunings simultaneously (in parallel) on the same pod's GPU?...
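One approach (a sketch under assumptions, not an official recommendation): a single GPU can serve several processes at once as long as their combined memory fits, so you can simply launch your training script multiple times with different configs. `finetune.py` and its `--config` flag below are hypothetical placeholders for whatever entry point you already use:

```python
# Minimal sketch: launch several independent finetuning runs on the same GPU.
# `finetune.py` and `--config` are hypothetical placeholders; the runs share the
# GPU as long as their combined memory stays under the 48 GB available.
import subprocess

configs = ["run_a.yaml", "run_b.yaml", "run_c.yaml"]

procs = [
    subprocess.Popen(["python", "finetune.py", "--config", cfg])
    for cfg in configs
]

# Wait for every run and report its exit code.
for cfg, proc in zip(configs, procs):
    proc.wait()
    print(f"{cfg} finished with exit code {proc.returncode}")
```

Note that the runs still contend for compute, so each individual run will be slower than when it has the GPU to itself.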
Problem connecting to ComfyUI
I'm running the Stable Diffusion Kohya_ss ComfyUI Ultimate template on an RTXA500, pod ID: uj3551nw4ul5l9
The pod seems to start fine and lets me connect to all the ports (including the JupyterLab port 8888) except the ComfyUI port 3020. I've attached screenshots of every relevant detail I could think of.
Thank you!...
Solution:
Your volume is full, and that might cause issues.
SD ComfyUI unable to POST due to 403: Forbidden
When I used ComfyUI locally there was no problem, but when I use my pod as the backend and try to POST through Flask to https://[id]-3000.proxy.runpod.net, I always receive "ERROR in app: Error during placeholder: HTTP Error 403: Forbidden".
Is that even possible? Is there another way of doing that?
In my Flask app.py I'm trying to do this:
ws = websocket.create_connection(f"wss://{server_address}/ws?clientId={client_id}")
server_address would be [id]-3000.proxy.runpod.net...
Solution:
OK, I fixed it. I just had to change the exposed port from HTTP to TCP and access it via the public IP plus the port.
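For reference, a sketch of what the websocket call from the question looks like after that change; the public IP and mapped port are placeholders taken from the pod's TCP port mapping:

```python
# Minimal sketch: connect to ComfyUI's websocket over the exposed TCP port
# (plain ws://). The IP and port are placeholders for the pod's public IP and
# the external port mapped to internal port 3000.
import uuid
import websocket  # pip install websocket-client

public_ip = "203.0.113.10"   # placeholder: the pod's public IP
mapped_port = 12345          # placeholder: external port mapped to 3000
client_id = str(uuid.uuid4())

ws = websocket.create_connection(f"ws://{public_ip}:{mapped_port}/ws?clientId={client_id}")
print("connected:", ws.connected)
ws.close()
```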
What is the recommended GPU_MEMORY_UTILIZATION?
All LLM frameworks, such as Aphrodite or OobaBooga, take a parameter where you can specify how much of the GPU's memory should be allocated to the LLM.
1) What is the right value? By default, most frameworks are set to use 90% (0.9) or 95% (0.95) of the GPU memory. What is the reason for not using the entire 100%?
2) Is my assumption correct that increasing the memory allocation to 0.99 would enhance performance, but it also poses a slight risk of an out-of-memory error? This seems paradoxical: if the model doesn't fit into VRAM, it should throw an out-of-memory error at load time. Yet I have noticed that it is possible to get an out-of-memory error even after the model has been loaded into memory at 0.99. Could it be that memory usage sometimes exceeds this allocation, so a bit of buffer room is needed?...
Solution:
0.94 works
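For context, Aphrodite-Engine inherits this knob from vLLM; here is a minimal sketch of where the value is set, using the vLLM Python API for illustration (the model name is a placeholder, and 0.94 is simply the value that worked above). The headroom exists because the CUDA context, temporary buffers and allocator fragmentation live outside the framework's own accounting, which is also why an OOM can still occur after the model has loaded:

```python
# Minimal sketch (vLLM API shown for illustration; Aphrodite-Engine exposes the
# same gpu_memory_utilization setting). The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",    # placeholder model
    gpu_memory_utilization=0.94,  # leave headroom for the CUDA context,
)                                 # temporary buffers and fragmentation

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```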
Install Docker on 20.04 LTS
Hello all,
I'm trying to run containers with Docker on a pod running Ubuntu 20.04.
After installing Docker and running the "hello world" Docker test, I get this error:
docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?....
Solution:
Pods are already Docker containers; you cannot run Docker inside Docker.
Pod GPU assign issue
Recently I started noticing that new pods sometimes get stuck at this step during initialisation. Sometimes it works, sometimes it doesn't. Is anyone else facing this?
---------stdout------
Unable to determine the device handle for GPU0000:08:10.0: Unknown Error
---------stderr------...
Pod Unable to Start Docker Container
I've tested this Docker image on my local computer and on other servers; however, on RunPod it seems to be stuck in a loop displaying "start container". Is this an issue others have encountered before?
How can I install a Docker image on RunPod?
I had a chat with the maintainer of aphrodite-engine and he said I shouldn't use the existing RunPod image as it's very old.
He said there is a docker that I should utilise: https://github.com/PygmalionAI/aphrodite-engine?tab=readme-ov-file#docker And here is the docker compose file:...
CPU Only Pods, Through Runpodctl
Heyo! Is there a way to create CPU-only pods through runpodctl? I don't see a flag for CPU type, only for the number of vCPUs and GPUs.
Solution:
It is not supported yet.
Unable to create template or pod with python sdk version 1.6.2
```python
import runpod
import os
...
```
Solution:
You called your script runpod.py, so it conflicts with the runpod module. You can't do that; give your script a different name.
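A hedged sketch of what the renamed script (e.g. create_pod.py instead of runpod.py) might look like; the pod parameters below are placeholders and should be checked against the runpod-python docs:

```python
# create_pod.py -- any name other than runpod.py works, so that `import runpod`
# resolves to the installed SDK rather than this script.
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

# Placeholder values; check the SDK docs for valid image names and GPU type IDs.
pod = runpod.create_pod(
    name="test-pod",
    image_name="runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04",
    gpu_type_id="NVIDIA A40",
)
print(pod)
```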
Pod unable to read environment variables set in templates caused a loss
Hi, this issue caused us to create over 70 pods that sat running idle; the pods did nothing.
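Until the root cause is found, one defensive pattern (a sketch; the variable names are placeholders for whatever your template defines) is to have the container entrypoint fail fast when the template's environment variables are missing, so idle pods don't pile up unnoticed:

```python
# Minimal sketch: fail fast at startup if expected template env vars are absent.
# MODEL_NAME and HF_TOKEN are placeholder names.
import os
import sys

REQUIRED = ["MODEL_NAME", "HF_TOKEN"]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print(f"Missing required environment variables: {', '.join(missing)}", file=sys.stderr)
    sys.exit(1)  # exit instead of idling so the failure is visible immediately
```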
n00b multi gpu question
Hello hello!
I created a 4-GPU pod (screenshot), then asked PyTorch what devices it saw, and it only saw one. What's the dumb thing I'm missing?
Thanks 🙂...
Solution:
Alright, so I restarted the pod (with the env var you suggested) and CUDA reported zero GPUs.
Then I removed the env var, restarted, and CUDA now reports four GPUs, with no change from the previous code/config.
Either:...
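For anyone hitting the same thing, a quick sketch to check what PyTorch actually sees, including whether CUDA_VISIBLE_DEVICES is narrowing the view:

```python
# Minimal sketch: report the GPUs visible to PyTorch on the pod.
import os
import torch

print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("device count =", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```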
runpodctl not found on pod
I wanted to run some tests. This involves a pod stopping itself after executing a task. To do this, I execute some work and then call
runpodctl stop pod $RUNPOD_POD_ID
inside the container from a bash script. This works in my actual production container, but it doesn't work in my test environment. The pod says that runpodctl can't be found (2024-06-11T13:56:58.504874269Z ./run.sh: line 11: runpodctl: not found). Even after letting it run for a while, it never finds runpodctl. Any idea what I can do about this?
Here's a very minimal Dockerfile:
```
FROM alpine ...
```
Solution:
runpodctl is not installed in the Alpine image by default.