Build with Dockerfile or mount image from tar file
Is there a way to build an image from a Dockerfile through RunPod, or to mount my tar file?
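For what it's worth: as far as I know, a pod deploys from a prebuilt image in a registry rather than building a Dockerfile or mounting a tar for you, so the usual route is to build (or load) the image locally and push it somewhere the pod can pull from. A rough sketch; image and registry names are placeholders:
```
# Sketch: build from a Dockerfile, or import an existing tar, then push to a registry
# the pod template can pull from. Image and registry names are placeholders.
docker build -t yourname/yourimage:0.1 .              # from a Dockerfile
docker load -i yourimage.tar                          # or import a saved image tar
docker tag yourimage:latest yourname/yourimage:0.1    # retag whatever the tar contained
docker push yourname/yourimage:0.1
```
The pod template then points at yourname/yourimage:0.1 as the container image.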
Performance of Disk vs Network Volume
Is there a significant trade-off in performance between the pod's local volume and a network volume? How should I think about this?
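One way to get a concrete answer for your workload is to benchmark each mount from a terminal in the pod; a quick sketch with dd, where /workspace stands in for whichever mount you are testing (treat the path as a placeholder):
```
# Sketch: rough sequential write and read test on one mount; the path is a placeholder.
# oflag=direct / iflag=direct bypass the page cache so the numbers reflect the volume.
dd if=/dev/zero of=/workspace/ddtest bs=1M count=1024 oflag=direct
dd if=/workspace/ddtest of=/dev/null bs=1M iflag=direct
rm /workspace/ddtest
```
Network volumes sit behind the network, so some overhead versus the local disk is expected; measuring both on your actual pod is the most reliable way to judge the trade-off.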
Runpod's GPU power
Are RunPod's GPUs shared? I need a GPU with 100% of its power for training.
Solution:
The GPU is dedicated to you; it is not shared.
Error when trying to load "ExLlamav2"
I haven't used RunPod in a while, but I'm pretty sure I've used this one before; somehow it's not working now.
NVENC driver conflict
Trying to use hardware-accelerated ffmpeg on a pod; this setup has worked on other pods before. Getting the attached error even though the driver is the correct version.
```
Mon Apr 22 03:52:28 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03              Driver Version: 535.129.03    CUDA Version: 12.2    |
|-----------------------------------------+----------------------+----------------------+...
```
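One way to narrow this down is to check whether the ffmpeg build in the pod exposes NVENC at all and then run a tiny synthetic encode; a sketch, assuming a stock ffmpeg built with NVENC support and a visible GPU:
```
# Sketch: list NVENC encoders and run a one-second test encode (assumes an
# NVENC-enabled ffmpeg build and a visible GPU).
ffmpeg -hide_banner -encoders | grep nvenc
ffmpeg -hide_banner -f lavfi -i testsrc=duration=1:size=1280x720:rate=30 \
       -c:v h264_nvenc -f null -
```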
Are all pods based on Docker?
I want to work on a plain Ubuntu system or a KVM virtual machine, and also get the machine's public IP. Is that possible? It seems that all the container images are Docker-based.
ComfyUI pod doesn't save workflow
Hi, I'm having a problem: when I stop the ComfyUI pod and run it again, my workflow disappears.
HTTP service [PORT 7860] Not ready
Like the title says, the HTTP service isn't ready for me to connect when I'm trying to run TheBloke Local LLMs One-Click UI template. I'm using an A100 GPU with 100 GB disk and a 100 GB pod volume. It usually lets me connect after a few minutes, and it's been longer than that.
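If it stays stuck, it can help to open the pod's web terminal and check whether anything is listening on 7860 yet; a minimal sketch, assuming these standard tools exist in the image:
```
# Sketch: check whether the UI process is up and bound to port 7860 yet.
ss -ltnp | grep 7860
curl -sS http://localhost:7860/ | head -c 200
```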
I'd like to run a job that takes 8x GPUs.. any way I can increase the spend limit?
Suddenly cannot boot SD pod; having trouble with "Could not load settings"
Full error message:
```
2024-04-20T12:41:36.193868880Z *** Could not load settings
2024-04-20T12:41:36.194453815Z Traceback (most recent call last):
2024-04-20T12:41:36.194479595Z   File "/workspace/stable-diffusion-webui/modules/launch_utils.py", line 244, in list_extensions
2024-04-20T12:41:36.194485685Z     settings = json.load(file)...
```
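That traceback is json.load failing while the webui reads its settings, which usually means the settings JSON was truncated or corrupted (for example by the pod stopping mid-write). A sketch of one way to check and recover; the config.json path is an assumption based on the default stable-diffusion-webui layout:
```
# Sketch: validate the settings JSON; if it is corrupt, move it aside so the webui
# regenerates defaults. Path assumed from the default stable-diffusion-webui layout.
python -m json.tool /workspace/stable-diffusion-webui/config.json \
  || mv /workspace/stable-diffusion-webui/config.json \
        /workspace/stable-diffusion-webui/config.json.broken
```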
4xH100 pod is stuck -- can't restart or stop
I am still connected with SSH, but the pod can't be used due to some network issues. RunPod UI also can't reach it (it shows waiting for logs).
Overnight the pod failed with:
```
...
```
Can't open folder in Jupyter Lab
I've been generating some videos in ComfyUI and now want to open the output folder in JupyterLab to download them. The problem is I just can't open it; nothing happens when I click it. Is there another way to display the files or open the folder? Downloading the whole folder one file at a time doesn't make sense since there are a lot of files in there.
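If the file browser chokes on the folder (very large directories can make it unresponsive), one workaround is to pack the outputs into a single archive from a terminal and download that file instead; a sketch, with the output path assumed from ComfyUI's default layout:
```
# Sketch: bundle the outputs into one archive and download that single file from
# JupyterLab. The output path is assumed from ComfyUI's default layout.
cd /workspace/ComfyUI
tar czf /workspace/comfyui_output.tar.gz output/
```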
CPU seems extremely slow
Hello, I created a pod with an A40 and 16 vCPUs. I'm trying to train a deep learning model; however, I'm stuck at data preprocessing because the CPU seems to be very slow. This step takes 10 times longer than on my computer with a 13th-gen i5, which is very surprising to me.
The CPU usage is 100% on the dashboard when I run this step, which is also surprising...
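Before blaming the hardware, it is worth checking how many cores the container actually sees and whether the preprocessing step uses more than one of them; if it is single-threaded, a lower per-core clock than a desktop i5 would explain much of the gap. A minimal sketch, assuming only standard Linux tools and a Python interpreter:
```
# Sketch: check what the container sees and get a crude single-core baseline.
nproc                                 # cores visible to the container
lscpu | grep -E 'Model name|MHz'      # CPU model and clock speed
python - <<'EOF'
import time
start = time.time()
sum(i * i for i in range(10_000_000))   # crude single-core arithmetic loop
print(f"single-core loop: {time.time() - start:.2f}s")
EOF
```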
Getting a pod's port mapping
Is there any way to use an API to find out the port mappings of the externally exposed ports for a particular pod?
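As far as I recall, the GraphQL API exposes this under the pod's runtime field; the sketch below uses the endpoint, query shape, and field names as I remember them from the API docs, so verify them against the current documentation before relying on it:
```
# Sketch: query a pod's exposed port mappings via the GraphQL API.
# Endpoint, query shape, and field names are assumptions; check the current API docs.
curl -sS "https://api.runpod.io/graphql?api_key=${RUNPOD_API_KEY}" \
  -H 'Content-Type: application/json' \
  -d '{"query":"query { pod(input: {podId: \"YOUR_POD_ID\"}) { id runtime { ports { ip isIpPublic privatePort publicPort type } } } }"}'
```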
Megatron Container Image Setting
Hi. I want to use the 'nvcr.io/nvidia/pytorch:24.03-py3' image to run Megatron.
My start command is 'docker run --gpus all -it --rm -v /:/workspace/megatron -v /:/workspace/dataset -v /:/workspace/checkpoints nvcr.io/nvidia/pytorch:24.03-py3'
However, I'm having trouble starting the pod...
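One common reason a pod based on a plain NGC image never comes up is that nothing keeps the container running, so it exits right after starting; whether that is the cause here is an assumption. A sketch of a local docker run that keeps the container alive, with the host paths as placeholders (on a pod the same idea applies to the container start command rather than docker run):
```
# Sketch: keep the container alive with a long-running command so it does not exit
# immediately. Host paths are placeholders; whether a missing long-running process
# is the actual cause of the pod failing to start is an assumption.
docker run --gpus all -d \
  -v /data/megatron:/workspace/megatron \
  -v /data/dataset:/workspace/dataset \
  -v /data/checkpoints:/workspace/checkpoints \
  nvcr.io/nvidia/pytorch:24.03-py3 \
  sleep infinity
```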
Network Volume and copying data between pods.
Hey folks,
I had a pod with 3 GPUs and 1 TB mounted under /workspace (not a network volume).
All my GPUs are gone even though I never stopped the pod (ID: jlqfqkgo7sd8h5).
1) It wasn't a spot instance, it was an on-demand one. Why would that happen?
2) I can't allocate more GPUs to the existing pod, so I'm forced to create a new one...
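For getting the data onto a new pod, one option besides attaching a network volume is runpodctl's peer-to-peer transfer; a sketch, with the subcommand names written from memory, so confirm them with runpodctl --help:
```
# Sketch: transfer a directory between two pods with runpodctl (subcommand names are
# from memory; confirm with runpodctl --help). The source path is a placeholder.
# On the old pod:
runpodctl send /workspace/my_data
# It prints a one-time code; on the new pod:
runpodctl receive <one-time-code>
```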
How does RunPod work with custom Docker images? Multiple questions.
I have some questions:
1. If I use my own Docker Hub image, does it have to pull the image from Docker Hub every time?
2. I tried to use a community template (ComfyUI - AI-Dock) and the pull from ghcr.io is very slow. This is related to the first question: is there something that affects pull speed? It hurts because I still get charged while I wait for my image to download. This image takes about 40 minutes to download ~5 GB, while others take a couple of minutes for 5 GB, so it's not a problem on my end.
3. Are there any workarounds for pulling the image every time, or for increasing the pull speed? Can I request that this image be added to the cache? I will be using it from now on.
4. How do I use registry credentials? What do I set as the password: my Docker account password, or a token generated in Docker? (see the token sketch after this list)...
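On question 4: Docker Hub recommends a personal access token rather than the account password, and the token is used exactly like a password. How the registry-credentials form labels its fields is not something I can confirm here, so treat this as a sketch:
```
# Sketch: create a Docker Hub access token (Account Settings -> Security -> New Access
# Token) and use it wherever a registry password is requested; verify it locally first.
docker login -u <dockerhub-username>
# Paste the access token at the password prompt instead of your account password.
```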
My first template
I am trying to create my first template: https://runpod.io/console/deploy?template=i6ipm7ovin&ref=jndc8ozi
It's based on a docker image here: https://hub.docker.com/r/nschle/tabbyapi/tags
When filling in the template like so, ...
Solution: running the docker pull below reveals the error, a typo in the tag (the character "v" was missing)...
docker pull nschle/tabbyapi:v0.0.1
Is AWQ faster than GGUF?
What is the ranking of inference speed among AWQ, GGUF, GPTQ, QAT, and EXL2?