RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

Llama3 setup

Hi, everyone. We are planning to deploy Llama3 for our app with millions of users. How can we achieve this? And which GPU series or cloud platforms are best for achieving high speed and scalability?...

BROKEN: TheLastBen Fast Stable Diffusion

2024-07-26T18:04:17.934600984Z --2024-07-26 18:04:17-- https://huggingface.co/datasets/TheLastBen/RNPD/raw/main/Notebooks.txt
2024-07-26T18:04:17.960726061Z Resolving huggingface.co (huggingface.co)... 65.9.95.31, 65.9.95.61, 65.9.95.114, ...
2024-07-26T18:04:17.964895834Z Connecting to huggingface.co (huggingface.co)|65.9.95.31|:443... connected.
2024-07-26T18:04:18.292202440Z HTTP request sent, awaiting response... 401 Unauthorized
2024-07-26T18:04:18.292233330Z ...
Solution:
The template was pulled a while ago. RunPod cancelled its contract with TheLastBen, so he removed the files from his repo, which broke the template.

Network volume

Hi guys, I am new to Runpod. I am trying to set up a network volume, but I cannot see the "Connect to Jupyter Notebook" option after I deployed the GPU within the network volume. What did I miss?

ollama won't pull manifest - weird error.

In a runpod I've tried the various ollama templates, and also installed ollama on a basic template. I can run ollama serve; but in every case when I run ollama run <model> I always get the error: Error: pull model manifest: Get "https://registry.ollama.ai/v2/library/mistral-large/manifests/latest": dial tcp: lookup registry.ollama.ai on 127.0.0.11:53: read udp 127.0.0.1:59647->127.0.0.11:53: i/o timeout ...

is disk volume faster than network volume?

I found that a network volume is very slow when loading models to the GPU. I wonder if a disk volume is faster? Is the disk volume physically attached to the pod? Also, can I mount both a disk volume and a network volume on the same pod?...

Cannot open Checkpoints folder - Fooocus

Hi, I can't open the Checkpoints folder in Jupyter for Fooocus. When I click on it, nothing happens, although I can open the other folders. I tried deleting the pod and recreating it, but it still doesn't work. Please let me know.

CA-MTL-1 region has GPU loaded at 87%

I created a pod in the CA-MTL-1 region with A40 GPU. The pod started with GPU 100% utilization and 87% GPU memory in use. I tried it multiple times, but same result.

3 pods inaccessible after network outage

There was a network outage in EU NO and the pods are up, but cannot start:
error creating container: container: create: container create: Error response from daemon: layer does not exist
error creating container: container: create: container create: Error response from daemon: layer does not exist
This is the second time an incident like this has occurred. I have >2 TB of storage I cannot access. Am I being billed for these pods? No response from support....

Build docker image

I have a pod, I would like to use it to build a docker image, specifically threestudio https://github.com/threestudio-project/threestudio/blob/main/docs/installation.md. But I've heard that running docker in a pod is not supported. How should I build the docker image?

How to set environment variable when launching pod with network volume

I am launching a pod with ashleykza's automatic1111 template using a network volume, but it starts to redownload everything even though it's already on my network volume. She provided an environment variable to skip syncing, which I thought I had set when editing the template overrides, as shown in the second pic. Despite this, it's still redownloading everything. What am I supposed to set 'key' to here to prevent it from redownloading everything?

'Background' options for Pod Initiated file transfer

I'm trying to scope out whether there's a way to have a pod send me back a small .db/.txt file on completion of a task, or a progress file before the pod is closed after being outbid (Community pods). I've been looking at rsync, runpodctl, and SSH, and they all seem to require the transfer to be initiated from the recipient machine. I'm now looking at the Google Drive API, which I think is going to be my best bet for an 'always ready to receive' solution. ...
Solution:
You might need something like this, detect the signal and do something: import signal import boto3 import os...
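The snippet in the solution above is cut off, so what follows is only a minimal sketch of the idea it names (detect the signal, then push the file somewhere), not the original answer's code. It assumes the pod's main process receives SIGTERM before the pod is stopped, and that boto3 plus AWS credentials are available if the S3 uploader is used. The names `make_handler`, `make_s3_uploader`, and `stop_requested`, and the bucket, key, and path values, are hypothetical.

```python
import os
import signal
import threading

# Set by the handler so a long-running loop can notice and exit cleanly.
stop_requested = threading.Event()


def make_s3_uploader(bucket, key):
    """Return a callable that uploads a local file to S3.

    Assumption: boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    client = boto3.client("s3")

    def upload(path):
        client.upload_file(path, bucket, key)

    return upload


def make_handler(path, upload):
    """Build a signal handler that uploads `path` (if it exists) and flags shutdown."""

    def handler(signum, frame):
        if os.path.exists(path):
            upload(path)
        stop_requested.set()

    return handler


def install(handler):
    """Register the handler for SIGTERM; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handler)
```

A job script might call `install(make_handler("/workspace/progress.db", make_s3_uploader("my-bucket", "progress.db")))` once at startup and check `stop_requested.is_set()` inside its work loop. Because the pod initiates the upload itself, nothing on the recipient side has to be listening.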

No such image

I just created an image, pushed it to docker.io and created a Pod template referencing this image. However, startup fails due to Error response from daemon: No such image: $IMAGENAME I can pull the image locally from my machine without being logged in to docker.io. Why is my Pod not able to pull the image?...
Solution:
Yep, solved. Building the image with docker buildx build --platform linux/amd64 helped. Not a Runpod issue at all.
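For anyone scripting the fix above, here is a small sketch that assembles the same `docker buildx build --platform linux/amd64` invocation from Python. It assumes Docker with the buildx plugin is installed locally and that the image was originally built on a non-amd64 machine. The helper names and the image tag in the usage note are hypothetical.

```python
import subprocess


def buildx_command(image, context=".", platform="linux/amd64", push=True):
    """Assemble a `docker buildx build` command targeting the given platform."""
    cmd = ["docker", "buildx", "build", "--platform", platform, "-t", image]
    if push:
        # --push uploads the result to the registry named in the tag.
        cmd.append("--push")
    cmd.append(context)
    return cmd


def build(image, context="."):
    """Run the build; raises CalledProcessError if docker exits non-zero."""
    subprocess.run(buildx_command(image, context), check=True)
```

Calling `build("youruser/yourimage:latest")` would build for linux/amd64 and push the result, after which the Pod template can reference that tag.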

network volume usage on pod deploy

I created a community pod with 40 GB volume storage. By default it started with 59% usage. I tried deploying another pod and the same thing happened. This is in the US region.
Solution:
If it's really empty but still shows as used, you can report it from the website's contact button @mathew

Is it possible to use Runpod to finetune a text to speech model

I am not super tech savvy, so I am unsure if this is possible. The TTS is https://github.com/erew123/alltalk_tts. I know how to connect to RunPod via SSH, but I don't know how connecting the two would work, if it's possible at all.

Predict SSH over TCP command predicting <username> - trying to automate pulling a repo at pod deploy

I want to pull a git repo into the workspace of a pod as it is deployed. I am trying to SSH into a pod without accessing the GUI. I know the command has the typical form ssh <username>@<runpodproxy> -i (path to ssh), but I do not know how <username> is generated. I can tell that <username> is <[podID]-[string]>. Does anyone know what the [string] is? Is it predictable or otherwise associated with the pod? I am also looking into the runpodctl exec python [file] [pod id] command. Any suggestions would be appreciated....

text gen webui template not downloading models

When I try downloading a model in text gen web UI, nothing happens.

Error response from daemon: driver failed external connectivity on endpoint.

Suddenly I am getting the error below when I run docker compose up. Docker was working fine on the pod. I just made some code changes and rebuilt it, and now I am getting the errors below: Gracefully stopping... (press Ctrl+C again to force) Error response from daemon: driver failed programming external connectivity on endpoint mia-runpod-backend-engine-1 (f4a69cb1cbf0100d22af23c3d5dc5a09aeeac3425476d4bc8bfbf886e42a77f1): Unable to enable MASQUERADE rule: (iptables failed: iptables --wait -t nat -A POSTROUTING -p tcp -s 172.19.0.4 -d 172.19.0.4 --dport 8000 -j MASQUERADE: /usr/sbin/iptables: error while loading shared libraries: libip4tc.so.2: cannot close file descriptor: Error 24 (exit status 127))...

Updated Torch templates

Hi RunPod team. I'm writing again because the templates on RunPod are still out of date. We are lacking a torch 2.3 template for ROCm and CUDA. Tomorrow, torch 2.4 is released as well.

Persistent home directory?

Hi, I wonder if there is a way to persist the home directory. It's really inconvenient to lose all configurations after every reboot...