RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡｜serverless

⛅｜pods

Laikh

7/31/2024

Unable to start pod with MI300x

Observing "hang" when starting pod with 8xMI300x, screenshot attached. Any ideas on how to fix this?

Sir Falk

7/31/2024

Exposing port not working

I'm trying to create embeddings using infinity. There is already a Docker container for that (https://hub.docker.com/r/michaelf34/infinity). I tried to launch it and expose port 7797, but I can't reach the container via the proxy:...
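
For reference, a minimal sketch of how the proxy address for an exposed HTTP port is usually put together. It assumes the `https://<pod id>-<port>.proxy.runpod.net` pattern; the function name `proxy_url` is hypothetical and the pod ID in the usage note is a placeholder. The proxy can generally only reach a service that listens on 0.0.0.0 rather than 127.0.0.1 inside the container, so that is worth checking as well.

```python
def proxy_url(pod_id: str, port: int) -> str:
    """Build the HTTP proxy URL for a port exposed on a pod (assumed pattern)."""
    if not pod_id:
        raise ValueError("pod_id must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # Assumption: exposed HTTP ports are served at <pod id>-<port>.proxy.runpod.net
    return f"https://{pod_id}-{port}.proxy.runpod.net"
```

Calling `proxy_url("<your pod id>", 7797)` gives the address to try for the port mentioned in the post.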

Kushagra

7/30/2024

Error after restarting the containers.

Command: docker compose up
Error: WARN[2024-07-30T12:12:22.042930970Z] Controller.NewNetwork mia-runpod-backend_default: error="failed to create DOCKER-USER IPV6 chain: iptables failed: ip6tables --wait -t filter -N DOCKER-USER: ip6tables v1.8.4 (legacy): can't initialize ip6tables table `filter': Table does not exist (do...
[+] Running 3/4

utmostmick0

7/30/2024

ULTIMATE Stable Diffusion Kohya ComfyUI InvokeAI

Doesn't start properly. It looks like it's creating the Stable Diffusion container 4 times in a row.

ChiKim

7/30/2024

Anyone Getting Bad Pods with Internet Issues?

I'm in the US, and I get far more bad pods with internet issues than working ones, roughly 7 out of 10. I'm trying to spot a community pod with an RTX 4090 and the default template pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04. When I get a bad pod, I see: error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers). If the pod does start and I connect via SSH to set it up, I often run into problems with apt on Ubuntu and with pip. Sometimes I get certificate errors, extremely slow speeds (less than 10 bytes per second), etc. I have to keep launching different pods until I get a working one. Does anyone else have the same problem?...

amaimon

7/30/2024

Creating a pod by extending another pod

The existing Comfy pod is very basic, so each time I need to run my huge flows I would either have to reinstall all required custom nodes from scratch, or install them once and pay for disk storage. Would it be possible instead to create a new pod after I install all my required nodes, so that I can later just deploy my pod with all required dependencies?

mason

7/29/2024

Container Logs via the API or SDK

As far as I can see, there is no way to access container logs via the API, correct?

MokshMalik

7/29/2024

Training jobs using script

Hey, can anyone tell me whether RunPod offers a way to create a training script that can be run from anywhere, which I can use to create a GPU instance and load and save my data to external cloud storage, as in AWS SageMaker's training script mode? I need to train multiple models with different architectures this way to see which one performs best.

RollyPolly

7/28/2024

Possible to terminate pod from Within the pod?

I know you can terminate pod from “outside” with runpodctl- but are there any options for a pod to self-terminate, triggered by its own docker image? Or, am I approaching this wrong and ‘best practice’ is to have your pods giving status updates back to and being managed by a script on your main PC w/ runpodctl?...

Solution:

One of the environment variables available inside the pod is RUNPOD_POD_ID, which you can use to remove/terminate/kill the pod.
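
A minimal sketch of what that could look like from inside the container, assuming `runpodctl` is installed in the image and authorized to manage the pod. The function names are hypothetical; only `RUNPOD_POD_ID` and the `runpodctl remove pod` command come from RunPod itself.

```python
import os
import subprocess


def removal_command(env=None):
    """Return the runpodctl command that removes the current pod."""
    env = os.environ if env is None else env
    pod_id = env.get("RUNPOD_POD_ID")
    if not pod_id:
        raise RuntimeError("RUNPOD_POD_ID is not set; is this running inside a pod?")
    return ["runpodctl", "remove", "pod", pod_id]


def terminate_self():
    """Run the removal command; raises CalledProcessError if runpodctl fails."""
    # Assumption: runpodctl is on PATH and has credentials for this pod.
    subprocess.run(removal_command(), check=True)
```

Calling `terminate_self()` as the last step of the job, after results have been saved somewhere outside the pod, would remove the pod without an external controller. Swapping `remove` for `stop` in the command would keep the pod's volume instead of deleting it.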

NORDFY

7/28/2024

Install the dependencies issue

Can anyone tell me why I'm getting this error now?

RollyPolly

7/28/2024

How is runpod secret / environment vars for credentials more secure?

I'm looking at the runpod Secret feature for handling AWS credentials. It looks like 'best practice' for handling credentials in a docker image is to set them as environment variables; and Runpod's "Secrets" feature feeds into that. Could anyone explain how using runpod's "Secrets" is more secure than just passing environment variables? If the security concern is to avoid writing your credentials directly into the image and instead pass them on launch with env vars, how do "Secrets" do anything more? Is it a feature for handling credentials within a runpod account managed by a team?...

Solution:

Yes, they are meant to keep keys secure in a team environment. With ENV variables all team members could view your keys in clear text in the template definition.

mason

7/27/2024

Get SSH Login Via API

When getting a pod via the API, the response does not include any information on connecting via Basic Terminal Access. Obviously the first part of the username is the pod ID, but I haven't been able to identify the numbers following the dash. How might you get this username via the API or programmatically? ssh [email protected] -i ~/.ssh/id_ed25519 cbdf4581hxb1vy == pod ID...

AIguru

7/27/2024

Llama3 setup

Hi, everyone. We are planning to deploy Llama3 for our app with millions of users. How can we achieve this? And which GPU series or cloud platforms are best for achieving high speed and scalability?...

juvenilelocksmith

7/26/2024

BROKEN: TheLastBen Fast Stable Diffusion

2024-07-26T18:04:17.934600984Z --2024-07-26 18:04:17-- https://huggingface.co/datasets/TheLastBen/RNPD/raw/main/Notebooks.txt
2024-07-26T18:04:17.960726061Z Resolving huggingface.co (huggingface.co)... 65.9.95.31, 65.9.95.61, 65.9.95.114, ...
2024-07-26T18:04:17.964895834Z Connecting to huggingface.co (huggingface.co)|65.9.95.31|:443... connected.
2024-07-26T18:04:18.292202440Z HTTP request sent, awaiting response... 401 Unauthorized
2024-07-26T18:04:18.292233330Z ...

Solution:

The template was pulled a while ago: RunPod cancelled its contract with TheLastBen, so he removed the files from his repo, which broke it.

NORDFY

7/26/2024

Network volume

Hi guys, I am new to RunPod. I am trying to set up a network volume, but I cannot see the "Connect to Jupyter Notebook" option after I deployed the GPU with the network volume. What did I miss?

TcCat

7/25/2024

ollama won't pull manifest - weird error.

In a runpod I've tried the various ollama templates, and also installed ollama on a basic template. I can run ollama serve; but in every case when I run ollama run <model> I always get the error: Error: pull model manifest: Get "https://registry.ollama.ai/v2/library/mistral-large/manifests/latest": dial tcp: lookup registry.ollama.ai on 127.0.0.11:53: read udp 127.0.0.1:59647->127.0.0.11:53: i/o timeout ...
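
The error shows the lookup against 127.0.0.11:53 (Docker's embedded DNS resolver) timing out, so the failure happens at name resolution, before ollama ever reaches the registry. A small sketch for checking that from inside the pod; `can_resolve` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without network access.

```python
import socket


def can_resolve(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False on a DNS failure."""
    try:
        resolver(hostname, 443)
        return True
    except OSError:
        # socket.gaierror and timeouts are both subclasses of OSError
        return False
```

Running `can_resolve("registry.ollama.ai")` alongside a host that is known to work would show whether DNS is broken in general on that pod or only for this one name.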

briefPeach

7/25/2024

is disk volume faster than network volume?

I found that network volume is very slow when loading models to gpu. I wonder if disk volume is faster? Is disk volume physically attached to pod? Also can I mount both disk volume and network volume to the same pod machine?...
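
One way to compare the two is to time a sequential read of the same large file from each location. This is a rough sketch, not a proper benchmark: `read_throughput_mb_s` is a hypothetical helper, and the paths you pass in depend on where each volume is mounted in your pod.

```python
import time


def read_throughput_mb_s(path, chunk_size=16 * 1024 * 1024):
    """Read a file sequentially and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing on very small files.
    return (total / 1e6) / max(elapsed, 1e-9)
```

Use a file of several GB (a model checkpoint works well) and measure the first read after the pod starts, since repeated reads are served from the page cache and will look much faster than the underlying storage.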

atteeeef

7/25/2024

Cannot open Checkpoints folder - Fooocus

Hi, I can't open the Checkpoints folder in Jupyter Fooocus. When I click on it, nothing happens, although I can open the other folders. I tried deleting the pod and recreating it, but it still doesn't work. Please let me know.

Asad Cognify

7/25/2024

CA-MTL-1 region has GPU loaded at 87%

I created a pod in the CA-MTL-1 region with an A40 GPU. The pod started with the GPU at 100% utilization and 87% of GPU memory in use. I tried multiple times, with the same result.
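
A quick way to capture those numbers right after the pod starts, before launching anything yourself, is to query `nvidia-smi` and parse its CSV output. A minimal sketch, assuming `nvidia-smi` is available in the container; the helper names and dictionary keys are hypothetical.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse nvidia-smi CSV rows into dicts, one per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(v) for v in line.split(","))
        stats.append({
            "util_pct": util,
            "mem_used_mib": used,
            "mem_pct": 100.0 * used / total,
        })
    return stats


def current_gpu_stats():
    """Run nvidia-smi and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(out.stdout)
```

If `current_gpu_stats()` reports high utilization and memory use on a freshly started pod, the load is not coming from your own processes, which is useful evidence to include in a support request.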
