RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


KoboldCpp - Official Template broken

I've tried to launch the KoboldCpp template a few times, but I'm hitting errors. The model I want to use downloads in two parts (split with commas in the launch arguments). The downloads finish and the parts are appended, but the logs show 'rm: cannot remove './mmproj.gguf': No such file or directory' right before it finishes. The container then restarts and the downloads begin again from square one. These same models worked last week. I have saved the entire logs if needed.

Secret not showing up in the pod `env` output

Hi, I added some secrets and set them as environment variables for my pod, but I couldn't see them when I ran env in the pod. I'm using {{ RUNPOD_SECRET_secret_name }} as the environment variable value...
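A quick way to see what actually made it into the pod's environment is a short Python check. This is only a sketch: MY_SECRET is a placeholder for whatever variable name you configured, and the RUNPOD_SECRET_ prefix scan is an assumption based on how the {{ RUNPOD_SECRET_secret_name }} references are named.

```python
import os

# Placeholder name: replace MY_SECRET with the environment variable name you
# configured on the pod (whose value used {{ RUNPOD_SECRET_secret_name }}).
print(os.environ.get("MY_SECRET", "<not set>"))

# Also list anything injected with a RUNPOD_SECRET_ prefix, in case the secret
# came through under its raw name. Only a prefix of the value is printed so
# full secrets are not dumped to the logs.
for key, value in os.environ.items():
    if key.startswith("RUNPOD_SECRET_"):
        print(key, "->", value[:4] + "..." if value else "<empty>")
```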

transfer data from a stopped pod to a new one

Hey, I finished my training on a big pod and I want to move all the data to another pod using storage (a network volume). How can I do that?

pod error

2024-08-19T23:00:50Z create pod network
2024-08-19T23:00:51Z create container runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
2024-08-19T23:00:52Z 2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 Pulling from runpod/pytorch
2024-08-19T23:00:52Z Digest: sha256:75bf115d87ee3813f8026fed3e11bae3bf68bfd789a9566878735245b723ef8b
2024-08-19T23:00:52Z Status: Image is up to date for runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
...

Pod down for hours

Any idea how long this will take to resolve? I cannot access my pod.

Can a pod shut down from inside the pod itself?

Wondering if the pod can accept a shutdown command to stop billing
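One approach is to call the RunPod API from inside the pod with the pod's own ID. Below is a minimal sketch, assuming the runpod Python SDK is installed, that RUNPOD_POD_ID is set inside the pod, and that you pass an API key in yourself (RUNPOD_API_KEY here is just an example name, e.g. injected via a secret). stop_pod stops the pod so GPU billing ends; terminating it would be a separate call.

```python
import os

import runpod  # pip install runpod

# An API key has to be provided to the pod yourself, e.g. as a secret;
# RUNPOD_API_KEY is an example variable name, not something RunPod sets.
runpod.api_key = os.environ["RUNPOD_API_KEY"]

# RUNPOD_POD_ID identifies the pod the code is running in.
pod_id = os.environ["RUNPOD_POD_ID"]

# Stop (not terminate) this pod from within; GPU billing ends once it stops,
# though storage may still be billed while the pod exists.
runpod.stop_pod(pod_id)
```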

Does RunPod provide environment isolation?

Hi, if we want to have two isolated environments, dev and prod, what can we do in RunPod? Thanks...

error pulling image (US community server)

When creating a new community pod based in the US I get this message: error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers). What is the problem here?...

Resuming an on-demand pod via the SDK

Hello, how can I resume a spot pod through the Python SDK? I am using the resume_pod function but I am not able to get it to work.
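For reference, here is a minimal sketch of how I'd expect the call to look with the runpod Python SDK, assuming resume_pod takes the pod ID plus a GPU count and that the API key is set first; the pod ID and key below are placeholders. Note that resuming can fail if the host no longer has the GPUs available, which is especially common with spot pods.

```python
import runpod  # pip install runpod

runpod.api_key = "YOUR_API_KEY"  # placeholder: your RunPod API key

pod_id = "abc123"  # placeholder: pod ID from the console or runpod.get_pods()

# Resume a stopped pod, requesting one GPU. This raises/fails if the host
# can no longer allocate the requested GPUs (typical for spot capacity).
pod = runpod.resume_pod(pod_id, gpu_count=1)
print(pod)
```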

Possibility of Pausing a Pod Created with Network Storage

Hello, I am a new user of RunPod. Currently, I am using a pod created through network storage. I noticed that regular pods have a pause function, but I couldn't find this feature in the pod created with network storage. I would like to know if this feature is available for such pods and, if so, how I can use it.

Docker run in interactive mode

Hi, I want to be able to SSH into my pod and run bash commands. If I provide no entry command in my Dockerfile, I am unable to connect to my pod via SSH. I also don't see an option anywhere to edit the docker run command to include the interactive flag. Any help is appreciated...

Made an optimized SimplerTuner RunPod: Failed to save template: Public templates cannot have Registr

Hi! I've spent a couple of days creating and testing a Docker flow for RunPod, and I've run the pod privately multiple times with no problems. There is no registry information in the Dockerfile, but I keep encountering this error, with absolutely no indication of its origin or how to fix it. Any help would be greatly appreciated, as we have a community eager to train Flux1 on RunPod....

URGENT! Network Connection issues

Hi, looks like there is a general issue in all pods and all of them are suffering network connection issues. Can someone look into this?

Looking for suggestions to achieve faster SDXL outputs

Hi, I am currently trying to generate a large number of images (200+) every session via the Automatic1111/Forge UI with an SDXL model, and I was wondering how I can generate them faster. I tried using an RTX 3090 and get about 1.5-2 it/s, which is pretty slow in the long run. Is there a faster alternative or workflow? Please suggest a workflow and GPU that can generate a large number of images swiftly...

Official Template not running correct version of CUDA

Hello! I'm trying to run a pod using the official templates:
runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04
Unless I completely misunderstood the notation, the image should run with CUDA 11.8.0, right? I've tried with Secure Cloud RTX 4090 and Secure Cloud RTX Ada 6000...
Solution:
@InnerSun nvidia-smi shows the maximum CUDA version supported by the host driver, not the CUDA toolkit version installed inside the container image.
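A quick way to check both sides from inside the pod, assuming PyTorch is installed in the template (as it is in the runpod/pytorch images):

```python
import subprocess

import torch

# Toolkit side: the CUDA version this PyTorch build inside the container uses.
print("torch CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

# Driver side: nvidia-smi's "CUDA Version" is the maximum the host driver
# supports, which is often newer than the container's toolkit.
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
print(next((line for line in out.splitlines() if "CUDA Version" in line), out))
```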

I can't run the pod with a container start command

I tried with this start command: bash -c "cd /workspace/ && sh run.sh", but it does not work; it seems to run repeatedly. Yet after I connect to the pod and run "cd /workspace && sh run.sh" manually, it works fine...

Volume / Storage issues

I am attempting to install Comfy on a few different machines. Before Comfy and the Flux dev models are done installing, I am getting an out-of-volume error and cannot run the pod. Could this just be a bad string of luck with a few broken pods, or am I not setting something up correctly?...

I'm trying to start a CPU pod using the GraphQL endpoint and specifying an image

Hey, I've successfully run a CPU pod creation using the GraphQL endpoint; however, it does not seem to follow the same structure as GPU creation. What I'm trying, which is working, is:
```
mutation {
  deployCpuPod(
    input: {
      ...
```

Please help urgent

Hey, I have been struggling for hours trying to set up this pod to train a LoRA for Flux..... Can someone explain why or how the container is full? I know that I need to install some models, but shouldn't they go onto the volume, which I made 200GB to avoid storage issues?!
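For what it's worth, the container disk and the volume are separate filesystems: the volume is typically mounted at /workspace, and anything downloaded to paths outside it (e.g. under /root or a template's default model directory) lands on the container disk instead. A small sketch to see where the space is actually going, assuming the volume is mounted at /workspace:

```python
import shutil

# Compare free space on the container's root filesystem vs. the volume mount.
for path in ("/", "/workspace"):
    usage = shutil.disk_usage(path)
    print(f"{path}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")
```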

Running a custom Docker image (used in Serverless) in a Pod

Hey everyone, here is my current situation:
- I have created a custom Docker image for my serverless endpoint in RunPod.
- My local machine is a MacBook, so I am unable to run the NVIDIA-dependent ComfyUI installation I have in the image; I'm trying to see if I can run it on a RunPod pod instead.
- The use case is that I'm trying different ComfyUI workflows that I want to test in a pod before I deploy to the serverless endpoint...