RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⛅|pods

Syncing taking too long?

Hi everyone. I'm using the ULTIMATE Stable Diffusion Kohya ComfyUI InvokeAI pods. It worked well yesterday, but when I tried to create it again today, it got stuck on the sync of A1111 (image attached). I've waited a while for this to go through, but no dice. I did this in the secure cloud. However, when I tried using community cloud, the syncing went fine. Does anyone know what's happening?...

How to store Model to Network Volume

I am saving my Hugging Face model with save_pretrained. Which base path do I pass here so that the model is saved to the Network Volume instead of the Container Disk...
Solution:
It is set in the Template. The default mounts to /workspace. Often the best way to accomplish storing models there is to create a symbolic link into /workspace...
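A minimal sketch of the symbolic-link approach from the solution above, assuming the network volume is mounted at /workspace as the solution describes. The helper name `link_into_volume` and the directory names in the usage note are hypothetical, not anything RunPod or Hugging Face defines.

```python
import os
from pathlib import Path


def link_into_volume(link_path: str, volume_dir: str) -> Path:
    """Create volume_dir and make link_path a symlink that points at it.

    Anything later written under link_path then lands in volume_dir.
    """
    target = Path(volume_dir)
    target.mkdir(parents=True, exist_ok=True)  # real directory on the volume

    link = Path(link_path)
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier run
    elif link.exists():
        # Refuse to clobber real data sitting on the container disk.
        raise FileExistsError(f"{link} exists and is not a symlink; move it first")

    link.parent.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link, target_is_directory=True)
    return target
```

For example, after `link_into_volume("/root/models", "/workspace/models")`, a call such as `model.save_pretrained("/root/models/my-model")` would write to the volume. Passing a path under /workspace to save_pretrained directly should work as well.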

Account Drained overnight with nothing running

Spun up a serverless API. Did not use it at all. Got billed $60 since last night. Could you check what caused this behavior? EnerpriseDna Team...

Unable to start pod with llm-foundry image

I'm trying to launch a pod with llm-foundry https://github.com/mosaicml/llm-foundry/tree/main?tab=readme-ov-file#mosaicml-docker-images but the Pod is stuck in initialization without error messages.

How to Run Roop unleashed on Runpod

Hello dears, I want to run Roop unleashed on Runpod. Can you explain the way, please...

Pod unreachable

I cannot connect to pod due to timeout. I am using secure cloud. Doesn't seem very reliable. Somebody experiencing the same? ``` ValueError: Ollama call failed with status code 524. Details: <!DOCTYPE html> <!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->...

Is there a way to transfer disk volume between instances? not through 3rd party cloud.

As the title says: I got a $300 bill from GCP for 3 TB of egress moving data here, crying :(((((((

A1111 Stable Diffusion 1.10.0 - problems with Dynamic Prompts

Hi, I have problems with Dynamic Prompts: the installation works via Automatic1111, but it does not appear in the GUI. The same happens if I use git: cd /workspace/stable-diffusion-webui git clone https://github.com/adieyal/sd-dynamic-prompts/ extensions/sd-dynamic-prompts...
Solution:
Edit the file:
/workspace/stable-diffusion-webui/webui-user.sh
and remove --skip-install if you want to install extensions from the UI....
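The edit described in the solution can also be scripted. This is a minimal sketch, assuming the flag appears literally as `--skip-install` somewhere in webui-user.sh. The helper names `remove_flag` and `strip_flag_from_file` are hypothetical, and the exact contents of the file in a given template are not known here.

```python
import re
from pathlib import Path


def remove_flag(script_text: str, flag: str = "--skip-install") -> str:
    """Return script_text with every standalone occurrence of flag removed."""
    # Match the flag plus any spaces before it, but not when it is only
    # part of a longer option name.
    pattern = r"[ \t]*(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.sub(pattern, "", script_text)


def strip_flag_from_file(path: str, flag: str = "--skip-install") -> bool:
    """Remove flag from the file at path; return True if the file changed."""
    p = Path(path)
    original = p.read_text()
    updated = remove_flag(original, flag)
    if updated == original:
        return False
    Path(str(p) + ".bak").write_text(original)  # keep a backup of the old file
    p.write_text(updated)
    return True
```

Calling `strip_flag_from_file("/workspace/stable-diffusion-webui/webui-user.sh")` and then restarting the web UI would apply the change. Editing the file by hand is equivalent.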

URGENT, NEED HELP!

Hello, I am wondering if all pods use AMD CPUs. I am on a 4x A100-80G GPU instance in OR, and it uses an AMD EPYC 7763 CPU (it is extremely slow, 15 times slower than a normal Intel Cascade Lake CPU; I don't know if this is caused by container tech). Are there VMs that use Intel CPUs, and possibly different types of Intel CPUs (like different Xeon Platinum models in Cascade Lake)? And how can I see CPU info before spinning up a VM? Thank you so much in advance!...

Critical error

Pod seems to have hardware issues. I can't connect to it to back up my data and I'm still paying full price for it...

LoRAs aren't showing

So, I have LoRAs, both private and from Civitai. I've clicked "Copy Link" and pasted them into the respective LoRA folder after the "wget" command. It claims they have been saved, and they show up on the left-hand side where the folders are, but they don't show up in the web-ui.

ADetailer for Runpod Stable Diffusion isn't working.

I recently downloaded ADetailer from huggingface to improve faces on my generations. I have all the settings right and the enable box ticked, but it doesn't work. It doesn't even zoom in and create a box around the face as if ADetailer were in effect; it just does nothing.

How many ports can I expose?

Hi, what's the maximum number of TCP ports I can expose in one pod? Basically we are going to use a pod with 8 GPUs, and we want to expose lots of ports for different purposes. Thanks,...

ComfyUI : GPU and VRAM at 0%

Hi. I'm running an RTX4090 pod with the comfyui template by ai-dock to run the flux[dev] model. However, the pod shows 0% GPU usage and also 0% VRAM usage. In contrast, the RAM has about 30 GB taken up. The model runs slower than I expected too (although I have no point of comparison). Is this likely to be a bug in runpod's resource monitoring, or is there something wrong with my pod or pod template?...
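One way to cross-check the dashboard numbers is to query the driver from inside the pod. The sketch below assumes `nvidia-smi` is on the PATH inside the container (an assumption about the template, not something the thread confirms). The function names are hypothetical.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "utilization.gpu,memory.used,memory.total"


def _to_int(field: str):
    """Convert a CSV field to int; return None for values like '[N/A]'."""
    try:
        return int(field.strip())
    except ValueError:
        return None


def parse_gpu_stats(csv_text: str) -> list:
    """Parse 'util, used, total' rows (no header, no units) into dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = line.split(",")
        stats.append({
            "util_pct": _to_int(util),
            "mem_used_mib": _to_int(used),
            "mem_total_mib": _to_int(total),
        })
    return stats


def read_gpu_stats() -> list:
    """Return one dict per GPU, or an empty list if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

If `read_gpu_stats()` reports non-zero utilization and memory while a generation is running, the model is on the GPU and the 0% reading is more likely a monitoring issue. If it stays at zero, the workload may be running on the CPU.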

CUDA error: uncorrectable ECC error encountered

I just provisioned an 8xH100 NVL machine, made it load a very large model and then the container got stuck into a restart loop trying to load the model stuck on this error: 2024-08-04T16:43:13.809833249Z RuntimeError: CUDA error: uncorrectable ECC error encountered This looks like a hardware defect. Is there a way to get my credits back for that run?...

Save template overrides

Hi. I'm using the Comfyui - ai-dock template, but it's getting tiresome to manually change the environment variables each and every time. Is there any way that runpod.io could remember them for me? Or can I save a copy, maybe with those settings, in My Templates? Thanks for your help....
Solution:
Yes, sure, copying them into My Templates works. I think there is a suggestion to make it easy to save the env from a pod, but I don't know its progress.

Base image + code

Hi. I'm new to runpod. I am building a sort of mega-template using a runpod base template (2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04) and adding my own code, about another 500 MB. Where is a good place to host the image for the runpod template to pull from? Thanks for any tips / links.

Why can I only rent 7 H100 NVLs since yesterday and not 8?

Why can I only rent 7 H100 NVLs since yesterday and not 8?

Accelerate launch is getting stuck on pods

Accelerate launch on the pod is getting stuck on the command line

network storage sooo slow

Hi, I'm new to runpod. I'm running a 5xH100 in US-KS2 with network storage in the same region. Loading the model (70B Llama) from storage is going to take 25 minutes. Is this normal for runpod? This normally takes seconds on other machines I've used.
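To put a number on the storage speed, a sequential read of one model file can be timed. This is a rough sketch, not a proper benchmark: the function name `read_throughput_mb_s` is hypothetical, and a repeated read of the same file may be served from the page cache in RAM and look much faster than the volume really is.

```python
import time


def read_throughput_mb_s(path: str,
                         chunk_bytes: int = 64 * 1024 * 1024,
                         max_bytes=None) -> float:
    """Read path sequentially and return the observed throughput in MB/s.

    max_bytes limits how much is read; None reads the whole file.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:  # unbuffered at the Python level
        while max_bytes is None or total < max_bytes:
            chunk = f.read(chunk_bytes)
            if not chunk:  # end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return total / 1e6 / elapsed if elapsed > 0 else float("inf")
```

Running it once against a file on the network volume and once against a file of similar size on the container disk gives a like-for-like comparison to include in a support request.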