RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Unable to use model in Stable Diffusion

I tried to use a model I downloaded and received this error:

Need help with setting up Tensorboard for RVC!

Hello all, I need some help with RunPod. I am trying to get TensorBoard to work when using either a Secure Cloud or Community Cloud GPU pod, and I have no idea how to get it working. Do I need an SSH server? I was trying to follow this guide - https://blog.runpod.io/how-to-achieve-true-ssh-on-runpod/ - but I have no idea where to get a public key. If anyone here is experienced with TensorBoard and knows how to make it work on RunPod, I'd be grateful. When I do get it installed and running on a GPU pod, it gives me a localhost link, which is something I cannot use on a remote server. I will be playing around with it for just a few more minutes before I just give up...
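
One way this is commonly handled (a sketch, assuming the logs live under /workspace/logs, that TCP/HTTP port 6006 has been added to the pod's exposed ports, and that the proxy URL follows RunPod's standard pattern):

    # Inside the pod: bind TensorBoard to all interfaces instead of localhost.
    pip install tensorboard
    tensorboard --logdir /workspace/logs --host 0.0.0.0 --port 6006
    # Then open https://<pod-id>-6006.proxy.runpod.net from the local browser,
    # or tunnel the port over SSH instead (key, IP, and port are placeholders):
    ssh -i ~/.ssh/id_ed25519 -p <ssh-port> -L 6006:localhost:6006 root@<pod-ip>

As for the public key: ssh-keygen on the local machine produces a key pair, and the contents of the .pub file are what the guide expects to be pasted into RunPod's SSH public key setting.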

Storage pricing question

Can someone explain storage pricing for stopped GPU pods, please? When I stop a pod it says I'm going to be charged an hourly rate (something like .028?) for the storage unless I terminate the pod. The pricing page says it's .1/GB per month, though. Which is it? The hourly rate seems prohibitively expensive, and I can't imagine that I'm supposed to recreate my pod every time I want to test something and then delete everything.
Solution:
Terminate the pod completely and use a network storage volume in a popular region instead. Network volumes are essentially like external hard drives that any pod can persist data to, usually mounted under /workspace. For example, if you create a network volume in region A, you can launch two pods on that volume and they share the same drive. ...

Creating own template

Hi, is it possible to create my own ComfyUI template with a certain set of models, custom nodes, etc. and then simply launch it on a new GPU pod, so I don't have to manually find and download every resource when setting up a new pod?
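
Two common approaches are baking everything into a custom Docker image that your own template points at, or keeping a setup script on a network volume and re-running it on each fresh pod. A minimal sketch of the script approach (the custom-node choice and the model URL are placeholders):

    #!/bin/bash
    # One-time setup, safe to re-run on each new pod; everything lands on the /workspace volume.
    set -e
    cd /workspace
    [ -d ComfyUI ] || git clone https://github.com/comfyanonymous/ComfyUI.git
    [ -d ComfyUI/custom_nodes/ComfyUI-Manager ] || \
        git clone https://github.com/ltdrdata/ComfyUI-Manager.git ComfyUI/custom_nodes/ComfyUI-Manager
    # Example checkpoint download (replace with your own model URLs):
    wget -nc -P ComfyUI/models/checkpoints "https://example.com/my-model.safetensors"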

Error when installing requirements from git:

OSError: [Errno 23] Too many open files in system. How can I fix this? ...
Solution:
Try raising the open-file limit with ulimit -n <number>?
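
For example (a sketch; 65535 is an arbitrary value, and the hard limit configured on the host still caps what can be set):

    ulimit -n 65535                    # raise the soft limit for this shell
    ulimit -n                          # verify the new limit
    pip install -r requirements.txt    # retry the install in the same shell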

Container keeps restarting

Hello, after I start the GPU pod, the container keeps restarting: "2024-02-16T20:18:35Z start container" in an infinite loop. When I SSH in, I get: ...

Unable to upload models to Stable Diffusion.

Hi team, these days I am unable to upload models to Stable Diffusion using CivitAI or Google Drive. curl -O -J -L doesn't work, wget doesn't work, and gdown doesn't work either. ...
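
If the CivitAI downloads are the ones failing, one common cause is models that require a logged-in account, which means passing a CivitAI API token with the request. A hedged example (the token, model-version ID, and target directory are placeholders):

    CIVITAI_TOKEN="<your-api-token>"
    wget --content-disposition -P /workspace/stable-diffusion-webui/models/Stable-diffusion \
        "https://civitai.com/api/download/models/<model-version-id>?token=${CIVITAI_TOKEN}"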

How should I store/load my data for network storage?

Hi, I've been keeping my data in an SQL database, which is excruciatingly slow on RunPod with a network storage volume.
But I don't see any obvious alternative... ...
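
A common workaround (a sketch, not RunPod-specific; the paths are examples): keep the canonical copy on the network volume, but run the workload against the pod's local container disk, which is much faster than network storage:

    cp /workspace/data.db /root/data.db     # copy to the fast local container disk
    # ... run the workload against /root/data.db ...
    cp /root/data.db /workspace/data.db     # persist changes back before stopping the pod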

worker-vllm list of strings

Hey, I have a fine-tuned model that I want to deploy in serverless. I tried the vLLM prompts approach with a list of strings (as attached) on a T4 in Colab and it works really well - response in 0.5 secs. And here is my question - do I need to create my own worker to post the input as a list of strings, or do you handle this in your worker-vllm? -> https://github.com/runpod-workers/worker-vllm Thanks for your reply 😉 ...
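
For reference, a request against a serverless endpoint would look roughly like this (the endpoint ID and API key are placeholders, and whether worker-vllm accepts a list of strings under "prompt" is exactly the open question here):

    curl -s "https://api.runpod.ai/v2/<endpoint-id>/runsync" \
        -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
        -H "Content-Type: application/json" \
        -d '{"input": {"prompt": ["first prompt", "second prompt"]}}'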

How to enable Systemd or use VPN to connect the IP of the Run Pod?

Hi, greetings. I am facing a problem where I need to connect to the IP of the RunPod pod. I tried the VPN method, but it gives an error that the system is not using systemd. Is there some other way to achieve this, or how can I enable systemd? ...

best practice to terminate pods on job completion

I have a one-time job I want to run as a GPU pod. Currently the container gets restarted as soon as the job finishes. What's the best way to terminate the pod after completion?
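
One pattern that tends to work (a sketch, assuming runpodctl is available inside the pod, as it typically is on official RunPod images, and that the RUNPOD_POD_ID environment variable is set; the job script name is a placeholder):

    # Run the job, then terminate this pod as soon as it exits successfully.
    python run_job.py && runpodctl remove pod $RUNPOD_POD_ID
    # Use "runpodctl stop pod $RUNPOD_POD_ID" instead to stop rather than terminate
    # (the disk keeps accruing storage charges while the pod is stopped).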

Can I turn off a few vCPUs?

I'm doing an overload test, but I don't know how to limit the vCPUs from Python.
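
A minimal sketch: pin the test process to a subset of vCPUs with taskset (cores 0-3 here; the script name is a placeholder). From inside Python, os.sched_setaffinity(0, {0, 1, 2, 3}) achieves the same thing:

    taskset -c 0-3 python overload_test.py    # the process only runs on vCPUs 0-3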

Deploying H2O LLM Studio /w auth using Ngrok

I have been working most of the day to get this container deployed to RunPod. Here's the trick, though: I included nginx in the mix and am using it as a proxy_pass, so that I can add some sort of auth. Here is the nginx config: events { worker_connections 1024; } ...

Wrong GPUs being assigned

I'm paying for the "7 x H100 80GB SXM5", but when I run nvidia-smi it reports 7 NVIDIA RTX A4000s, a much inferior graphics card. What gives?

Network Volume suddenly empty in EU-RO-1

After a restart of a CPU pod that was attached to a network volume in the EU-RO-1 region, the /workspace directory was suddenly empty. No changes were made to the mount path prior to it becoming empty. Has anyone ever encountered this issue? Is there any way of getting back the data that was on that volume? Thank you! ...

Reserving pods on different machines

Hey there, 4 of my long-running pods have scheduled maintenance at the same time. I would like to spin up new pods beforehand to cover for that, but how can I make sure the new pods won't be on the same machines and also undergo maintenance before I start them?

Ollama API

Hello, I am trying to host LLMs on RunPod GPU Cloud using Ollama (https://ollama.com/download). I want to set it up as an endpoint so I can access it from my local laptop using Python libraries like LangChain. I'm having trouble setting up the API endpoint - has anyone worked with this before?
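
A hedged sketch of one way to do this: bind Ollama to all interfaces on the pod, add TCP/HTTP port 11434 to the pod's exposed ports, and call it from the laptop through RunPod's standard proxy URL (the pod ID and model name are placeholders). LangChain's Ollama integration can then be pointed at the same URL via its base_url parameter:

    # On the pod:
    OLLAMA_HOST=0.0.0.0 ollama serve &
    ollama pull llama2
    # From the local laptop:
    curl https://<pod-id>-11434.proxy.runpod.net/api/generate \
        -d '{"model": "llama2", "prompt": "Hello", "stream": false}'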

Is one physical CPU core assigned to vCPU?

I couldn't find this in the FAQ.
Solution:
A vCPU is normally 1/2 of a physical core - most CPUs have 2 threads per core, and each thread is exposed as one vCPU.

We have detected a critical error on this machine...!

What's causing this, and how can it be avoided?

Slow upload speeds with runpodctl?

Hey there. Since yesterday I have had to upload files to my instance twice, and it's super slow (at the moment it's 650 kB/s, which is not really feasible). It usually starts off way faster and slows down after a while. It doesn't seem to depend on my network, because uploads to other servers are significantly faster, and it doesn't really seem to matter from which network (work, home) I upload - the speeds are way lower than e.g. a month ago. What's going on with the uploads, and is there anything I can do about it?
Solution:
You can try using croc; then you don't need to go through the RunPod relays. https://github.com/schollz/croc ...
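
Minimal croc usage, assuming croc is installed on both the local machine and the pod (the file name is a placeholder):

    # On the sending machine:
    croc send my_model.safetensors      # prints a code phrase
    # On the receiving machine:
    croc <code-phrase>                  # paste the code phrase to receive the file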