Need help with setting up Tensorboard for RVC!
Hello all, I need some help with Runpod. I am trying to get tensorboard to work when using either a secure or community GPU pod. I have no idea how to get tensorboard working. Do I need an SSH Server? I was trying to follow this guide - https://blog.runpod.io/how-to-achieve-true-ssh-on-runpod/ but I have no idea where I get a public key. If anyone here is experienced with tensorboard and knows how to make it work on Runpod, then I'd be grateful.
When I do get it installed and running on a GPU pod, it gives me a localhost link (which is something I cannot use on a remote server).
I will be playing around with it for just a few more minutes, before I just give up....
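A minimal sketch of one way around the localhost-only link, assuming the cause is that TensorBoard binds to the loopback interface by default. It only builds the command line; `--logdir`, `--host` and `--port` are standard TensorBoard flags, while the `logs` directory and the choice of port 6006 are placeholders. It is also an assumption that the chosen port has been exposed in the pod's settings so it can be reached from outside.

```python
import shlex


def tensorboard_command(logdir: str, port: int = 6006) -> list[str]:
    """Build a TensorBoard command that listens on all interfaces."""
    return [
        "tensorboard",
        "--logdir", logdir,
        "--host", "0.0.0.0",  # accept connections from outside the container
        "--port", str(port),
    ]


if __name__ == "__main__":
    # "logs" is a hypothetical training log directory; adjust to your setup.
    print(shlex.join(tensorboard_command("logs")))
```

The printed command can be pasted into the pod's terminal, or the list can be passed to `subprocess.run` directly.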
Storage pricing question
Can someone explain storage pricing for stopped GPU pods please? When I stop a pod, it says I'm going to be charged an hourly rate (something like 0.028?) for the storage unless I terminate the pod. The pricing page says it's 0.1/GB per month, though. Which is it? The hourly rate seems prohibitively expensive, and I can't imagine that I'm supposed to recreate my pod every time I want to test something and then delete everything.
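To compare the two figures, it can help to put them in the same unit. The sketch below is arithmetic only and rests on assumptions: a 30-day month, the approximate 0.028 hourly figure quoted above, and the possibility that the hourly rate covers the whole disk while the pricing page quotes a per-GB rate. The actual billing rules may differ.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day billing month


def monthly_cost_from_hourly(hourly_rate: float) -> float:
    """Convert an hourly charge into an approximate monthly charge."""
    return hourly_rate * HOURS_PER_MONTH


def implied_disk_size_gb(hourly_rate: float, per_gb_month: float) -> float:
    """Disk size at which the per-GB monthly rate would match the hourly charge."""
    return monthly_cost_from_hourly(hourly_rate) / per_gb_month


monthly = monthly_cost_from_hourly(0.028)
size_gb = implied_disk_size_gb(0.028, 0.10)
print(f"{monthly:.2f} per month, which matches 0.10/GB only for about {size_gb:.0f} GB")
```

If the pod's disk is much smaller than the implied size, the stopped-pod rate is probably not the same per-GB rate shown on the pricing page.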
Solution:
Terminate the pod completely and use network storage in a popular region instead.
Network storage is essentially like an "external hard drive" that any pod can persist data to, usually under /workspace.
For example, if you create network storage in region A, you can launch two pods on that network storage, and they will share that drive. ...
Creating own template
Hi, is it possible to create my own ComfyUI template with a certain set of models, custom nodes, etc. and then simply launch it on a new GPU pod, so I don't have to manually find and download every resource when setting up a new pod?
Error when installing requirements of git:
OSError: [Errno 23] Too many open files in system
How can I fix it??...
Solution:
with ulimit -n # ?
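A minimal sketch for checking whether `ulimit -n` is the relevant limit here. `ulimit -n` controls the per-process open-file limit, whereas errno 23 (ENFILE, "Too many open files in system") usually refers to the system-wide limit, so it may be worth looking at both. It assumes a Linux container with `/proc/sys/fs/file-max` available; the helper names are made up for illustration.

```python
import resource
from pathlib import Path


def open_file_limits() -> dict:
    """Report the per-process limits (what `ulimit -n` shows) and the system-wide cap."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    file_max = Path("/proc/sys/fs/file-max")  # Linux only; absent elsewhere
    system_wide = int(file_max.read_text()) if file_max.exists() else None
    return {"soft": soft, "hard": hard, "system_wide": system_wide}


def raise_soft_limit(target: int) -> int:
    """Raise the soft limit toward `target`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print(open_file_limits())
```

If the soft limit is already high and the error persists, the system-wide value is the more likely bottleneck, and that one generally cannot be changed from inside an unprivileged container.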
Container keeps restarting
Hello, after I start a GPU pod, the container keeps restarting:
2024-02-16T20:18:35Z start container
in an infinite loop. When I SSH in, I get:...
Unable to upload models to Stable Diffusion.
Hi Team,
These days, I am unable to upload models to Stable Diffusion using CIVITAI or Google Drive.
Curl -O -J -L doesn't work.
Wget doesn't work too.
Gdown doesn't work either....
How should I store/load my data for network storage?
Hi,
I've been keeping my data in an SQL database, which is excruciatingly slow on RunPod with network storage.
But I don't see any obvious alternative.. ...
worker-vllm list of strings
Hey,
I have a fine-tuned model that I want to deploy on serverless. I tried the vLLM prompts approach with a list of strings (as attached) on a T4 in Colab and it works really well, with a response in 0.5 secs. Here is my question: do I need to create my own worker to post the input as a list of strings, or do you handle this in your worker-vllm? -> https://github.com/runpod-workers/worker-vllm
Thanks for your reply 😉
...
How to enable Systemd or use VPN to connect the IP of the Run Pod?
Hi, Greetings
I am facing a problem where I need to connect to the IP of the RunPod pod. For that I tried to use a VPN, but it gives an error saying that the system is not using systemd.
Is there some other way to achieve this, or how can I enable systemd?...
best practice to terminate pods on job completion
I have a one time job I want to run as a GPU pod. Currently the container gets restarted as soon as the job finishes. What's the best way to terminate the pod after completion?
Deploying H2O LLM Studio /w auth using Ngrok
I have been working most of the day to get this container deployed to runpod. Here's the trick though. I included nginx in the mix and am using it as a proxy_pass. This way I can use some sort of auth.
Here is the nginx config.
events {
worker_connections 1024;
}...
Wrong GPUs being assigned
I'm paying for the "7 x H100 80GB SXM5", but when I run nvidia-smi I see that there are actually 7 NVIDIA RTX A4000s, a much inferior graphics card. What gives?
Network Volume suddenly empty in EU-RO-1
After a restart of a CPU pod that was attached to a network volume located in EU-RO-1 region, the /workspace directory was suddenly empty. No changes were done to the mount path prior to it becoming empty.
Has anyone ever encountered this issue? Is there any way of getting back the data that was on that volume?
Thank you!...
Reserving pods on different machines
Hey there, 4 of my long running pods have a scheduled maintenance at the same time. I would like to spin up new pods before then to cover for that, but how can I make sure the new pods won't be on the same machine and also undergo maintenance before starting them?
Ollama API
Hello, I am trying to host LLMs on RunPod GPU Cloud using Ollama (https://ollama.com/download). I want to set it up as an endpoint so I can access it from my local laptop, using Python libraries like Langchain. I'm having trouble setting up the API endpoint. Has anyone worked with this before?
Is one physical CPU core assigned to vCPU?
I couldn't find this in FAQ.
Solution:
A vCPU is normally half of a physical CPU core, since most CPUs have 2 threads per core.
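As a rough illustration of that ratio, the sketch below converts a vCPU count into an estimated number of physical cores. The default of 2 threads per core is the assumption stated in the answer, not something read from the hardware, and the function name is made up for illustration.

```python
import os


def estimated_physical_cores(vcpus: int, threads_per_core: int = 2) -> float:
    """Estimate physical cores from a vCPU (hardware thread) count."""
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    return vcpus / threads_per_core


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs visible to the OS, which may be
    # more than the vCPUs actually allotted to a container.
    logical = os.cpu_count() or 1
    print(f"{logical} logical CPUs, roughly {estimated_physical_cores(logical)} physical cores")
```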
Slow upload speeds with runpodctl?
Hey there. Since yesterday I have had to upload files to my instance twice, and it's super slow (at the moment it's 650 kB/s, which is not really feasible). It usually starts off way faster and slows down after a while. It doesn't seem to depend on my network, because uploads to other servers are significantly faster, and it doesn't really seem to matter which network (work, home) I upload from; the speeds are way lower than, e.g., a month ago. What's going on with the uploads, and is there anything I can do about it?
Solution:
You can try using croc, then you don't need to use the RunPod relays.
https://github.com/schollz/croc...