RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

Cloud Files Updating Backblaze

After I upload my files to Backblaze, if I later decide to add more to the workspace, is there a way to update the Backblaze cloud with only the new files, without deleting and re-uploading everything?
Solution:
Backing them up again or uploading them manually works.
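One way to avoid re-uploading everything is to keep a small manifest of what has already been uploaded and send only the files that are new or changed. The sketch below is a minimal illustration using only the Python standard library. The function names and the manifest format are hypothetical, not part of RunPod or Backblaze, and the actual upload step is deliberately left out.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_new_files(root: Path, manifest: dict) -> dict:
    """Return {relative_path: digest} for files that are missing from
    the manifest or whose content differs from the recorded digest."""
    changed = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digest = file_digest(path)
            if manifest.get(rel) != digest:
                changed[rel] = digest
    return changed
```

After uploading the files returned by `find_new_files`, merging the result into the manifest (for example with `manifest.update(...)`) and saving it somewhere outside the workspace means the next run only picks up later additions.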

Pod GPU keeps disconnecting...

I create a pod, and when I finish my work and open it the next time, the GPU is not available, so I have to reinstall Fooocus from scratch and lose all my downloaded checkpoints and other files... Is there a way to fix this by keeping my files stored somewhere safe and just connecting them to the pod? How should I do that? Please be as specific as possible; I'm a beginner.

Container Files Missing in Workspace On Pod Launch

When launching pods (A40) on both community and secure cloud, using a custom image that populates /workspace as a volume, the expected files and directories don't show up. This worked as of last Friday, and the image has not changed on its GitHub container repo. There is more than enough space on both the network and disk volumes to contain these files.

Start the pod with a custom command after the pod finishes startup

How can I add a command that the pod will execute after it has finished starting up? I tried using
bash -c 'apt update;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo "$PUBLIC_KEY" >> authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity'
And replaced sleep infinity with apt install nano -y...
Solution:
You can try the other way around:
apt update && apt -y install nano && /start.sh
...

How do I upload a 5 GB file and use it in my pod?

I tried to upload it several times, but only 1.2 GB gets uploaded (error). (I used the Jupyter web-based interface.) I purchased 150 GB of storage and made a new pod, but the issue is still the same. ...
Solution:
OhMyRunPod --setup-ssh

What are the certifications for each Data Center *URGENT*

I want to see a list of the certifications for each of the data centers available when creating a pod.

Are Pods good for batch inference of encoders?

Hello all, I want to deploy an encoder. Think of a BERT model (like "bert-base-uncased" on Hugging Face) with some aggregation head, for example one predicting class probabilities. However, I do not want to use that model in real time but for batch inference. Typical scenario: I need to predict for 1 million records within 10 minutes. I need my GPU nodes to scale up from 0 to n, process those million records stored on cloud storage, write predictions to cloud storage, and scale down from n to 0....
Solution:
I'd suggest reading about SkyPilot, since RunPod doesn't provide a platform for batch inference.

524: A timeout occurred (cloudflare)

Hello everyone. I have a GPU pod with an API that runs an ML model, and the calculation is a long-running task that takes about 2-3 minutes. I saw that runpod.io uses Cloudflare (this is in the error response), and Cloudflare describes a timeout of 100 seconds. I've seen timeout settings for serverless but not for pods. Does anyone know how to solve this? ...
Solution:
Cloudflare is designed as a CDN to serve your traffic from edge locations all over the world, so it's meant for HTTP/HTTPS traffic that responds fairly quickly; it's not designed for long-running requests. And if you are using FastAPI, it sounds like your application will probably need to scale up at some point, and pods do not autoscale, so I suggest using serverless instead; then you won't have any issues.
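If moving to serverless is not an option, a common general workaround for proxy timeouts is to return a job ID immediately and let the client poll for the result, so that no single HTTP request lasts the full 2-3 minutes. The sketch below shows that pattern with the Python standard library only. `JobStore`, `submit`, and `status` are hypothetical names, and wiring them into FastAPI routes (or any other framework) is left out.

```python
import threading
import uuid
from typing import Any, Callable, Dict


class JobStore:
    """In-memory registry of background jobs, keyed by a random job ID."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._lock = threading.Lock()

    def submit(self, fn: Callable[..., Any], *args: Any) -> str:
        """Start fn(*args) in a background thread and return its job ID at once."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"state": "running", "result": None, "error": None}
        threading.Thread(
            target=self._run, args=(job_id, fn, args), daemon=True
        ).start()
        return job_id

    def _run(self, job_id: str, fn: Callable[..., Any], args: tuple) -> None:
        """Execute the job and record either its result or its error message."""
        try:
            update = {"state": "done", "result": fn(*args), "error": None}
        except Exception as exc:
            update = {"state": "failed", "result": None, "error": str(exc)}
        with self._lock:
            self._jobs[job_id] = update

    def status(self, job_id: str) -> Dict[str, Any]:
        """Return a copy of the job record, or an 'unknown' record for bad IDs."""
        with self._lock:
            job = self._jobs.get(job_id)
            if job is None:
                return {"state": "unknown", "result": None, "error": None}
            return dict(job)
```

In this design, one endpoint would call `submit` and return the job ID, and a second endpoint would call `status`; each request finishes in milliseconds regardless of how long the model takes. The store is per-process and not persistent, which is enough for a sketch but not for multiple workers.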

Notification When Pod Is Deployed

Hello, I'm an admin of a research group which uses RunPod. I would like to receive a notification any time a pod is deployed, along with its specs (most importantly the GPU type, number of GPUs, storage info, community or secure cloud, the image being used, and the account which launched it). My hope is to then automate some admin tasks and run commands (such as automatically terminating the pod) after init. ...
Solution:
@muddyfootprints, we don't currently offer notifications for when a new pod is created; however, I have taken note of this thread as feedback for the team, which will be reviewed by leadership tomorrow.

Billing for separate users or pods

Hi, is there any way to track usage (in terms of cost) per pod (for example, based on ID), or better, a way to see how much each user on my team has cost?

Automatically Terminate Idle Pods

I want to write a daemon which will automatically terminate my pod if the GPU has sat idle for x hours. Has anyone done something like this before and has code lying around for it? Or could someone at least point me to the appropriate APIs?...
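A minimal sketch of such a daemon follows. It assumes `nvidia-smi` is available inside the pod, and that the pod can remove itself with `runpodctl remove pod` using a `RUNPOD_POD_ID` environment variable; both of those are assumptions to verify against the current RunPod documentation. The threshold, idle limit, and polling interval are illustrative values, and `IdleTracker` is a hypothetical helper.

```python
import os
import subprocess
import time
from typing import Optional

IDLE_THRESHOLD_PERCENT = 5       # at or below this counts as idle (illustrative)
IDLE_LIMIT_SECONDS = 2 * 3600    # terminate after 2 idle hours (illustrative)
POLL_SECONDS = 60


def parse_utilization(output: str) -> int:
    """Return the highest GPU utilization (percent) from nvidia-smi CSV output,
    which has one integer per line, one line per GPU."""
    values = [int(line.strip()) for line in output.splitlines() if line.strip()]
    return max(values) if values else 0


def read_utilization() -> int:
    """Query current utilization of all GPUs via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_utilization(out)


class IdleTracker:
    """Tracks how long utilization has stayed at or below a threshold."""

    def __init__(self, limit_seconds: float,
                 threshold: int = IDLE_THRESHOLD_PERCENT) -> None:
        self.limit_seconds = limit_seconds
        self.threshold = threshold
        self.idle_since: Optional[float] = None

    def update(self, utilization: int, now: float) -> bool:
        """Record a sample; return True once the idle limit has been reached."""
        if utilization > self.threshold:
            self.idle_since = None   # GPU is busy, reset the idle clock
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.limit_seconds


def main() -> None:
    tracker = IdleTracker(IDLE_LIMIT_SECONDS)
    while True:
        if tracker.update(read_utilization(), time.monotonic()):
            # Assumption: runpodctl is installed and RUNPOD_POD_ID is set.
            subprocess.run(
                ["runpodctl", "remove", "pod", os.environ["RUNPOD_POD_ID"]],
                check=True,
            )
            break
        time.sleep(POLL_SECONDS)
```

Calling `main()` starts the loop. Keeping the idle logic in `IdleTracker` separate from the `nvidia-smi` and `runpodctl` calls makes it possible to test the timing behavior without a GPU.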

Slow download

I'm currently getting 2mb/s download on a 2x A100 pod; I normally get way higher than this. Is anyone else running into this right now?
Solution:
I shut down the pod and started another one, and it was a lot faster.

`ERROR | InternalServerError: None is not in list`

I was using several machines but faced the same error (above) on each. Someone said it was due to the DDoS attack. Is that right? Can a DDoS attack affect the secure pods that easily? Thanks...

Increase Spot Warning Time

I see in the docs that there is a 5s window before a spot instance is interrupted. 5s isn't really enough time to save or do anything - e.g. AWS has a 2 minute warning. Even if 2 minutes is too much, it would be huge if we could get 1m or even 30s of a warning, so that we don't need to check so often.
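Whatever the length of the window, one way to use it is to catch the termination signal and do only a very quick save. The sketch below assumes the interruption reaches the process as `SIGTERM`, which should be verified against the RunPod docs. `GracefulStop`, `run`, and the `save_checkpoint` callback are hypothetical placeholders for a real training or processing loop.

```python
import signal
from typing import Callable


class GracefulStop:
    """Sets a flag on SIGTERM so the main loop can checkpoint and exit quickly."""

    def __init__(self) -> None:
        self.requested = False
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum, frame) -> None:
        # Keep the handler tiny: only record that a stop was requested.
        self.requested = True


def run(steps: int, stop: GracefulStop,
        save_checkpoint: Callable[[int], None]) -> int:
    """Run up to `steps` iterations. If a stop was requested, save the
    current step and return it; otherwise return `steps` when finished."""
    for step in range(steps):
        if stop.requested:
            save_checkpoint(step)
            return step
        # ... one unit of work would go here ...
    return steps
```

Because the flag is checked once per iteration, each unit of work and the checkpoint write both need to fit comfortably inside the warning window for the save to complete.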

how to route docker secrets to pod automatically

I have some credentials saved as runpod secrets. After creating a new pod using runpodctl, I manually have to add the secrets to the pod. Is there a way to have the secrets available in the pod, without manually adding them?

Network issue ETA?

Several of my pods got hit with "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime.", including e.g. 82mr3meakiiytt. Do you have an ETA for the fix? They are still not back up....

same GPU, different machine -> different speed

The image shows 2 YOLO object detection runs with an identical setup (same batch size, image size, number of epochs) on 2 different RunPod pods. In both cases the GPU was an RTX 4090. Slow machine: +---------------------------------------------------------------------------------------+...

Kohya port not working

I'm trying to launch Kohya. I've tried everything and nothing works; I even used the command tail -f /workspace/logs/kohya_ss.log...
Solution:
You only need to ask in one place, not multiple places. You have already been answered in #🎤|general

runpodctl -> get public IP + exposed ports

Let's say I create a new pod using runpodctl create pod --name 'Whatever' \ --imageName 'runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04' \ --gpuType 'NVIDIA GeForce RTX 3070' ...
Solution: