RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


How do I upload a 5 GB file and use it in my pod?

I tried to upload it several times, but only about 1.2 GB gets uploaded before an error (I used the Jupyter web-based interface). I purchased 150 GB of storage and made a new pod, but the issue is still the same. ...
Solution:
OhMyRunPod --setup-ssh
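Once SSH is enabled, a more reliable route for multi-GB files than the Jupyter upload widget is rsync over that SSH connection. A minimal sketch; the host, port, and paths are placeholders to replace with the values shown in your pod's connect details:

```python
import subprocess

# Placeholders: copy the real SSH host/port from your pod's connect details.
SSH_TARGET = "root@<pod-public-ip>"
SSH_PORT = "22"                      # the pod's exposed TCP port for SSH
LOCAL_FILE = "bigfile.tar.gz"
REMOTE_DIR = "/workspace/"           # volume-backed path that survives restarts

# rsync resumes interrupted transfers (--partial) and shows progress, which is
# far more robust for multi-GB files than a browser upload.
subprocess.run(
    ["rsync", "-avP", "-e", f"ssh -p {SSH_PORT}",
     LOCAL_FILE, f"{SSH_TARGET}:{REMOTE_DIR}"],
    check=True,
)
```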

What are the certifications for each data center? *URGENT*

I want to see a list of the certifications for each of the data centers available when creating a pod.

Are Pods good for batch inference of encoders?

Hello all, I want to deploy an encoder, think of a BERT model (like "bert-base-uncased" on Hugging Face) with an aggregation head on top, e.g. one predicting class probabilities. However, I do not want to use that model in real time but for batch inference. Typical scenario: I need predictions for 1 million records within 10 minutes. I need my GPU nodes to scale up from 0 to n, process those million records stored on cloud storage, write the predictions back to cloud storage, and scale down from n to 0. ...
Solution:
I'd suggest reading about SkyPilot, since RunPod doesn't provide a platform for batch inference.
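For the per-node worker itself (separate from whatever tool handles the scale-up/scale-down), batch inference with an encoder is a short loop. A minimal sketch, assuming the records have already been pulled from cloud storage; the model name and batch size are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"     # swap in your fine-tuned checkpoint
BATCH_SIZE = 256                     # tune for your GPU memory / sequence length

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME).to(device).eval()

def predict(texts: list[str]) -> torch.Tensor:
    """Return class probabilities for a list of input strings, batch by batch."""
    probs = []
    with torch.no_grad():
        for i in range(0, len(texts), BATCH_SIZE):
            batch = tokenizer(
                texts[i:i + BATCH_SIZE],
                padding=True, truncation=True, return_tensors="pt",
            ).to(device)
            logits = model(**batch).logits
            probs.append(torch.softmax(logits, dim=-1).cpu())
    return torch.cat(probs)
```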

524: A timeout occurred (Cloudflare)

Hello everyone. I have a GPU pod with an API that runs an ML model, and the computation takes quite a long time: it's a long-running task of about 2-3 minutes. I saw that runpod.io uses Cloudflare (this is in the error response), and Cloudflare documents a timeout of 100 seconds. I've seen timeout settings for serverless but not for pods. Does anyone know how to solve this? ...
Solution:
Cloudflare is designed as a CDN to serve your traffic from edge locations all over the world, so it's more for HTTP/HTTPS traffic that should respond pretty quickly; it's not designed for long-running requests. And if you are using FastAPI, it sounds like your application will probably need to scale up at some point, and pods do not autoscale, so I suggest using serverless instead; then you won't have any issues.
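If staying on a pod behind the proxy, one common pattern that avoids any single request exceeding the ~100-second limit is submit-and-poll: the POST returns a job ID immediately and the client polls for the result. A hedged FastAPI sketch; the in-memory job store and run_model are placeholders:

```python
import uuid
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
jobs: dict = {}          # job_id -> {"status", "result"}; use Redis or a DB in practice

def run_model(job_id: str, payload: dict):
    # ... the 2-3 minute inference goes here ...
    jobs[job_id] = {"status": "done", "result": {"echo": payload}}

@app.post("/jobs")
def submit(payload: dict, background_tasks: BackgroundTasks):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}
    background_tasks.add_task(run_model, job_id, payload)
    return {"job_id": job_id}          # responds immediately, well under the timeout

@app.get("/jobs/{job_id}")
def status(job_id: str):
    return jobs.get(job_id, {"status": "unknown"})
```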

Notification When Pod Is Deployed

Hello, I'm an admin of a research group that uses RunPod. I would like to receive a notification anytime a pod is deployed, along with its specs (most importantly the GPU type, number of GPUs, storage info, community or secure cloud, the image being used, and the account that launched it). My hope is to then automate some admin tasks and run commands (such as automatically terminating the pod) after init. ...
Solution:
@muddyfootprints We don't currently offer notifications for when a new pod is created; however, I have taken note of this thread as feedback for the team, which will be reviewed by leadership tomorrow.
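Until something is built in, one DIY workaround is to poll the account's pods and alert on any new ID. A sketch using the runpod Python SDK; it assumes get_pods() returns dicts containing at least "id" and "name" (verify against your installed SDK version), and notify() is a placeholder for email, Slack, etc.:

```python
import time
import runpod

runpod.api_key = "YOUR_API_KEY"

def notify(pod: dict):
    # placeholder: send the dict to email, Slack, a webhook, ...
    print(f"New pod deployed: {pod.get('id')} ({pod.get('name')}): {pod}")

known = {p["id"] for p in runpod.get_pods()}
while True:
    for pod in runpod.get_pods():
        if pod["id"] not in known:
            known.add(pod["id"])
            notify(pod)          # the full dict should include image, GPU type, etc.
    time.sleep(60)
```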

Billing for separate users or pods

Hi, is there any way to track usage (in terms of cost) for pods (for example, based on ID), or better, a way to see how much each of my team's users has cost?
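One rough way to approximate per-pod tracking from your own scripts is to read each pod's hourly rate from the GraphQL API and aggregate it yourself. A sketch; the costPerHr field and the query shape are assumptions to verify against the API docs, and this shows current rates rather than accumulated spend:

```python
import requests

API_KEY = "YOUR_API_KEY"
query = "query { myself { pods { id name costPerHr } } }"

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": API_KEY},
    json={"query": query},
    timeout=30,
)
for pod in resp.json()["data"]["myself"]["pods"]:
    print(f"{pod['id']}  {pod['name']}  ${pod['costPerHr']}/hr")
```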

Automatically Terminate Idle Pods

I want to write a daemon that will automatically terminate my pod if the GPU has sat idle for x hours. Has anyone done something like this before and has code lying around for it? Or could you at least point me to the appropriate APIs? ...
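A sketch of such a daemon, meant to run inside the pod: it samples GPU utilization with nvidia-smi and terminates the pod through the runpod Python SDK once utilization has stayed near zero for the configured time. The threshold, the interval, and the assumption that RUNPOD_POD_ID is set in the pod's environment should all be verified and adjusted:

```python
import os
import subprocess
import time
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]
POD_ID = os.environ["RUNPOD_POD_ID"]     # assumed to be set inside the pod; verify
IDLE_LIMIT_S = 2 * 3600                  # terminate after 2 idle hours
CHECK_INTERVAL_S = 300

def gpu_utilization() -> float:
    """Average GPU utilization (%) across all GPUs, as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    values = [float(v) for v in out.split()]
    return sum(values) / len(values)

idle_since = None
while True:
    if gpu_utilization() < 1.0:
        idle_since = idle_since or time.time()
        if time.time() - idle_since > IDLE_LIMIT_S:
            runpod.terminate_pod(POD_ID)
            break
    else:
        idle_since = None
    time.sleep(CHECK_INTERVAL_S)
```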

Slow download

I'm currently getting 2 MB/s download on a 2x A100 pod; I normally get way higher than this. Is anyone else running into this right now?
Solution:
I shut down the pod and started another one, and it was a lot faster.

`ERROR | InternalServerError: None is not in list`

I was using several machines but faced the same error (above). Someone said it was due to a DDoS attack. Is that right or not? Can a DDoS attack hit Secure Cloud pods that easily? Thanks ...

Increase Spot Warning Time

I see in the docs that there is a 5 s window before a spot instance is interrupted. 5 s isn't really enough time to save or do anything; e.g. AWS gives a 2-minute warning. Even if 2 minutes is too much, it would be huge if we could get 1 m or even 30 s of warning, so that we don't need to check so often.
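If the interruption warning is delivered as a SIGTERM to the container's main process before the kill (worth confirming for your setup), the few seconds can at least be used to flush a minimal resume state. A sketch:

```python
import signal
import sys

def save_checkpoint():
    # placeholder: with only seconds of warning, write a small resume state
    # (e.g. a step counter plus weights already kept under /workspace)
    print("flushing resume state to /workspace/last_state.pt")

def on_sigterm(signum, frame):
    save_checkpoint()
    sys.exit(0)

signal.signal(signal.SIGTERM, on_sigterm)

# ... training / long-running loop continues below ...
```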

how to route docker secrets to pod automatically

I have some credentials saved as RunPod secrets. After creating a new pod using runpodctl, I have to manually add the secrets to the pod. Is there a way to have the secrets available in the pod without adding them manually?
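One workaround, rather than RunPod's own secrets feature, is to inject the values as container environment variables at creation time from a machine that already holds them. A sketch using the Python SDK; it assumes runpod.create_pod() accepts an env dict and these parameter names, so check your installed SDK version:

```python
import os
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

pod = runpod.create_pod(
    name="my-pod",
    image_name="runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04",
    gpu_type_id="NVIDIA GeForce RTX 3070",
    env={
        # example secrets, read from wherever they already live locally
        "HF_TOKEN": os.environ["HF_TOKEN"],
        "WANDB_API_KEY": os.environ["WANDB_API_KEY"],
    },
)
print(pod["id"])
```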

Network issue ETA?

Several of my pods got hit with "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime.", including e.g. 82mr3meakiiytt. Do you have an ETA for the fix? They are still not back up. ...

same GPU, different machine -> different speed

The image shows 2 YOLO object detection runs with an identical setup (same batch size, image size, number of epochs) on 2 different RunPod pods. The GPU was in both cases an RTX 4090. Slow machine: [attached nvidia-smi output, truncated] ...
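To tell whether the gap comes from the GPU itself or from the rest of the host (CPU cores, disk, dataloader), a quick compute-only benchmark run on both machines can help. A sketch; sizes and iteration counts are arbitrary:

```python
import time
import torch

assert torch.cuda.is_available()
x = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)

# warm-up so one-time kernel setup doesn't skew the measurement
for _ in range(3):
    x @ x
torch.cuda.synchronize()

t0 = time.time()
for _ in range(20):
    x @ x
torch.cuda.synchronize()
print(f"20 fp16 8192x8192 matmuls: {time.time() - t0:.3f} s")
```

If the timings match across machines, the slowdown is likely data loading or host-side limits rather than the 4090 itself.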

Kohya port not working

I'm trying to launch Kohya. I've tried everything and nothing works, even using the command tail -f /workspace/logs/kohya_ss.log ...
Solution:
You only need to ask in one place, not multiple places; you have already been answered in #🎤|general.

runpodctl -> get public IP + exposed ports

Let's say I create a new pod using runpodctl create pod --name 'Whatever' \ --imageName 'runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04' \ --gpuType 'NVIDIA GeForce RTX 3070' ...
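One way to fetch the public IP and port mappings programmatically after creating the pod is the GraphQL API. A sketch; the query shape and the field names (runtime.ports, publicPort, isIpPublic) are assumptions to verify against the API docs, and ports only appear once the pod is actually running:

```python
import requests

API_KEY = "YOUR_API_KEY"
POD_ID = "YOUR_POD_ID"

query = """
query {
  pod(input: {podId: "%s"}) {
    id
    runtime { ports { ip isIpPublic privatePort publicPort type } }
  }
}
""" % POD_ID

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": API_KEY},
    json={"query": query},
    timeout=30,
)
runtime = resp.json()["data"]["pod"]["runtime"] or {}
for port in runtime.get("ports") or []:
    if port["isIpPublic"]:
        print(f"{port['ip']}:{port['publicPort']} -> container port {port['privatePort']} ({port['type']})")
```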

This pod suddenly came into my account ( i didnt create it )

vi9vaz7fu77b52 is the pod ID; I already deleted it. I think it was because of the vLLM workers / template? ...

pod has no public ip

A pod has no public IP despite me ticking the "public IP" checkbox.

can I deploy flask, celery, redis, postgreSQL on runpod?

Hi, as you know, a pod only persists data under the /workspace folder. For all Python-related packages I can use a venv to keep the data and configuration under /workspace. But if I need to install tools like Flask, Celery, Redis, or PostgreSQL, these are not Python installations, and their configuration files end up scattered here and there. All these files and configurations will disappear after a pod restart. ...
Solution:
You can install whatever you want, but I don't recommend installing databases etc. on RunPod. It's better to deploy those things to a CPU cloud provider and use RunPod serverless for offloading the tasks that need to run on a GPU.
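For the GPU-only piece that the CPU-hosted Flask/Celery stack would call, a RunPod serverless worker is a single handler function. A minimal sketch following the serverless Python pattern; the body is a placeholder:

```python
import runpod

def handler(job):
    payload = job["input"]           # whatever the CPU-side app submitted
    # ... run the GPU model here ...
    return {"result": f"processed {payload}"}

runpod.serverless.start({"handler": handler})
```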

CUDA Toolkit >= 12.2

When selecting the pod to deploy, I can filter by GPU-supported CUDA version up to v12.4. I suppose this refers to the CUDA display driver, right? The RunPod base images, however, only provide up to CUDA 12.1.1, which is not the driver but the CUDA toolkit version, correct? ...
Solution:
There are two CUDA versions at play: the one shown by nvidia-smi, which is the maximum CUDA version supported by the host driver, and the one from nvcc --version, which is the toolkit bundled with the template.
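A quick way to see all three numbers side by side from inside a pod, assuming PyTorch is installed (the parsing of the CLI output is best-effort):

```python
import subprocess
import torch

smi = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
nvcc = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout

# driver header line, e.g. "... Driver Version: 535.x   CUDA Version: 12.4 ..."
print(next(line for line in smi.splitlines() if "CUDA Version" in line))
# toolkit line, e.g. "Cuda compilation tools, release 12.1, V12.1.105"
print(next(line for line in nvcc.splitlines() if "release" in line))
# the CUDA build PyTorch itself was compiled against
print("torch built with CUDA", torch.version.cuda, "| GPU available:", torch.cuda.is_available())
```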