RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


NFS mount is not allowed in pod?

Hello, I'm trying to mount my NAS server over NFS, but when I tried to mount it, I got a mount.nfs "Operation not permitted" error. Is there no way to mount my server via NFS or sshfs?...
Solution:
This won't work, as it would require FUSE, and FUSE requires privileged containers.
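To see why the mount is refused, it can help to check which capabilities the container process actually has. On Linux, mounting a filesystem needs CAP_SYS_ADMIN, which unprivileged containers normally do not get. The sketch below is only an illustration (the function names are made up here): it parses the CapEff line from /proc/self/status and tests the CAP_SYS_ADMIN bit. Having the capability is necessary but may not be sufficient, since other sandboxing can still block the mount.

```python
CAP_SYS_ADMIN = 21  # bit index of CAP_SYS_ADMIN in the Linux capability sets


def has_cap_sys_admin(status_text: str) -> bool:
    """Return True if the CapEff mask in /proc/<pid>/status text includes CAP_SYS_ADMIN."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal number after the label.
            mask = int(line.split()[1], 16)
            return bool(mask & (1 << CAP_SYS_ADMIN))
    return False


def current_process_can_mount() -> bool:
    """Check the effective capabilities of the current process (Linux only)."""
    with open("/proc/self/status") as f:
        return has_cap_sys_admin(f.read())


if __name__ == "__main__":
    print("CAP_SYS_ADMIN available:", current_process_can_mount())
```

If this prints False inside the pod, NFS, sshfs, and ramfs mounts will all fail in the same way.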

Skypilot & expose-ports

Hi, I'm using SkyPilot to create and deploy vLLM on a pod. If I'm correct, the template runpod/base:0.0.2 is currently used when a pod is created through SkyPilot. Ports 8266 and 6380 are exposed by this template for Ray (I guess)....

Issue with deploying gpu pod in CA-MTL-3 Region

1) In region CA-MTL-3, when I try to deploy a big server with more resources and 4 TB of container disk storage, it throws a warning that there is no available instance with this storage. Is there any way to increase the storage quota for our account? Note: I am not talking about a network drive; I am talking about container disk volume and persistent storage. 2) There is also no network storage available for this region. Is there any way to make it available? For reference, I have attached a screenshot...
Solution:
No, it's not your quota; it was the availability of hosts.

Network issue with runpod

Hey folks, my pod (ID sb3ogh2mqvkuy6) has become unavailable. The error message I'm getting: "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime." Is there any guidance on how long it would take to restore this pod?...

How to deploy Llama3 on Aphrodite Engine (RunPod)

I have set up the following settings for a pod with 48 GB RAM. 1) I'm not sure how to enable Q4 cache; otherwise the 5.0bpw won't fit. Any advice please? (See attached) 2) I get an error that config.json can't be found. It seems like the REVISION variable has not been taken into account. Based on the docs it says: REVISION: The HuggingFace branch name, it defaults to the main branch....
Solution:
Sure, I just made a PR. Please have a look: https://github.com/PygmalionAI/aphrodite-engine/pull/455 Do you think you could cherry-pick this fix for RunPod?...

Max number of Pods

How many Pods can I run concurrently in Secure Cloud, for a High Availability machine, 10, 100, 1000 ?
Solution:
You can run at least 1,000 pods if you're an "enterprise" client. What's your use case for this?

wget command error: 401 when trying to download the model

HTTP request sent, awaiting response... 401 Unauthorized when using wget
Solution:
wget "https://civitai.com/api/download/models/479474?token=PUTYOURTOKENHERE"
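If the download is scripted rather than typed by hand, the same token query parameter can be built in code. This is a rough sketch only: the helper name and the CIVITAI_TOKEN environment variable are made up for illustration, and the URL shape is simply the one from the command above.

```python
import os
from urllib.parse import urlencode


def civitai_download_url(model_version_id: int, token: str) -> str:
    """Build the download URL with the API token as a query parameter."""
    base = f"https://civitai.com/api/download/models/{model_version_id}"
    # urlencode escapes characters that would otherwise break the query string.
    return f"{base}?{urlencode({'token': token})}"


if __name__ == "__main__":
    # CIVITAI_TOKEN is a hypothetical variable name chosen for this example.
    token = os.environ.get("CIVITAI_TOKEN", "PUTYOURTOKENHERE")
    print(civitai_download_url(479474, token))
```

Reading the token from the environment keeps it out of scripts that might get shared.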

Can I set a static port for a ComfyUI pod?

I have set up the workflow on ComfyUI Deploy, and if the port changes every time, I have to update ComfyUI Deploy every time.
Solution:
Use an HTTP port with the proxy URL instead of a TCP port if you need it to be static.
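The proxy URL stays the same because it is derived from the pod ID and the internal HTTP port rather than from a randomly assigned external port. A minimal sketch, assuming the commonly seen pattern https://<pod id>-<port>.proxy.runpod.net (this pattern is an assumption, not something confirmed in the thread, so check the pod's connect options for the real URL). The function name and the example pod ID are hypothetical.

```python
def proxy_url(pod_id: str, internal_port: int) -> str:
    """Build the HTTP proxy URL for a pod, assuming the <pod id>-<port> host pattern."""
    if not pod_id:
        raise ValueError("pod_id must not be empty")
    if not 1 <= internal_port <= 65535:
        raise ValueError("internal_port must be between 1 and 65535")
    return f"https://{pod_id}-{internal_port}.proxy.runpod.net"


if __name__ == "__main__":
    # 8188 is ComfyUI's default port; "abc123" is a made-up pod ID.
    print(proxy_url("abc123", 8188))
```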

Can runpod fetch docker images from custom registries (i.e. not dockerhub)?

I'd like to avoid using Docker Hub. In order of preference, I'd like to: (1) download from S3-compatible object storage, or (2) connect to my own server's Docker registry. Are either of these possible?...
Solution:
Yes. Use the full image name: urlofregistry/username/imagename:tag

Ramdisk

Is there any way to get a ramdisk in a runpod.io pod? I need it to load models faster when switching between them... I cannot mount ramfs since privileged mode is not possible...
Solution:
It's not possible to get a privileged container.

increase spend limit

Hello, who do I talk to about raising my daily spend limit?
Solution:
Submit a ticket on the website.

Can’t start web app on port 80 on a CPU-based pod

Hi everyone! I have a web app. I used nginx to configure a proxy. I started the app via http-server on port 4000, which is also opened via TCP....
Solution:
You can't pick which external port you get. If you expose port XX over TCP, it gets a randomly assigned external port, so external port YYYY will point to internal port XX...
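Since the external port is assigned at random, an app that needs to know it has to look it up at runtime. A hedged sketch: it assumes the pod exposes the mapping in environment variables named RUNPOD_TCP_PORT_<internal port>. That variable name is an assumption here, so verify it with `env | grep RUNPOD` inside the pod. The function name is hypothetical.

```python
import os


def external_tcp_port(internal_port, env=None):
    """Return the external port mapped to internal_port, or None if no mapping is found.

    Assumes a RUNPOD_TCP_PORT_<internal_port> environment variable (unverified).
    """
    env = os.environ if env is None else env
    value = env.get(f"RUNPOD_TCP_PORT_{internal_port}")
    return int(value) if value else None


if __name__ == "__main__":
    port = external_tcp_port(4000)
    if port is None:
        print("No TCP mapping found for internal port 4000")
    else:
        print(f"Internal port 4000 is reachable on external port {port}")
```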

Pod with extremely slow upload

``` Server: NORDUnet A/S - Stockholm (id: 14200) ISP: Obehosting AB Idle Latency: 1.24 ms (jitter: 0.12ms, low: 1.04ms, high: 1.30ms) Download: 809.23 Mbps (data used: 435.7 MB) ...

How to tell how much storage being used in pod? (including network drive)

I try df -h, but it seems to represent the whole filesystem. ```(base) root@f3165c77df52:/workspace# df -h Filesystem Size Used Avail Use% Mounted on overlay 30G 8.9G 22G 30% / tmpfs 64M 0 64M 0% /dev...
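One way to separate "how full is the filesystem" from "how much am I storing under this directory" is to measure both. A small sketch using only the Python standard library; /workspace is taken from the prompt shown above, and the function names are made up for this example. The per-directory figure sums apparent file sizes, so it can differ a little from what du reports in disk blocks.

```python
import os
import shutil


def directory_size_bytes(root):
    """Sum the apparent sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                pass
    return total


def filesystem_usage(path):
    """Return total/used/free bytes for the filesystem that contains path."""
    usage = shutil.disk_usage(path)
    return {"total": usage.total, "used": usage.used, "free": usage.free}


if __name__ == "__main__":
    target = "/workspace"
    print("files under", target, ":", directory_size_bytes(target), "bytes")
    print("filesystem:", filesystem_usage(target))
```

On a shared network drive the filesystem numbers may describe the whole backing store, so the per-directory sum is usually the more useful figure.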

Can't see training progress after reset

Hello, I started a new training run in a notebook and then my computer restarted. After restarting, I signed in to my RunPod account and opened the training instance, but I can't see any progress anymore. It shows GPU memory in use, but how can I see the training progress?
Solution:
Jupyter notebooks do not save cell output once the browser tab is closed. The job continues to run, though, and once it finishes running, it should update the cell.
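A common workaround is to write progress to a file on the pod instead of relying on cell output, so it can still be read after the browser reconnects. This is a sketch, not the notebook's actual training code: the function names and the log path are hypothetical, and log_step would be called from inside whatever training loop is in use.

```python
import logging


def make_progress_logger(log_path, name="training"):
    """Create (or reuse) a logger that appends progress lines to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep progress lines out of the root logger
    # Avoid stacking duplicate handlers when the notebook cell is re-run.
    if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_step(logger, step, total_steps, loss):
    """Record one training step and flush so the line is on disk immediately."""
    logger.info("step %d/%d loss=%.4f", step, total_steps, loss)
    for handler in logger.handlers:
        handler.flush()


if __name__ == "__main__":
    progress = make_progress_logger("train.log")
    for step in range(1, 4):
        log_step(progress, step, 3, 1.0 / step)
```

After reconnecting, running `tail -f` on the log file from a terminal on the pod shows the latest progress.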

Maintenance - only a Community Cloud issue?

Hey there! I just started a new pod and noticed this maintenance window. Is this only a thing on community cloud or also on secure cloud?...

SDK GPU naming specification

When I am setting up a pod using the SDK, how specific does the GPU name have to be? Is there a list of proper names?
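One way to avoid guessing is to resolve a loose name against the list of GPU types the SDK reports and then pass the exact identifier on. The sketch below assumes, without confirmation from this thread, that the runpod Python SDK offers something like runpod.get_gpus() returning dictionaries with "id" and "displayName" keys. The helper name and the sample entries are purely illustrative.

```python
def resolve_gpu_id(gpus, query):
    """Map a loose GPU name to the exact "id" from a list of GPU dictionaries.

    Raises ValueError when nothing matches or the query is ambiguous.
    """
    q = query.strip().lower()
    # Prefer an exact match on either the id or the display name.
    for gpu in gpus:
        if q in (gpu["id"].lower(), gpu.get("displayName", "").lower()):
            return gpu["id"]
    # Otherwise accept a substring match only if it is unique.
    partial = [
        gpu["id"]
        for gpu in gpus
        if q in gpu["id"].lower() or q in gpu.get("displayName", "").lower()
    ]
    if len(partial) == 1:
        return partial[0]
    raise ValueError(f"{query!r} matched {len(partial)} GPU types: {partial}")


if __name__ == "__main__":
    # Illustrative sample data; with the SDK this list would come from the
    # assumed call runpod.get_gpus() after setting an API key.
    sample = [
        {"id": "NVIDIA A40", "displayName": "A40"},
        {"id": "NVIDIA GeForce RTX 4090", "displayName": "RTX 4090"},
    ]
    print(resolve_gpu_id(sample, "a40"))
```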

How to get a general idea for max volume size on secure cloud?

I have been able to deploy 2TB drives, but what is the standard here? How much storage is there generally per server, so I can estimate what I should expect to be able to get?

Template pytorch-1.13.1 lists cuda 11.7.1 version but is actually cuda 11.8?

I tried running a model that requires PyTorch 1.13.1 and CUDA 11.7, but it said the CUDA version doesn't match (the pod is actually on 11.8). The mismatch check happens in the deepspeed package. I tried starting up a new pod with the same template and ran nvcc --version, and it said the pod was on CUDA version 11.8. Is this normal or an error? I can't seem to run my model because of the CUDA version mismatch. For reference, I'm using an A40....
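To confirm which side of the mismatch is off, the toolkit version reported by nvcc can be compared with the CUDA version PyTorch was built against. A minimal sketch with hypothetical helper names: it assumes nvcc is on the PATH and that torch is installed, and it compares only major.minor.

```python
import re
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract "major.minor" from the text printed by `nvcc --version`, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(toolkit_version, torch_cuda_version):
    """Compare two CUDA version strings on major.minor only."""
    if toolkit_version is None or torch_cuda_version is None:
        return False
    return toolkit_version.split(".")[:2] == torch_cuda_version.split(".")[:2]


if __name__ == "__main__":
    import torch  # imported here so the helpers above work without torch

    output = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    toolkit = parse_nvcc_release(output)
    print("nvcc toolkit:", toolkit, "| torch built with:", torch.version.cuda)
    print("match:", cuda_versions_match(toolkit, torch.version.cuda))
```

If the two differ, the template's label and its installed toolkit disagree, which would be consistent with the error described above.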