RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Network issue with runpod

Hey folks, my pod (ID sb3ogh2mqvkuy6) has become unavailable. The error message I'm getting: "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime." Is there any guidance on how long it would take to restore this pod?...

How to deploy Llama3 on Aphrodite Engine (RunPod)

I have set up the following settings for a pod with 48 GB RAM. 1) I'm not sure how to enable Q4 cache; otherwise the 5.0bpw won't fit. Any advice please? (See attached) 2) I get an error that config.json can't be found. It seems the REVISION variable has not been taken into account. The docs say: REVISION: The HuggingFace branch name, it defaults to the main branch....
Solution:
Sure, I just made a PR. Please have a look: https://github.com/PygmalionAI/aphrodite-engine/pull/455 Do you think you could cherry-pick this fix for RunPod?...

Max number of Pods

How many Pods can I run concurrently in Secure Cloud, for a High Availability machine, 10, 100, 1000 ?
Solution:
You can run at least 1,000 pods if you're an "enterprise" client. What's your use case for this?

wget command error: 401 when trying to download the model

HTTP request sent, awaiting response... 401 Unauthorized when using wget
Solution:
wget https://civitai.com/api/download/models/479474?token=PUTYOURTOKENHERE
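A minimal sketch of the same download, with the URL quoted so the shell does not treat `?` as a glob character (this matters in zsh) and the token read from an environment variable instead of being typed inline. The names `civitai_url` and `CIVITAI_TOKEN` are hypothetical, chosen for illustration; the model ID 479474 is the one from the answer.

```shell
#!/bin/sh
# Build the download URL from a model ID and an API token.
# civitai_url is a hypothetical helper; the URL shape is the one from the answer.
civitai_url() {
  printf 'https://civitai.com/api/download/models/%s?token=%s\n' "$1" "$2"
}

# Download only when a token is set. CIVITAI_TOKEN is an assumed variable name.
# --content-disposition asks wget to use the filename the server provides.
if [ -n "${CIVITAI_TOKEN:-}" ]; then
  wget --content-disposition "$(civitai_url 479474 "$CIVITAI_TOKEN")"
fi
```

Usage: `export CIVITAI_TOKEN=...` once in the pod's terminal, then run the script.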

Can I set a static port for a ComfyUI pod?

I have set up the workflow on ComfyUI Deploy, and if the port changes every time, I have to update ComfyUI Deploy every time.
Solution:
Use HTTP port with the proxy URL instead of TCP port if you need it to be static.
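As a rough sketch, the proxy URL follows the pattern visible in other posts in this channel (for example `https://6ppno5hzfbrl76-8000.proxy.runpod.net`): pod ID, a dash, then the internal HTTP port. Since neither part changes across restarts, the URL stays stable. The helper name `proxy_url` is hypothetical, and both the `RUNPOD_POD_ID` environment variable and port 8188 for ComfyUI are assumptions to check against your own pod.

```shell
#!/bin/sh
# proxy_url POD_ID PORT -> HTTP proxy URL for that pod and internal port.
proxy_url() {
  printf 'https://%s-%s.proxy.runpod.net\n' "$1" "$2"
}

# Inside a pod, RUNPOD_POD_ID is assumed to hold the pod ID; 8188 is assumed
# to be the port ComfyUI listens on. Adjust both to your setup.
if [ -n "${RUNPOD_POD_ID:-}" ]; then
  proxy_url "$RUNPOD_POD_ID" 8188
fi
```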

Can runpod fetch docker images from custom registries (i.e. not dockerhub)?

I'd like to avoid using Docker Hub. In order of preference, I'd like to: - download from S3-compatible object storage, or - connect to my own server's Docker registry. Are either of these possible?...
Solution:
Yes. Use the full image name, including the registry: urlofregistry/username/imagename:tag
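A small sketch of what that naming looks like in practice, assuming you build locally and push to your own registry with the standard `docker tag` and `docker push` commands. The registry host, user, and image names below are placeholders, and `image_ref` and `push_image` are hypothetical helper names.

```shell
#!/bin/sh
# image_ref REGISTRY USER IMAGE TAG -> full image reference.
image_ref() {
  printf '%s/%s/%s:%s\n' "$1" "$2" "$3" "$4"
}

# push_image LOCAL_IMAGE FULL_REF: tag a local image and push it to the registry.
# Not called here; run it on a machine where docker is installed and logged in.
push_image() {
  docker tag "$1" "$2" && docker push "$2"
}

# Example values are placeholders, not real endpoints.
image_ref registry.example.com myuser myimage v1
```

The printed reference is what goes into the pod or template image field.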

Ramdisk

Is there any way to get a ramdisk in a runpod.io Pod? I need it to switch between loaded models faster... I cannot mount ramfs since privileged mode is not possible...
Solution:
It's not possible to get a privileged container.

increase spend limit

Hello, who do I talk to to raise my daily spend limit?
Solution:
Submit a ticket on the website.

Can’t start web app on port 80 on a CPU-based pod

Hi everyone! I have a web app and use nginx to configure the proxy. I started the app via http-server on port 4000, which is also exposed via TCP....
Solution:
You can't pick which external port you get. If you expose port XX over TCP, it is assigned a random external port, so port YYYY will point to internal port XX...
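To find out which external port was assigned, one option is to read it from the pod's environment. This sketch assumes the pod exposes the mapping as `RUNPOD_TCP_PORT_<internal port>` alongside `RUNPOD_PUBLIC_IP`; treat both variable names as assumptions and confirm them with `env | grep RUNPOD` in your pod. `external_port` is a hypothetical helper name.

```shell
#!/bin/sh
# external_port INTERNAL_PORT -> print the external port mapped to it,
# or an empty line when no mapping variable is set.
external_port() {
  case "$1" in
    ''|*[!0-9]*) echo "internal port must be numeric" >&2; return 1 ;;
  esac
  # Indirect lookup of RUNPOD_TCP_PORT_<n>; input is validated as digits above.
  eval "printf '%s\n' \"\${RUNPOD_TCP_PORT_$1:-}\""
}

# Example for the app on internal port 4000 from the question.
port="$(external_port 4000)"
if [ -n "$port" ]; then
  echo "reachable at ${RUNPOD_PUBLIC_IP:-<public-ip>}:$port"
fi
```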

Pod with extremely slow upload

```
Server: NORDUnet A/S - Stockholm (id: 14200)
ISP: Obehosting AB
Idle Latency: 1.24 ms (jitter: 0.12ms, low: 1.04ms, high: 1.30ms)
Download: 809.23 Mbps (data used: 435.7 MB)
...
```

How to tell how much storage is being used in a pod? (including network drive)

I tried df -h, but it seems to represent the whole filesystem.
```
(base) root@f3165c77df52:/workspace# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 30G 8.9G 22G 30% /
tmpfs 64M 0 64M 0% /dev
...
```

Can't see training progress after reset

Hello, I started a new training run in a notebook and then my computer restarted. After restarting, I signed in to my RunPod account and opened the training instance, but I can't see any progress anymore. It shows GPU memory in use, but how can I see the training progress?
Solution:
Jupyter notebooks do not keep cell output once the browser tab is closed. The job continues to run, though, and once it finishes the cell should update.

Maintenance - only a Community Cloud issue?

Hey there! I just started a new pod and noticed this maintenance window. Is this only a thing on community cloud or also on secure cloud?...

SDK GPU naming specification

When I am setting up a pod using the sdk how specific does the GPU name have to be? Is there a list of proper naming?

How to get a general idea for max volume size on secure cloud?

I have been able to deploy 2TB drives, but what is the standard here? How much storage is there generally per server, so I can estimate what I should expect to be able to get?

Template pytorch-1.13.1 lists cuda 11.7.1 version but is actually cuda 11.8?

I tried running a model that requires pytorch-1.13.1 and 11.7 but it said the cuda version doesn't match (the pod is actually on 11.8). The mismatch check happens in the deepspeed package. I tried starting up a new pod with the same template and did nvcc --version and it said the pod was on cuda version 11.8. Is this normal or an error? I can't seem to run my model because of the cuda version mismatch. For reference, I'm using A40....
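To see both sides of such a mismatch in one place, a quick check like the following can help. It is only a sketch: it assumes the mismatch is between the toolkit reported by `nvcc --version` and the CUDA version PyTorch was built with (`torch.version.cuda`), and `nvcc_release` is a hypothetical helper name.

```shell
#!/bin/sh
# Extract "major.minor" from nvcc --version output read on stdin.
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Toolkit version installed in the container, if nvcc is on PATH.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc CUDA:  $(nvcc --version | nvcc_release)"
fi

# CUDA version PyTorch was built against, if torch is importable.
if command -v python >/dev/null 2>&1; then
  python -c 'import torch; print("torch CUDA:", torch.version.cuda)' 2>/dev/null || true
fi
```

If the two lines disagree (for example 11.8 from nvcc and 11.7 from torch), that matches the situation described in the post.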

Can't connect to SFTP

Hi, I can't access SFTP. On my previous pod I could, and I just swapped the IP and the port, but now it doesn't work. Is there a problem on RunPod's side?

Unable to ssh onto my pod with the public key already on the runpod server

I am unable to ssh into the pod when using the command from runpod's site:
```
name .ssh % ssh [email protected] -p 22138 -i ~/.ssh/id_ed25519
ssh: connect to host 194.68.245.27 port 22138: Operation timed out
...
```

Python modules missing when pod is starting

When starting a comfy-ui pod after some downtime, I get a lot of messages of the kind
```
Import times for custom nodes:
0.0 seconds: /workspace/ComfyUI/custom_nodes/websocket_image_save.py
...
```

Unable to connect to Pod

Since last Friday I have been unable to connect to my pod. It worked fine Thursday and now whenever I send the following command, it returns {"detail":"Not Found"}: curl https://6ppno5hzfbrl76-8000.proxy.runpod.net/v1/model Am I missing something? I even get this error when launching web terminal - is my model not loaded? I used a pre-built template that should download the model from HuggingFace...
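One thing worth ruling out, assuming the template serves an OpenAI-compatible API (the post does not say so), is the path itself: that API lists models at `/v1/models` (plural), so `/v1/model` could return Not Found even when the server is up. The sketch below prints only the HTTP status code for a given path; `http_status` and `endpoint` are hypothetical helper names.

```shell
#!/bin/sh
# http_status URL -> print only the HTTP status code of a GET request.
http_status() {
  curl -s -o /dev/null -w '%{http_code}\n' "$1"
}

# endpoint POD_ID PORT PATH -> proxy URL for a path on the pod.
endpoint() {
  printf 'https://%s-%s.proxy.runpod.net%s\n' "$1" "$2" "$3"
}

# Not run automatically; call from your own shell, e.g.:
#   http_status "$(endpoint 6ppno5hzfbrl76 8000 /v1/models)"
```

A 200 on `/v1/models` alongside a 404 on `/v1/model` would point at the path rather than the pod.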