RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

How to handle these WebSocket connections?

I set up an HTTP service on my Runpod, but there are continuous WebSocket connection requests coming in. I want to understand why these requests are occurring and how to stop them. My service is being significantly disrupted by these requests. They seem to share the same internal IP and client ID:
INFO: ('***', 55730) - "WebSocket /ws?clientId=6d901a86659e4e78bfcc32b69bd5f68f" 403
INFO: connection rejected (403 Forbidden)
INFO: connection closed
INFO: ('100.64.0.32', 39206) - "WebSocket /ws?clientId=6d901a86659e4e78bfcc32b69bd5f68f" 403
INFO: connection rejected (403 Forbidden)...
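To check whether the rejected requests really come from one source and one client ID, the log can be tallied. This is a minimal sketch that assumes the access lines keep the shape quoted in the post; the function name `tally_websocket_attempts` and both regular expressions are illustrative and are not part of any Runpod or server API.

```python
import re
from collections import Counter

# Matches access lines shaped like the ones quoted in the post, e.g.
# INFO: ('100.64.0.32', 39206) - "WebSocket /ws?clientId=..." 403
LINE_RE = re.compile(
    r"\('(?P<ip>[^']+)', \d+\) - \"WebSocket (?P<path>\S+)\" (?P<status>\d{3})"
)
CLIENT_RE = re.compile(r"clientId=([0-9a-f]+)")


def tally_websocket_attempts(log_text):
    """Count WebSocket attempts per (source IP, clientId, HTTP status)."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        client = CLIENT_RE.search(match.group("path"))
        client_id = client.group(1) if client else None
        counts[(match.group("ip"), client_id, match.group("status"))] += 1
    return counts


# The two access lines from the post, including the redacted '***' address.
sample = (
    "INFO: ('***', 55730) - \"WebSocket /ws?clientId=6d901a86659e4e78bfcc32b69bd5f68f\" 403 "
    "INFO: connection rejected (403 Forbidden) INFO: connection closed "
    "INFO: ('100.64.0.32', 39206) - \"WebSocket /ws?clientId=6d901a86659e4e78bfcc32b69bd5f68f\" 403 "
    "INFO: connection rejected (403 Forbidden)"
)

for key, count in tally_websocket_attempts(sample).items():
    print(key, count)
```

A single client ID repeated across all attempts would suggest one client retrying its connection (for example a browser tab left open on the pod's proxy URL) rather than many independent clients, but that is a guess the tally can only support, not confirm.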

How to set ContainerRegistryAuth for `podRentInterruptable`

When renting interruptable pods using GraphQL, like so: ```graphql mutation { podRentInterruptable( input: { ...

Fetch available spot pricing for gpu types

I can query the available GPU types using ```graphql query GpuTypes { gpuTypes { id...
Solution:
ah, ok this seems to work: ```graphql query GpuTypes { gpuTypes {...

Hardware optimization for computer vision

Hello, I’m currently developing a computer vision model and came here because I need computational resources. Honestly, I don’t have much knowledge about hardware, and I’d like to know if you have any documentation or material that could help me make the most of the tool.

ssh not connecting

Neither Basic SSH Terminal nor SSH over exposed TCP is working. I created a key with ssh-keygen and added the public key in the settings. I confirmed that ~/.ssh/id_ed25519 holds the private key.
Why would ssh root@ be asking me for the root password?...

FaceFusion is not working.

Hi, the FaceFusion template is not working. I am unable to launch port 3000, no matter the machine or region. Surprisingly, it happens only with FaceFusion and not with any other templates like SD or Oobabooga...

Pod disappeared completely

Yesterday the GPU was gone, but the pod was still there. This morning the whole pod has gone. I'd left it running and it ran out of money, so I was expecting the CPU to be gone again. But this time the interface acts as if it was never there. I paid for network storage. The storage is shown, but without any way to retrieve what I had. Is this normal? ...

Followed the Hunyuan Video tutorial to the letter, and it didn't work.

I'm trying to get the Hunyuan Video stuff to work, but ComfyUI keeps saying the custom nodes aren't loaded. When I look at the logs, I see this:
2025-01-04T21:58:16.083013463Z ### Loading: ComfyUI-Manager (V2.51)
2025-01-04T21:58:16.233208406Z ### ComfyUI Revision: 3009 [d45ebb63] | Released on '2025-01-04'
2025-01-04T21:58:16.244099498Z Import times for custom nodes:
2025-01-04T21:58:16.244129725Z 0.0 seconds: /workspace/ComfyUI/custom_nodes/websocket_image_save.py...

Lost my GPU and forced to pay more?

Hoping someone can help asap. I started a pod yesterday, a cheaper GPU one (less than $0.5/hr). I exited it last night (so as not to continue to incur costs, because it's not clear at all whether you pay for non-usage), started it again this morning, and saw this message: "Start your pod without GPUs. This is useful for debugging non gpu-related problems or transferring data. If you have a volume configured, it will be retrieved and mounted. The price for this instance is $0.195/hour + disk costs." It gives me a link to the docs, to this:...

Create pod in a specific data-center in Europe with python?

How do I find the data_center_id or country_code for creating a pod with Python?

Runpod VS-Code Template DNS Resolution Fails

Template was working fine this morning, now within the last hour every pod I try to spin up gets a DNS resolution error and can't generate the vs-code server token to connect to the tunnel...

Why do some pods have a stop button and others only have a terminate button?

I noticed that when I create an on-demand pod, it sometimes has a stop button and sometimes does not. When I attach network storage, there is no stop button, only terminate; without network storage there is a stop button. Is there additional info available on why the stop button is only sometimes available? Without a stop button, the pod's system software needs to be reinstalled each time, since the only option is to terminate, with no pause/stop.
Solution:
When you’re using a network volume, we assume you’ll install and store everything in /workspace, so you technically don’t need to stop the pod separately: terminating it is effectively the same as stopping it.

Spot price seems to be broken with SkyPilot

Hello, out of curiosity we started a 2xA40 spot instance and I think the price is broken there. Does anyone have any idea where that price comes from? An A40 is around 0.35$/h, and here for two on spot I see almost 2$ :D. Btw. we are using SkyPilot :). I also checked the billing page and it appears Runpod is actually billing us 1.96$/h...
Solution:
Quoting from chat:
for spot instances you are able to bid the price
not used SkyPilot so do not know much how it works
...

Port 3000 not working

I've tried at least half a dozen pods today, but none of them ever get port 3000 active on the container. Container is "ready" and no error showing in the log. This is for the FaceFusion community template, which was working fine yesterday.

Issue with Huggingface dataset not being cached to storage volume

I want to use https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu for a project. I'm trying to download this dataset through the Python datasets package, and I want the download to be stored on my storage volume. As per the documentation here: https://huggingface.co/docs/datasets/v3.2.0/en/cache#cache-directory , the package offers the option to either set an environment variable or use a function argument to specify the download directory. I've tried both approaches, but whatever I do, the c...
Solution:
If that's the right variable, use the export command in Linux to set the environment variable inside the pod instead of setting it in the Runpod settings.
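The same idea can be applied from Python instead of the shell. This is a minimal sketch, assuming the variable in question is `HF_DATASETS_CACHE` (the one the linked cache documentation describes) and that it has to be set before `datasets` is imported. The helper names and the `/workspace/hf_cache` path in the usage note are illustrative and do not come from the post.

```python
import os
from pathlib import Path


def configure_datasets_cache(cache_dir):
    """Point the Hugging Face `datasets` cache at `cache_dir`.

    Call this before `import datasets`: the library reads
    HF_DATASETS_CACHE when it is first imported.
    """
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ["HF_DATASETS_CACHE"] = str(path)
    return path


def load_with_volume_cache(name, cache_dir, **kwargs):
    """Load a dataset with both the env variable and `cache_dir` set."""
    path = configure_datasets_cache(cache_dir)
    # Deferred import so the environment variable is already in place.
    from datasets import load_dataset

    # Passing cache_dir as well makes the target explicit for this call.
    return load_dataset(name, cache_dir=str(path), **kwargs)
```

Usage would look like `load_with_volume_cache("HuggingFaceFW/fineweb-edu", "/workspace/hf_cache")`, where the path is an assumed mount point for the storage volume. If files still land under the home directory, the Hub download cache (controlled by `HF_HOME`) may be the part that has not been redirected; that is worth checking but is not confirmed by the thread.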

Storage on US-KS-1

Hi, are the US-KS-1 pods currently down? For some reason, the region doesn't show up in the region list, and my storage disk is located in US-KS-1. Also, is there a way to copy our network storage drives to something such as Google buckets when they are not connected to a pod?...

File management in inaccessible folder

So I want to move a file out of the checkpoint folder in ComfyUI. How do I do this when the folder isn't accessible?

Is there a limit in the number of threads?

I have pods with different numbers of vCPUs, and I am running vLLM. If I start too many vLLM instances in parallel, I get errors like "can't create thread". Is there a parameter that limits the number of threads per pod?
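One way to narrow this down is to print the limits that commonly cap thread creation on Linux from inside the pod. The sketch below is a diagnostic under assumptions: it reads the per-user process limit, the cgroup `pids.max` value, and the kernel-wide thread cap. Whether Runpod sets any of these per pod is not stated in the thread, and the function names are illustrative.

```python
import os
import resource
import threading
from pathlib import Path


def read_first(paths):
    """Return the stripped contents of the first readable path, else None."""
    for candidate in paths:
        try:
            return Path(candidate).read_text().strip()
        except OSError:
            continue
    return None


def thread_limits():
    """Collect limits that can cause "can't create thread" errors on Linux."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "cpu_count": os.cpu_count(),
        # Threads count against RLIMIT_NPROC; -1 means unlimited.
        "rlimit_nproc_soft": soft,
        "rlimit_nproc_hard": hard,
        # cgroup v2 path first, then the cgroup v1 location.
        "cgroup_pids_max": read_first(
            ["/sys/fs/cgroup/pids.max", "/sys/fs/cgroup/pids/pids.max"]
        ),
        "cgroup_pids_current": read_first(
            ["/sys/fs/cgroup/pids.current", "/sys/fs/cgroup/pids/pids.current"]
        ),
        "kernel_threads_max": read_first(["/proc/sys/kernel/threads-max"]),
        "python_threads_now": threading.active_count(),
    }


for name, value in thread_limits().items():
    print(f"{name}: {value}")
```

If `cgroup_pids_current` is close to `cgroup_pids_max` when the error appears, the container's process/thread cap is the likely cause rather than the vCPU count.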

Error creating temporary lease

Error starting container. Happens repeatedly on this pod. (Host: q2jrr78mge01co)
2024-12-30T17:26:00Z create 60GB volume
2024-12-30T17:26:00Z create container runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04
2024-12-30T17:26:00Z error pulling image: Error response from daemon: error creating temporary lease: write /var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db: no space left on device: unknown...

video2x

Anyone know how to run video2x on any of the pods? I've tried the desktop, pytorch, and even docker templates and seem unable to get video2x installed either way. Docker just doesn't start with the video2x image passed to it and it seems not possible to run cmake either....