RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡｜serverless

⛅｜pods

Expected all tensors to be on the same device

Hey there, I'm creating pods to use Stable Diffusion WebUI, using the templates offered by RunPod. In both templates, if I use any model that's not the default, I get the following error:...

Urgent: Workspace Disconnected

Hey, I have a pod running with some data. Suddenly, the workspace got disconnected and I am not able to connect back to it. If I stop the pod, the data will be lost. ...
Solution:
Your data in /workspace doesn't get lost if you stop the pod. The data in /workspace uses persistent storage unless you created the pod without any persistent storage, so it should be safe to stop the pod.

Speedtest for slow pod

As discussed in the last slow download speed thread, I've run the speed test script from @justin to help you guys see what's going on. I most often have issues with uploading, as the speed always falls below 1 Mbit/s. I've noticed that restarting the upload (I'm normally using runpodctl, btw) does help, but usually only for about a minute. Right now I'm trying SFTP; it helps a bit and uploads at around 3-4 MiB/s. The runpodctl upload was again below 1 MiB/s. Screenshot attached from the Pod overview, showing Community Cloud and the CZ region. Summary of the speed test script attached as txt....
Solution:
It's not impossible, but to be honest it would be extremely unlikely; I haven't seen that happen in the past.
- On Secure Cloud: the speed is symmetrical per server, so other servers' usage does not affect the bandwidth of others.
- On Community Cloud: there is often a shared backbone for all servers at a specific data center, so higher network utilization affects all servers at once....

TCP Port Not Working

I started a RunPod instance and specified a TCP port. I got something like this: Public IP: <iphere> Internal: 22 External: 35292 Internal: 35293 External: 35293...
Solution:
The issue here is that you need to bind to 0.0.0.0 and not the default of 127.0.0.1 to access the public IP.
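
To illustrate the answer above, here is a minimal sketch of a service that binds to 0.0.0.0 instead of 127.0.0.1. It uses only Python's standard library; the handler name `PingHandler` and the helper `make_server` are hypothetical, and the port 35293 is simply the internal port shown in the question. Your real application will have its own option for the listen address.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class PingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; remove this override to see request logs.
        pass


def make_server(host="0.0.0.0", port=35293):
    # "0.0.0.0" listens on all interfaces, so traffic forwarded from the
    # pod's public IP can reach the service. With "127.0.0.1" only
    # processes inside the pod itself could connect.
    return HTTPServer((host, port), PingHandler)


if __name__ == "__main__":
    server = make_server()
    print("listening on", server.server_address)
    server.serve_forever()
```

If the service is reachable from inside the pod but not through the external port, the listen address is the first thing worth checking.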

Can't login

I haven't been able to log into my account for a few days. It says that a verification code was sent to my email address, but there is no email from RunPod, even in my spam folder.

Stable Diffusion GPU Pod and API

Is there a way to connect a GPU Pod running the Stable Diffusion template to an API layer that is externally exposed? I have a serverless instance running @ashleyk's Docker image, which is working great and much appreciated, albeit 10x slower than the GPU Pods. I am attempting to leverage the processing power and number of GPUs on the pod side -- but need an API endpoint that I can expose to my external app......

Horrible network speeds make the pod unusable.

This happens regularly. There were several posts about this a few months back. In one it was suggested to use Cloudflare as a workaround, and that worked flawlessly. On the days when the network speeds were terrible, I just installed Cloudflare and went with it. Today Cloudflare is not working for some reason or another. ...

How can I deploy Mixtral using Ollama as service?

Hi everyone! I want to deploy the Mixtral 8x7B model using Ollama on RunPod, but I can't install it as a service using the RunPod desktop template. Please help me!...
Solution:
Answered in #general. Run the install script, then run ollama serve in one terminal and ollama run [model name] in a new terminal...

520: Web server is returning an unknown error

I'm getting a persistent error trying to connect to our API: <title>PODID-XXXX.proxy.runpod.net | 520: Web server is returning an unknown error</title> It seems to be a Cloudflare error. This does not happen if I browse to the endpoint in Chrome, but it happens through our application or Postman with almost every request. What's going on here? What can we do to fix it? ...
Solution:
maybe pass it through in the body instead then?

Driver mismatch

I've created a pod with NVIDIA-SMI 535.154.05, Driver Version: 535.154.05, CUDA Version: 12.2, NVML library version: 535.154. I'm getting a driver mismatch when I try to run nvidia-smi, for example with an NVIDIA RTX A4500. Any ideas?...

Servers' availability: "Any" region vs Specific regions

Hi, I'm trying to rent a server with an H100 in Secure Cloud. When I choose the "Any" option for the region, it says that there are some available servers and I can deploy them. However, when I go through the regions one by one, it always says that the configuration is not available. Does that mean the H100 shown as available under "Any" is in a region that is not listed, or is there a bug when I choose specific regions that causes this behavior?...

File copying does not occur in Custom Template

I have been successfully using runpod with a custom template until recently when it stopped working properly. Upon investigation, I discovered an issue where the project code was not being copied correctly into the pod. To identify the bug, I created a very simple image and attempted to copy test.txt as an experiment, but the file was not copied into the pod. On my personal GPU server, the test Dockerfile worked perfectly, and test.txt was copied successfully....

Having trouble with Serverless SD XL image

There is no image that gets generated and I only get a success notification and no error. Unable to debug. Unlike the SD template, there is no image { "delayTime": 20641, "executionTime": 1050,...

What does "Low Availability" mean?

Some instances have the tag "Low Availability". What exactly does that entail?
Solution:
It means what it says: there is a limited supply of that particular resource type.

Network bandwidth?

What is the network bandwidth of your secure GPU servers? I'm looking for 1 Gbps.

Docker in Docker custom image for GPU pods, and persistent or network volume support in CPU pods?

Hey, was anyone able to run Docker in Docker using CPU or GPU pods? I want to build a large Docker image using RunPod's CPU pods, but they do not allow a network volume or storage over 20 GB. The "Runpod Ubuntu" image allows me to run Docker in Docker, as it somehow has support for iptables (I need to run some commands first), but I run out of storage due to the 20 GB limit. I tried to create a custom Docker image for GPU pods (because they allow persistent storage), based on Ubuntu, to enable Docker in Docker, but even with identical commands I cannot get iptables to work in my custom image. I also could not find the "Runpod Ubuntu" image anywhere on GitHub....

Getting ECONNREFUSED while trying to communicate on an exposed TCP port with the ComfyUI API

That's basically the situation. I'm using the raw IP, like http:// 999.9999.999:99999, and the connection is getting refused. The port and IP are from the connect modal. Do I need to whitelist my IP? Do I need a public key? I'm not trying to SSH over it; I'm trying to communicate with Comfy.
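
No solution is recorded for this thread. As a first diagnostic, a refused connection generally means nothing accepted the connection at that address and port, so it can help to test the raw TCP connection separately from the HTTP client. The sketch below uses only Python's standard library; the function name `can_connect` is hypothetical, and the host and port you pass in are whatever the connect modal shows.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # ConnectionRefusedError and timeouts are both subclasses of OSError.
        print(f"connect to {host}:{port} failed: {exc}")
        return False
```

Running the same check from inside the pod against 127.0.0.1 and the internal port, and then from outside against the public IP and external port, shows which side of the port mapping is refusing. One possible cause, as in the "TCP Port Not Working" thread above, is a service that listens only on 127.0.0.1.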

How to expose a TCP port without losing the pod data?

Pretty dumb move from my side, any way to save a day of work?

Error connecting to runpod

I log in using SSH in the terminal, but get logged out again straight away.

Transferring files to a new Pod

Hi, is it possible to copy data from one Pod to another? When I have a GPU Pod that has no free GPU at the moment, can I start a new pod and transfer the data from the original one? I run ComfyUI on it, so it would be nice to transfer all my models, custom nodes, workflows, etc.