RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Changed log output on the RunPod website

We are using FastAPI in one of our applications on your pods. For the past couple of days, the FastAPI log output has not been displayed in the website's log window; to see the log output I now have to start FastAPI from a terminal. Have there been recent changes to the way log files are displayed on the RunPod website?...

How do I find my network volume with runpodctl?

Network outage, please fix it

My pod is not working, please fix it.

Cannot see logs on my pods

I can only see the queue time but cannot see logs on my pods. Is anyone else facing this issue as well?

Storage Pricing

How is storage pricing calculated? Is it billed as one monthly charge, per minute like pods, or per day?
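If storage is prorated from a monthly rate the way pod compute is prorated per minute, the arithmetic is simple. A minimal sketch under that assumption; the rate below is a placeholder for illustration, not RunPod's actual price:

```python
RATE_PER_GB_MONTH = 0.10          # placeholder rate, NOT RunPod's real pricing
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, assuming a 30-day billing month

def storage_cost(gb: float, minutes: float) -> float:
    """Prorate a per-GB-per-month storage rate down to the minute."""
    return gb * RATE_PER_GB_MONTH * minutes / MINUTES_PER_MONTH

# 100 GB held for a full 30-day month at the placeholder rate:
full_month = storage_cost(100, MINUTES_PER_MONTH)  # 10.0
# The same 100 GB held for a single day:
one_day = storage_cost(100, 24 * 60)               # ~0.33
```

Check the pricing page for the real per-GB figures (running vs. stopped volumes are typically priced differently); only the proration logic is shown here.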

Any network issues in EU-RO-1?

My git clone is running at 32 KiB/s and I can't copy from S3 (it's very slow). apt-get is also slow (same speed as git). But downloading files seems to work as expected (I got 33 MiB/s)...

I'm seeing 93% GPU Memory Used even in a freshly restarted pod.

Not sure what to do about this. nvidia-smi shows there are no processes running, but when I try to run a job it shows "Process 1726743 has 42.25 GiB memory in use". How do I find and kill that?
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB. GPU 0 has a total capacity of 44.52 GiB of which 18.44 MiB is free. Process 1726743 has 42.25 GiB memory in use. Process 3814980 has 2.23 GiB memory in use. Of the allocated memory 1.77 GiB is allocated by PyTorch, and 53.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
...
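On the fragmentation hint at the end of that traceback: the allocator config has to be in the environment before PyTorch initializes CUDA, so setting it inside an already-running session has no effect. A minimal sketch (the variable name and value come straight from the error message):

```python
import os

# Set this at the very top of the script, or in the pod template's
# environment variables, BEFORE torch is imported and CUDA initializes.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# ...then: import torch
```

Note this only helps with fragmentation of memory your own process allocates; it will not reclaim the 42 GiB held by the other PID.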

Persistence of pod logs from my training

I started my pod instance, attached to a volume where my dataset is located, and cloned my repository from GitHub using the VS Code integration. I left home and my laptop went into sleep mode. When I came back, my training had stopped and the session was disconnected.

Custom template

Hi there! I'm trying to build my custom CPU Docker-based template, but something is wrong. Locally the image starts fine and I don't have any problems, but the same image won't run as a pod. I'm wondering what I'm doing wrong, because it is a really simple app ...

Help Request: ODM Container Only Using CPU

Has anyone tried to deploy an ODM processing node in a pod before? https://github.com/OpenDroneMap/NodeODM How do I add the --gpus all flag to the pod?...

GraphQL Schema

Hi there, is it possible to get RunPod's GraphQL Schema or enable introspection? I need it for an integration I'm currently working on. 🙂...
Solution:
nope

How do savings plans work?

Could someone clarify how savings plans work? The documentation is quite limited. I understand that they help reduce costs over a set period, but I'd like to know whether, when I get a savings plan for a pod, it guarantees access to the same GPU for the entire reservation. If I stop my pod for some reason, do I have to rebuild it, or can I simply restart it?...

502

Hello, we are having trouble with a 502 error. We are running ComfyUI with runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04. Our port 8188 is still running, and we can also send a GET request to port 8188...

Decommissioning on November 7th

I received this email: "We are reaching out because you currently have serverless workers or pods running in the EUR-NO-1 data center, which is scheduled for decommissioning on November 7th. This change is part of our efforts to upgrade capacity, enhance the network, and improve other infrastructure." What actions should I take if I'm currently running a pod with a savings plan? How do I restore a pod with the same savings plan?...

Lost my GPU

Hello, I stopped my pod and when I came back, I had 0 GPUs available. Should I hope that this machine gets the GPU back, or will it never get it back, meaning I should switch to a new pod?...

Where are default models mounted? I can't find them under /comfy-models

```
root@054f3147d5b1:/# ls -al /comfy-models/
total 4
drwxr-xr-x 2 root root   10 Oct 25 09:17 .
drwxr-xr-x 1 root root 4096 Nov  4 10:00 ..
root@054f3147d5b1:/workspace/ComfyUI/custom_nodes/comfyui_controlnet_aux# df -h...
```

Port forwarding understanding

Greetings, I have been a user of Vast.ai, where they have a list of ports already assigned, and each maps to exactly the same port on your machine. But on RunPod they map to a different one. I have to run a miner and I need to give it two of my ports: should I be telling it my external or internal ports, and how do they map to the internal ones? I am also attaching a picture of Vast's ports and yours as well...
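On reading the mapping programmatically: a small sketch, assuming RunPod publishes the external mapping inside the pod via RUNPOD_PUBLIC_IP and RUNPOD_TCP_PORT_<internal> environment variables (worth confirming with `env | grep RUNPOD` in your own container, since the exact names are an assumption here):

```python
import os

def external_endpoint(internal_port: int, env=None):
    """Return the public host:port for an internal pod port, or None.

    Assumes RunPod exposes the mapping via RUNPOD_PUBLIC_IP and
    RUNPOD_TCP_PORT_<internal> env vars -- verify the names in your pod.
    """
    env = os.environ if env is None else env
    host = env.get("RUNPOD_PUBLIC_IP")
    port = env.get(f"RUNPOD_TCP_PORT_{internal_port}")
    return f"{host}:{port}" if host and port else None

# The miner should be given the *external* pair; the service itself
# still binds to the internal port inside the container.
print(external_endpoint(8188))
```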

Problems starting my pod with and without GPU.

Container logs (ID: tb7bqtktnwh9gy):
```
2024-11-02T18:47:01.634671114Z [SSH] Configuring SSH to allow root login with a password...
2024-11-02T18:47:01.720536800Z  * Starting periodic command scheduler cron
2024-11-02T18:47:01.809391559Z    ...done.
2024-11-02T18:47:01.926771417Z  * Restarting OpenBSD Secure Shell server sshd...
```