RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

Two pods disappeared

I was working under a company account and both pods seem to have disappeared; the screen now shows "you dont have any active pods". They had been running for a while. The company account email got no message about their termination or deletion; they are simply not there anymore.

Can't connect to my pod

I have a problem: I can't connect to my pod. Can you guys please take a look and tell me what is going on?

Can private images on GHCR be fetched using registry credentials?

I'm trying to create a template using a private package on GHCR, but am not sure if I can use registry credentials to do so. Since GitHub requires using personal access tokens, will I just have to manually pull/deploy in a pod, instead of using a RunPod template?
Solution:
You should be able to create a new registry credential with the following format for GitHub Packages

How to Estimate the Survival Time of Spot Instances?

I need some advice on estimating the survival time of RunPod Spot instances. I've noticed that sometimes my Spot instances run for several hours without interruption, while other times they get terminated within minutes. This variability makes it challenging to choose between SPOT and ON-DEMAND.
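
One rough way to approach this, assuming you keep your own log of past Spot runs (how long each lasted and whether it ended in an interruption or you stopped it yourself), is a Kaplan-Meier style estimate. This is only a sketch, not a RunPod feature: the function names `kaplan_meier` and `survival_at` are hypothetical, and the result is only as good as the log you feed it.

```python
from typing import List, Tuple


def kaplan_meier(runs: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Estimate a survival curve from logged runs.

    Each run is (duration_hours, was_interrupted). Runs you stopped yourself
    (was_interrupted=False) count as censored: they prove the instance lasted
    at least that long, but not when it would have been interrupted.
    Returns (time, survival_probability) at each observed interruption time.
    """
    events = sorted(runs)
    at_risk = len(events)  # runs still alive just before the current time
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        interrupted = 0
        ended = 0
        # Group all runs that ended at exactly the same duration.
        while i < len(events) and events[i][0] == t:
            if events[i][1]:
                interrupted += 1
            ended += 1
            i += 1
        if interrupted:
            survival *= 1 - interrupted / at_risk
            curve.append((t, survival))
        at_risk -= ended
    return curve


def survival_at(curve: List[Tuple[float, float]], hours: float) -> float:
    """Estimated probability that a run is still alive after `hours`."""
    prob = 1.0
    for t, s in curve:
        if t > hours:
            break
        prob = s
    return prob
```

With a curve built from your own log, `survival_at(curve, 2.0)` gives the estimated chance that a job needing two hours finishes uninterrupted. Compare that with the Spot versus On-Demand price difference and the cost of restarting the job.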

run a function in a pod

Suppose I had a function that does some computation, and I wanted to run it inside a pod. How would I go about doing that entirely from the Python SDK?

Impossible to launch a CPU pod via API

When I try to launch a CPU pod via the API with its ID, it just crashes. With the GraphQL API it says: Pod resumed: { errors: [ {...

pod network down

My pod's network went down a while ago and still isn't back - k3c9sctuperq0u is the ID. Obviously I can't get logs or anything. Is there any way I can see when it might be fixed?

Pod crashing due to low regular RAM?

Hey, I am running ComfyUI and my pod keeps crashing at one point in the workflow. The VRAM is only 70% utilised, but the GPU says 100%. Does this mean that if I found a different pod with more regular RAM, I could keep going with the workflow?...

Where is the stop icon??

I would like to pause my pod, but I can only terminate it??

Wasted all my credits trying to figure out how to actually initialize the GPUs in the pod instances

I tried everything I could think of. I installed all the NVIDIA drivers, everything I would normally do, and could not get any GPU to show up as a device. I tried multiple preconfigured pods that said they were all ready to go, but nothing seemed to work properly.

Multi Node training with torchrun/slurm

Has anyone here ever tried multinode on runpod? I am thinking of setting this up but if people have encountered prohibitive network speeds I do not see a reason to.

How to get Public IP and set symmetrical port mapping on Pod via Python SDK

I have created a pod with python in the following way ```python runpod.api_key = os.getenv("RUNPOD_API_KEY") bot_name = 'Testing Pod Public IP 1'...

🆘 We've encountered a serious issue with the machines running in our production environment

🆘 We've encountered a serious issue with the machines running in our production environment on RunPod: the GPU utilization fluctuates wildly, sometimes even dropping to zero, which significantly slows down task execution. Who should I contact?

REST API with Ollama

Hello everyone, I installed Ollama and am trying to make some requests to its API using my pod instance and port, and I'm getting no results or a 502. I'm using this tutorial: https://docs.runpod.io/tutorials/pods/run-ollama...
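
For reference, here is a minimal sketch of how such a request could be put together in Python. It rests on several assumptions: that the pod exposes Ollama's default port 11434 as an HTTP port, that the pod is reachable through a proxy URL of the form `https://<pod_id>-<port>.proxy.runpod.net`, and that the model name `llama3` has already been pulled. The helper `build_ollama_request` and the pod ID in the usage note are hypothetical.

```python
import json
import urllib.request


def build_ollama_request(
    pod_id: str,
    prompt: str,
    model: str = "llama3",  # assumed model name; use whichever model you pulled
    port: int = 11434,      # Ollama's default port; must be exposed on the pod
) -> urllib.request.Request:
    """Build (but do not send) a POST request to Ollama's /api/generate."""
    # Assumed proxy URL pattern for an exposed HTTP port on a pod.
    url = f"https://{pod_id}-{port}.proxy.runpod.net/api/generate"
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(build_ollama_request("your-pod-id", "Hello"), timeout=60)` and decode the JSON body of the response. Printing `request.full_url` first is a quick way to check that the pod ID and port in the URL match what the pod actually exposes.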

Can't create pod via GraphQL endpoint but works manually

I'm trying to create a new pod using a given template and network volume. I can do this on the website just fine; however, when I try to duplicate the exact same settings using the podRentInterruptable GraphQL mutation, I get a "There are no longer any instances available with the request specifications. Please try again later." error. Here is the mutation: ...

AMD pods don't properly support GPU memory allocation

Hello! I've been trying to build a ROCm/HIP-based package to run on RunPod's ROCm-templated pods (or in a custom-built container/template), and I ran into memory issues that I believe I've tracked down to how RunPod is starting up docker containers. In particular, pinned memory allocation fails with a misleading Error: Failed to allocate pinned memory: out of memory (2). Inspecting the GPU devices shows unusual permissions, e.g.: ``` ls -l /dev/dri/*...

TheBloke/goliath-120b-GPTQ with RunPod Kobold AI United

Hi! I got goliath-120b-GPTQ running with 3 A40. But the text generation speed is extremely slow. What is the best option for GPU config and settings to run this model? Thank you in advance!...

Ollama stopped using GPU

I installed Ollama on a 3090 pod as usual, following this tutorial: https://docs.runpod.io/tutorials/pods/run-ollama#step-4-interact-with-ollama-via-http-api. But now everything works very slowly, and GPU Memory Used is always at zero. What could be the reason?

Why are all folders and files in the workspace folder lost?

While I was working on the pod, the connection to it was lost, and all folders and files in the workspace folder are gone, although I didn't stop the pod.

Unable to upgrade Linux kernel version from 5.4.0 to 5.15.0 - RunPod A40 GPU

I'm trying to upgrade my Linux kernel from version 5.4.0 to 5.15.0. This is required for me to train deep learning models. Here's what I tried: 1. I tried to upgrade it manually with the apt command; however, I'm still getting the same kernel version...