RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Pod keeps on restarting when using docker_args

Hi, I am launching a pod using the create_pod API and passing some bash commands in docker_args. Once those commands finish executing, the pod restarts and runs all of them again. Note that there are no errors. resp = runpod.create_pod(name="generated from script", image_name="runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04", gpu_type_id=args.gpu_type, gpu_count=args.gpu_count, start_ssh=True, volume_in_gb=20, container_disk_in_gb=50,...
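One possible explanation, assuming the pod restarts its container whenever the container's main process exits, is that the commands in docker_args simply finish and the container stops. The sketch below builds a docker_args string that keeps a foreground process alive after the setup commands. `build_docker_args` is a hypothetical helper, not part of the runpod SDK, and the example commands are placeholders.

```python
import shlex


def build_docker_args(commands, keep_alive=True):
    """Join setup commands into a single `bash -c` invocation.

    Hypothetical helper (not part of the runpod SDK). When keep_alive is
    True, `sleep infinity` is appended so the container's main process
    does not exit once the setup commands have finished.
    """
    script = " && ".join(commands)
    if keep_alive:
        # `;` rather than `&&`: stay alive even if a setup command failed,
        # so the pod can still be inspected over SSH.
        script = f"{script}; sleep infinity"
    return "bash -c " + shlex.quote(script)


# Placeholder commands for illustration only.
docker_args = build_docker_args(["pip install -r requirements.txt", "python train.py"])
print(docker_args)
```

The resulting string would be passed as the `docker_args` argument of `runpod.create_pod`. Whether this resolves the restart loop depends on the assumption above holding for the pod in question.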

CA data center connectivity issue

Hi, I'm having trouble with every running instance in the CA data center when downloading models from Civitai. Note that this is not an authorization issue; the requests time out. The same request works perfectly fine from any other data center. Sample request: wget -O epiCRealismXL.safetensors "https://civitai.com/api/download/models/128078?type=Model&format=SafeTensor&size=pruned&fp=fp16" --content-disposition...

How to keep training running after disconnecting through VSCode?

I set up VSCode to connect to a pod with this guide: https://blog.runpod.io/how-to-connect-vscode-to-runpod/. I then cloned one of my own repos and started a training script in the bash terminal. How can I close the remote connection in VSCode but keep training running? At a later point I would like to reconnect through VSCode and check on the progress. I apologise for the extreme noobishness of my question, but I've never worked with SSH or anything remote through VSCode. Thanks in advance....
Solution:
It's useful for something like this: basically a terminal inside a terminal that keeps the process running
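The tool the answer alludes to is not named in this preview. As a different technique with a similar effect, the sketch below starts a command in its own session with output redirected to a log file, so closing the VSCode terminal should not take the process down with it. `start_detached` is a hypothetical helper, and `start_new_session` is POSIX-only.

```python
import subprocess


def start_detached(command, log_path):
    """Start `command` in its own session and append its output to log_path.

    Hypothetical helper. The child is detached from the controlling
    terminal, so it is not sent SIGHUP when that terminal closes.
    """
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge errors into the same log
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # POSIX only
        )
    # The child holds its own copy of the log file descriptor,
    # so closing ours here does not interrupt its output.
    return process
```

A call such as `start_detached(["python", "train.py"], "train.log")` (with `train.py` standing in for the real script) returns immediately; after reconnecting, progress can be followed with `tail -f train.log`.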

Unable to connect to ssh with tcp open, due to it asking for a password. Jupyter not functioning

I triple-checked my setup against the instructions in the Use SSH section of the docs and was unable to find any issues. Furthermore, I am able to SSH in with no problem to the non-TCP version. This might be related or it could be a separate problem: when I try to connect to Jupyter Lab, it just gives me a white screen. Jupyter worked fine on all pods until I increased the disk space for the volume from 20 GB to 40 GB. Does anyone have any idea of what could help?...

Extremely slow network storage

For a couple of weeks now, it's taken >15 minutes (sometimes up to 20) to load a 70B model from network storage in CA-MTL. This is, of course, paid time on the GPU while waiting for it (as well as being quite inconvenient) - it's actually quicker to download the model fresh from an internet server every time rather than waiting for it to load from network storage. Is the "network storage" actually on the same network as the GPU server, or is it on some cloud somewhere? Why is it so slow?

Unable to run jupyter on custom docker image

I am trying to build my own custom container and run JupyterLab in it. I've copied the configuration from the base RunPod image on GitHub. Everything is fine there. The problem occurs when the pod boots: it is supposed to start JupyterLab, but we get a "jupyter: command not found" error, which is why port 8888 is not ready. The first screenshot is from my local machine, where I get into the Docker container and see that jupyter --help works. The second one is from the RunPod instance, where it complain...

Open UDP Port

I want to open UDP port 5060 and the UDP port range 10000-20000. Please help me out.

"Something went wrong!" when trying to sync with Google Cloud Storage

Hello. I'm trying to sync my pod's internal volume with Google Cloud Storage and getting a popup with that error. It happens no matter which data I enter, correct or incorrect. I'm using a community cloud pod with the runpod/pytorch:2.0.1-py3.10-rocm5.7-ubuntu22.04 image.

Throttled download speed from container registry while still being billed

Hello, I'm trying to run a custom image from ghcr on a pod, which I've done many times. It's taking a long time, likely 45 minutes in total just to download the image. I'm also noticing that I'm being billed while I wait for this very slow download. Is this intentional? If so, it seems like a conflict of interest between RunPod and its users....

Repeated RunPod Stripe payment problems

Is there any way to pay RunPod without using Stripe? For the past three days, multiple cards have been rejected without any explanation. This is a recurring issue: a fully funded card is declined with no specific reason provided. Interestingly, the same cards work seamlessly on, e.g., Anthropic or PayPal, which suggests the issue might stem from RunPod’s Stripe setup. I speculate, being entirely ignorant of the mechanics. Yesterday, after failing to pay directly, I sent the funds via PayPal to a friend in the UK; the card was debited without incident or even an OTP request. However, the UK card was rejected by Stripe. No reason was given, despite verification having been completed via the banking app. The problem appears to lie in the repeated requirement for verification on each payment, including Link and bank-level OTP verification. While such steps might seem bank-related, other services process payments effortlessly after the card is registered. As the Anthropic and PayPal examples show, you can just pay without these additional hurdles. The issue is really quite disruptive, and becomes all the more frustrating when one is desperate to work on data locked inside RunPod and there is no explanation or path to resolution....

Unable to Decrease the network volume size

"For your safety, you can only increase volume disk." I understand the safety concern, and I am sure that I don't need that much storage. Is there any way to decrease the network volume size?...

No datacenter for H200 with Volume?

Hello! In fact, none of the presented regions where I can create my Volume gives me the opportunity to start with the H200. Do you know any solutions? Yesterday I created a pod and specified the volume I needed in the settings (just + button), but the region is displayed as "USA", and today this pod is no longer available for launch 😦...
Solution:
they're new, we are still working out those details and getting storage near them, we wanted to provide access while we add more functionality in Q1

The problem in connecting SwarmUI with RunPod

Since yesterday, I have been fighting with the backend configuration. Claude is trying to help me fix the problem but with no result. I paste log from SwarmUI `2024-12-19 08:51:47.270 [Info] Saving backends......

Multi GPU problem

Hi, how can I evenly distribute workers across multiple GPUs? I am trying to get the Stable Diffusion model up, but I am getting an out-of-memory error because gunicorn is trying to run all the workers on one GPU. How can I solve this problem, given that I need to run all the workers on the same port? Alternatively, how can I configure proxying of requests inside the pod?
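One way to approach this, sketched here as a gunicorn configuration file, is to pin each worker to a GPU in gunicorn's `post_fork` server hook by setting `CUDA_VISIBLE_DEVICES` before the worker loads the model. The workers still share the single listening socket owned by the gunicorn master, so they all serve the same port. `GPU_COUNT` and `gpu_for_worker` are assumptions introduced for illustration, and the sketch assumes the app is not preloaded in the master process.

```python
# gunicorn.conf.py (sketch)
import os

GPU_COUNT = 2  # assumption: number of GPUs attached to the pod


def gpu_for_worker(worker_age, gpu_count=GPU_COUNT):
    """Map gunicorn's 1-based worker counter to a GPU index, round-robin."""
    return (worker_age - 1) % gpu_count


def post_fork(server, worker):
    """gunicorn server hook: runs inside each worker right after the fork."""
    # Must happen before the worker initialises CUDA, i.e. before the
    # model is loaded. With preload_app enabled this would be too late.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_for_worker(worker.age))
```

Because `worker.age` keeps increasing when workers are restarted, the distribution can drift from perfectly even over time; this is a starting point rather than a complete scheduler.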

Unable to create pod with GraphQL

Hi, I tried to use the following command to create a test pod. ```bash curl --request POST \ --url https://api.runpod.io/graphql \ --header "Authorization: Bearer YOUR_API_KEY" ...

Creating a Pod with dockerArgs and a docker image from a registry that requires auth

I'm trying to create a pod from a template, or from a Docker image hosted in a registry that requires authentication. I'm using the podFindAndDeployOnDemand method. If I specify a templateId, the pod starts, but the dockerArgs I specify in the API call seem to be ignored and the CMD in the Dockerfile is run instead. ...
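For reference, a request of the kind described can be sketched as below. It only builds the headers and JSON body and does not send anything. The input type name `PodFindAndDeployOnDemandInput`, the selected field names, and the helper `build_create_pod_request` are assumptions to be checked against the RunPod GraphQL schema, and the sketch does not cover registry authentication.

```python
import json

# Assumption: the mutation accepts a single `input` argument of this type.
MUTATION = """
mutation ($input: PodFindAndDeployOnDemandInput!) {
  podFindAndDeployOnDemand(input: $input) {
    id
    imageName
  }
}
"""


def build_create_pod_request(api_key, image_name, gpu_type_id, docker_args):
    """Return (headers, body) for a pod-creation GraphQL request.

    Hypothetical helper; field names are assumed, not confirmed here.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    variables = {
        "input": {
            "name": "generated from script",
            "imageName": image_name,
            "gpuTypeId": gpu_type_id,
            "gpuCount": 1,
            "cloudType": "ALL",
            "containerDiskInGb": 20,
            "dockerArgs": docker_args,
        }
    }
    # Passing values as GraphQL variables avoids hand-escaping quotes
    # inside dockerArgs when embedding them in the query string.
    body = json.dumps({"query": MUTATION, "variables": variables})
    return headers, body
```

The returned headers and body can be posted to https://api.runpod.io/graphql with any HTTP client. Using variables keeps shell quoting problems out of the request, which is a frequent source of errors in hand-written curl commands.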