RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⛅|pods

Community pod: very slow download speed from GitHub

Since yesterday I have been getting very slow download speeds from GitHub (cloning repos), while downloads from other sources work fine. It is still happening today. Does anyone have an idea why? I am using the web terminal.

SkyPilot + RunPod: No resource satisfying the request

Hi team. I'm trying to use SkyPilot + vLLM + RunPod to serve a custom-trained LLM, but I cannot get SkyPilot to launch a resource. I get the following error: I 02-22 00:16:32 optimizer.py:1206] No resource satisfying <Cloud>({'NVIDIA RTX A6000': 1}, ports=['8888']) on RunPod. sky.exceptions.ResourcesUnavailableError: Catalog does not contain any instances satisfying the request: ...

`runpodctl stop pod $RUNPOD_POD_ID` failing with 401

I used to end my long-running jobs with this command, but it has failed the last several times with a 401. runpodctl stop pod $RUNPOD_POD_ID Error: statuscode 401...

Stuck pod instance

I have a problem with a community pod (id: xbyhioflerw8pz): it has been inaccessible for a really long time and is stuck at launch. It keeps trying to deploy without updating the status and only shows "Waiting for logs". Live chat support is silent. I'd appreciate any help.

Start container pod error

PodID: iovxdnrsop9fz1 Region: NO Error log message: 2024-02-21T09:47:48Z error starting container: Error response from daemon: driver failed programming external connectivity on endpoint iovxdnrsop9fz1-0 (7047bce4c334cf194763afae5e0b7e6f1ce041721666df277facb429a82f9d9b): Error starting userland proxy: listen tcp4 0.0.0.0:40448: bind: address already in use...

Pod doesn't recognize my SSH key

Hi, my pod crashed, and after a restart it doesn't let me connect with the SSH key set in my RunPod settings. I can connect to other pods with my private key without problems. The web terminal works. Restarting the pod doesn't help. I need to download a log file from this pod before I destroy it. Can someone please help?
Solution:
Yes, you can use runpodctl, croc, etc. from the web terminal to copy the file off the pod.

Run Lorax on Runpod (Serverless)

I created a Docker image for Lorax similar to (https://github.com/runpod-workers/worker-tgi/blob/main/src/entrypoint.sh), but inside the Docker image I am getting "connection refused". Could you please check it?...
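"Connection refused" means nothing was accepting connections on that address at that moment, for example because the model server was still loading or listens on a different port. One common pattern is to poll the server's port before sending it work. A minimal sketch, assuming the entrypoint starts the Lorax server in the background and the handler reaches it over a local TCP port; `wait_for_port` and the port number are hypothetical and not taken from the worker-tgi script.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0, interval_s: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True once something accepts connections, False on timeout.
    Hypothetical helper, not part of any RunPod or Lorax API.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Raises ConnectionRefusedError (a subclass of OSError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False
```

A handler would call something like `wait_for_port("127.0.0.1", 8080, timeout_s=600)` once at start-up and fail loudly if it returns False; 8080 is only a placeholder for whatever port the server is actually configured to use.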

What is the difference between secure cloud and Community Cloud?

Urgent Prod Issue

Pod is stuck and not restarting.

CUDA version filter

Hi team, I wanted to check: is the CUDA filter on the page the minimum version that the hardware supports, or the only CUDA version the hardware supports? I was trying to run 2x L40 but got a weird CUDA index assertion error, whereas the exact same code ran fine on 4090s. That got me wondering whether L40s only permit CUDA 12.0 (based on the page).
Solution:
It is a minimum: if a machine is on 12.2, it supports images built for 12.1, 11.8, and so on.
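The rule in the solution amounts to a version comparison: an image runs if the CUDA version it was built for is no newer than the machine's. A minimal Python sketch, assuming plain "major.minor" version strings; the helper names are hypothetical, and the check ignores any other constraints on the host.

```python
from typing import Tuple


def parse_cuda(version: str) -> Tuple[int, int]:
    """Turn a "major.minor" string such as "12.2" into a comparable tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def image_runs_on_host(image_cuda: str, host_cuda: str) -> bool:
    """Hypothetical helper: the filter value is a minimum, so an image built
    for CUDA X runs on a host whose CUDA version is X or newer."""
    return parse_cuda(image_cuda) <= parse_cuda(host_cuda)


for image in ["11.8", "12.1", "12.2", "12.3"]:
    print(image, image_runs_on_host(image, host_cuda="12.2"))
```

Tuples compare element by element, so "11.8" sorts below "12.1" as expected, which a plain float or string comparison would not guarantee for versions like "12.10".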

Maximum length for environment variable values

Since I set some environment variables via the GraphQL API when starting a pod, I was wondering what the maximum length is. The GraphQL API spec only says that it should be a UTF-8 string.

Enquiry about pod ID oi3rnyumuzvp2s

Hello, is it possible to search the history for a pod ID? We cannot see anything in the audit log, and the feedback is that the pod has somehow vanished. Can you please check oi3rnyumuzvp2s? Thank you....

GraphQL CUDA Version

How do I create a GPU pod through GraphQL with a specified CUDA version? https://graphql-spec.runpod.io/#definition-PodFilter I assume it is possible since RunPod has it implemented, but are the docs up to date?...

Any template with Python 3.9.*, or how to install it?

I want to install https://www.kernl.ai/how-to-guides/get-started/#optimize-a-model and it requires Python 3.9.*. Is it possible to install this version of Python somehow, or is there a template with this Python version?

Match IPs with GPUs

Hi all, can someone from RunPod share which GPUs correspond to which IP addresses? I could of course just try all of them manually, but this would be a great help! 🙂 (Yes, I realize it's a datacenter that has multiple kinds at the same time.) Addresses in question are: 64.247.xxx 91.199.xxx...

Container is not running error

I am having an issue and can't figure out how to work around it. I am new to RunPod, so please excuse my limited knowledge at this point. I have my pod running (trying to fine-tune Mistral 7B) and have my SSH public key configured under Settings (I can also see it when launching the pod). But when the pod is ready and I try to connect via SSH using my private key, I get the same error every time: "Error response from daemon: Container 7b7a3790f1500c544348d2c6e09c286ee3fe3849adcb241ac54bceb3c518619f is not running", even though I can see the container running in the Container Logs.
Clicking "Start Web Terminal" in the UI doesn't do anything either. I have restarted/terminated the pod multiple times, but no luck.
Solution:
Thank you, everyone. The article and the questions sent me in the right direction. I am using a custom template and didn't realize the image "winglian/axolotl:main-py3.10-cu118-2.0.1" didn't have SSH installed. I updated the template to use "winglian/axolotl-runpod:main-latest" and everything is working now. I am fine-tuning the model as I write this. Thanks for all the help!

Pod stopped: no data after restarting

Hello there, please help. My account accidentally ran out of funds and my pod was stopped. I received an email asking me to recharge within two days and recharged promptly. After restarting the pod I can't find my data. It is imperative that I access that data, please help!

Zero GPU issue

I wanted to start up an SD ComfyUI pod I created the other day, and when I start it I get a pop-up stating that I don't have access to any GPUs and that I should consider creating a network volume. I clicked the link to learn more, which goes to this page: https://docs.runpod.io/references/faq#why-do-i-have-zero-gpus-assigned-to-my-pod. That section has a "Learn how to use them" link (for network volumes), but it leads to a page without any tutorial on how to set one up. Here is that link: https://docs.runpod.io/pods/network-storage/create-network-volumes ...

Start and stop multiple pods

I have a product that will let users submit video editing requests, each of which can take anywhere from 0 to 8 minutes of RTX 4090 GPU processing to complete. To manage multiple requests, I wanted to implement a system that turns a group of GPUs, all running the same Docker image, on and off. That way, if demand is high at a given time, all requests can still be handled. However, in my experience, when pods are stopped, the GPU attached to them may no longer be available when I...
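Deciding how many pods to start for a burst of requests is a scheduling estimate. A minimal Python sketch, assuming identical RTX 4090 pods, known per-request durations in minutes, and no pod start-up time; the function names are hypothetical and not part of any RunPod API, and the longest-processing-time (LPT) rule used here is a standard greedy heuristic rather than an optimal schedule.

```python
import heapq
from typing import List, Optional


def makespan_minutes(job_minutes: List[float], num_gpus: int) -> float:
    """Estimate how long a batch of jobs takes on num_gpus identical GPUs.

    LPT heuristic: take jobs longest-first and give each one to the GPU
    that becomes free earliest.
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    free_at = [0.0] * num_gpus  # min-heap: time at which each GPU becomes free
    for duration in sorted(job_minutes, reverse=True):
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


def gpus_needed(job_minutes: List[float], deadline_minutes: float, max_gpus: int = 32) -> Optional[int]:
    """Smallest pool size (up to max_gpus) whose LPT estimate meets the deadline."""
    for n in range(1, max_gpus + 1):
        if makespan_minutes(job_minutes, n) <= deadline_minutes:
            return n
    # No pool up to max_gpus meets the deadline (e.g. one job is longer than it).
    return None
```

For example, five queued requests of 8, 8, 5, 3 and 2 minutes finish in about 13 minutes on two GPUs and 10 minutes on three, so `gpus_needed([8, 8, 5, 3, 2], 10)` returns 3. In practice the estimate would also need to cover the time a stopped pod takes to come back, or whether its GPU is still free at all.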

`runpodctl send` crawling at <1MB speeds

Hi there! I'm a big fan of RunPod for training SDXL and have spent a lot of time (and money!) iterating on fine-tuning models on RunPod using on-demand Secure Cloud servers. However, I keep running into a blocker: unexpectedly slow speeds with runpodctl send. Sometimes it works well at 40MB/s; other times it drops below 1MB/s for no apparent reason, and downloading a single 6GB file can take hours. I'll be honest: paying $4.69/hr for 3-4 hours to train a model is much less appealing when I know it might take another hour of frustration afterwards just to download the results. Is there a faster, more reliable way to download the results of a training run?...
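For a sense of scale, the gap between the two speeds in the post is easy to work out. A small Python sketch, assuming decimal units (1 GB = 1000 MB), a constant transfer rate, and that the pod keeps billing at the quoted $4.69/hr while the download runs; `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_gb: float, rate_mb_per_s: float) -> float:
    """Seconds to move size_gb at a steady rate (decimal units: 1 GB = 1000 MB)."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000 / rate_mb_per_s


HOURLY_RATE_USD = 4.69  # on-demand price quoted in the post

for rate in (40.0, 1.0):
    secs = transfer_seconds(6, rate)
    # Assumes the pod stays running (and billing) until the download finishes.
    extra_cost = secs / 3600 * HOURLY_RATE_USD
    print(f"{rate:>4} MB/s: {secs / 60:.1f} min, about ${extra_cost:.2f} of pod time")
```

At 40 MB/s the 6 GB file takes 150 seconds (2.5 minutes); at 1 MB/s it takes 6000 seconds (100 minutes), and anything below 1 MB/s is slower still.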