RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

Maximum length for value of environment variables

I set some environment variables via the GraphQL API when starting a pod, and I was wondering what the maximum length of a value is. The GraphQL API spec only mentions that it should be a UTF-8 string.
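
The thread does not state a limit, so one cautious approach is to measure each value before sending it. A minimal Python sketch, assuming a hypothetical `env` dictionary of variables; `ASSUMED_LIMIT_BYTES` is an illustrative placeholder, not a documented RunPod limit:

```python
# Hypothetical threshold for illustration only; not a documented RunPod limit.
ASSUMED_LIMIT_BYTES = 4096


def utf8_sizes(env):
    """Map each variable name to the UTF-8 encoded size of its value in bytes."""
    return {name: len(value.encode("utf-8")) for name, value in env.items()}


def oversized(env, limit=ASSUMED_LIMIT_BYTES):
    """Return the names whose values exceed `limit` bytes, sorted by name."""
    return sorted(name for name, size in utf8_sizes(env).items() if size > limit)
```

Measuring encoded bytes rather than characters matters because a UTF-8 string can use several bytes per character.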

Enquiry about pod ID oi3rnyumuzvp2s

Hello, is it possible to search the history for a pod ID? We cannot see anything in the audit log, and the feedback is that the pod has somehow vanished. Could someone please check oi3rnyumuzvp2s? Thank you....

GraphQL CUDA Version

How do I create a GPU pod through GraphQL with a specified CUDA version? https://graphql-spec.runpod.io/#definition-PodFilter I assume it is possible, since RunPod has it implemented, but are the docs up to date?...

Any template with Python 3.9.*, or how to install it?

I want to install https://www.kernl.ai/how-to-guides/get-started/#optimize-a-model and it requires Python 3.9.*. Is it possible to install this version of Python somehow, or is there a template with this Python version?

Match IPs with GPUs

Hi all, can someone from RunPod share which GPUs correspond to which IP addresses? I could just try all of them manually, of course, but this would be a great help! 🙂 (Yes, I realize it's a datacenter that has multiple kinds at the same time.) Addresses in question are: 64.247.xxx 91.199.xxx...

Container is not running error

I am having an issue that I can't figure out how to work around. I am new to RunPod, so please excuse my limited knowledge at this point. I have my pod running (trying to finetune Mistral 7b) and have my SSH public key configured under settings (I can also see it when launching the pod). But when the pod is ready and I attempt to SSH into it using my private key, I get the same error every time: "Error response from daemon: Container 7b7a3790f1500c544348d2c6e09c286ee3fe3849adcb241ac54bceb3c518619f is not running", even though I can see the container running in the Container Logs.
Clicking on "Start Web Terminal" in the UI doesn't do anything either. I have restarted/terminated the pod multiple times, but no luck.
Solution:
Thank you guys. The article and the questions sent me in the right direction. I am using a custom template and didn't realize the image "winglian/axolotl:main-py3.10-cu118-2.0.1" didn't have SSH installed. I updated the template with "winglian/axolotl-runpod:main-latest" and all is working now. I am finetuning the model as I write this. Thanks for all the help.

Pod stopped, no data on restarting

Hello there, please help. My account accidentally ran out of funds and my pod was stopped. I received an email asking me to recharge within two days, and I recharged promptly. On restarting the pod I can't find my data. It is imperative that I access that data, please help!

Zero GPU issue

I wanted to start up an SD Comfy UI pod I created the other day, and when I start it up I get a pop-up stating that I don't have access to any GPUs and that I should consider creating a network volume. I click the link to learn more, which goes to this page: https://docs.runpod.io/references/faq?_gl=1*lokxwm*_ga*OTAwNDMyOTA4LjE3MDY5MDYzNDc.*_ga_KMF5V28LQG*MTcwODM0NDg1Ni4xMi4xLjE3MDgzNDQ4NTYuNjAuMC4yMDg3Nzg5MzIz*_gcl_au*MjA5MjM3NzI3Ny4xNzA2OTA2MzQ3#why-do-i-have-zero-gpus-assigned-to-my-pod In that section there is a link labeled "Learn how to use them" (network volumes), and it leads to a page without any tutorial on how to set one up. Here is that link: https://docs.runpod.io/pods/network-storage/create-network-volumes ...

Start and stop multiple pods

I have a product that will allow users to submit video editing requests, each of which can take anywhere from 0-8 minutes of RTX 4090 GPU processing to complete. To manage multiple requests, I wanted to implement a system that turns on and off a group of GPUs all running the same Docker image. This way, if requests are high at a given time, they could all still be handled. However, in my experience, when pods are stopped, it can be the case that the GPU attached to it is no longer available when I...

`runpodctl send` crawling at <1MB speeds

Hi there! I'm a big fan of RunPod for training SDXL, and have spent a bunch of time (and money!) iterating on fine-tuning models on RunPod using on-demand secure cloud servers. However, I keep running into a blocker: unexpectedly slow speeds with runpodctl send. Sometimes it works well, with 40MB/s speeds; other times, it drops down to <1MB/s speeds for no apparent reason, and can take hours to download a single 6GB file. I'll be honest: paying $4.69/hr for 3-4 hours to train a model is much less appealing when I know it might take me another hour of frustration afterwards just to download the results. Is there a faster / more reliable way to download the results of a training?...

Cannot create pods even though GPUs are available

When I query as follows: `query { gpuTypes(input: {id: "NVIDIA L4"}) { id`...

Transfer/Duplicate Network Volume

As per title, can I duplicate my network storage or temporarily move it to a different region so that I can have access to different GPUs more frequently? EU-RO-1 hasn't had an A100 available in a couple of days.

screen spot

I am running a script in a screen session on a Spot instance. If the instance stops, is there a way to make the script run again once the instance becomes available?
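
The thread does not say how to relaunch the script, but a script that records its own progress can at least pick up where it left off whenever it is started again. A minimal Python sketch, assuming the checkpoint file lives on storage that survives the interruption (the path passed to `run` and the `steps` list are hypothetical):

```python
import json
import os


def load_progress(path):
    """Return the index of the next step to run, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["next_step"]


def save_progress(path, next_step):
    """Write the checkpoint via a temp file so an interruption cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"next_step": next_step}, f)
    os.replace(tmp, path)


def run(steps, path):
    """Run the remaining steps, recording progress after each one."""
    for i in range(load_progress(path), len(steps)):
        steps[i]()
        save_progress(path, i + 1)
```

This does not start the script by itself; it only makes a restart cheap, because completed steps are skipped.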

/usr/bin/bash: cannot execute binary file

I'm trying to set up a custom template with my own container image. When I try to set up SSH according to the docs (https://docs.runpod.io/pods/configuration/use-ssh), I get errors on container start:
2024-02-18T17:40:55.341955276Z /usr/bin/bash: /usr/bin/bash: cannot execute binary file
2024-02-18T17:41:11.745017647Z /usr/bin/bash: /usr/bin/bash: cannot execute binary file
2024-02-18T17:41:28.059617609Z /usr/bin/bash: /usr/bin/bash: cannot execute binary file...
Solution:
@Arahizzz Remove `ENTRYPOINT ["/bin/bash"]` from the image and rebuild it.

sudo missing

Why is sudo not available on my secure RunPod pods? How can I get it?
Solution:
You don't need sudo; you are already root.
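
Scripts that hard-code `sudo` fail when the binary is missing, so it can help to check the effective user first. A minimal Python sketch for Unix-like systems; `with_optional_sudo` is a hypothetical helper name, not part of any RunPod tooling:

```python
import os


def is_root():
    """Return True when the current process runs with effective UID 0 (root)."""
    return os.geteuid() == 0


def with_optional_sudo(command):
    """Prefix `command` with sudo only when the process is not already root."""
    return list(command) if is_root() else ["sudo", *command]
```

For example, `with_optional_sudo(["apt-get", "update"])` returns the command unchanged when the process is already root.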

Can I watch system utilization in the Linux terminal?

These are the values shown in one CPU test. The numbers displayed on the RunPod homepage and Linux terminal appear different even if monitored for a long time. When testing on the A100 server, the opposite was also seen. What numbers should I use as a standard when testing the CPU on a GPU server? And can I know the utilization rate per vCPU core I am using?
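
For per-core numbers from inside the terminal, one option is to sample the kernel's counters in `/proc/stat` twice and compare them. A minimal Python sketch, assuming a Linux system; inside a container these counters may describe all of the host's cores rather than only the vCPUs allotted to the pod, which could be one reason the figures differ from the dashboard:

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-core lines such as "cpu0"; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:9]]  # user .. steal
        idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
        total = sum(values)
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core usage."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return utilization(before, after)
```

Calling `sample()` returns a dictionary keyed by core name (`cpu0`, `cpu1`, and so on) with the percentage of time each core was busy during the interval.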

Network Storage load issue

My GPU Pod (3090) is not connecting to my network drive correctly (all in CZ zone). It seems like it is loading a default set of data, but my custom data on the network volume is not visible. Trying to reset the pod now, but will probably run into the 'Connection closed' issue on the web terminal... This is becoming more and more frustrating tbh.

How do I edit the pre_start file on a pod and have it persist?

Read the title, please. Jupyter doesn't persist outside workspace.
Solution:
Thanks @ashleyk, I made a new script and call that before I call the start script.

Multi GPU

I was conducting an experiment to run LoRAX (https://github.com/predibase/lorax) on multiple GPUs. However, I did not observe any improvement in the results; in fact, the throughput was even worse. For sequence calls, the throughput for 1x GPU is better than 2x GPU! ...

Unable to use model in stable diffusion

I tried to use a model I downloaded and received this error: