Maximum length for value of environment variables
I set some environment variables via the GraphQL API when starting a pod, and I was wondering what the maximum length restriction is. The GraphQL API spec only mentions that the value should be a UTF-8 String.
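The thread does not state RunPod's limit, so the sketch below only measures values client-side before they are sent. It assumes a stand-in ceiling of 131072 bytes, which is the Linux per-string limit for a single environment entry (MAX_ARG_STRLEN with 4 KiB pages), not a documented RunPod limit. The names `ASSUMED_MAX_BYTES`, `entry_size`, and `oversized_env_vars` are illustrative.

```python
# Assumption: 131072 bytes is the Linux per-string limit (MAX_ARG_STRLEN on
# 4 KiB pages), used here as a stand-in. It is NOT a documented RunPod limit.
ASSUMED_MAX_BYTES = 32 * 4096


def entry_size(name: str, value: str) -> int:
    """Bytes used by one 'NAME=value' entry in UTF-8, plus the trailing NUL."""
    return len(f"{name}={value}".encode("utf-8")) + 1


def oversized_env_vars(env: dict[str, str], limit: int = ASSUMED_MAX_BYTES) -> list[str]:
    """Return the names whose encoded entry exceeds the assumed limit."""
    return [name for name, value in env.items() if entry_size(name, value) > limit]
```

Measuring the encoded bytes rather than `len(value)` matters because non-ASCII characters take more than one byte in UTF-8.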
Enquiry about pod ID oi3rnyumuzvp2s
Hello, is it possible to search the history for a pod ID? We cannot see anything in the audit log, and the feedback is that the pod has somehow vanished. Can we please check oi3rnyumuzvp2s?
Thank you....
GraphQL Cuda Version
How do I create a GPU pod through GraphQL with a specified CUDA version?
https://graphql-spec.runpod.io/#definition-PodFilter
I assume it is possible since RunPod has it implemented, but are the docs up to date?...
Any template with python 3.9.* or how to install it
I want to install https://www.kernl.ai/how-to-guides/get-started/#optimize-a-model and it requires Python 3.9.*. Is it possible to install this version of Python somehow, or is there a template with this Python version?
Match IPs with GPUs
Hi all, can someone from RunPod share which GPUs correspond to which IP addresses? I could just try all of them manually of course, but this would be a great help! 🙂 (Yes, I realize it's a datacenter that has multiple kinds at the same time.)
Addresses in question are:
64.247.xxx
91.199.xxx...
Container is not running error
I am having an issue that I can't figure out how to work around. I am new to RunPod, so please excuse the limited knowledge at this point.
I have my pod running (trying to finetune Mistral 7b) and have my SSH public key configured under settings (I can also see it when launching the pod). But when the pod is ready and I attempt to SSH into it using my private key, I get the same error every time: "Error response from daemon: Container 7b7a3790f1500c544348d2c6e09c286ee3fe3849adcb241ac54bceb3c518619f is not running", even though I can see the container running in the Container Logs.
Clicking "Start Web Terminal" in the UI doesn't do anything either. I have restarted/terminated the pod multiple times, but no luck...
Solution:
Thank you guys. The article and the questions sent me in the right direction. I am using a custom template and didn't realize the image "winglian/axolotl:main-py3.10-cu118-2.0.1" didn't have SSH installed. I updated the template with "winglian/axolotl-runpod:main-latest" and all is working now. I am finetuning the model as I write this. Thanks for all the help.
Pod stopped on restarting no data
Hello there, please help. My account accidentally ran out of funds and my pod was stopped. I received an email to recharge within two days and recharged promptly. On restarting the pod I can't find my data. It is imperative that I access that data, please help!
Zero GPU issue
I wanted to start up an SD ComfyUI pod I created the other day, and when I start it I get a pop-up stating that I don't have access to any GPUs and that I should consider creating a network volume. I clicked the link to learn more, which goes to this page:
https://docs.runpod.io/references/faq?_gl=1*lokxwm*_ga*OTAwNDMyOTA4LjE3MDY5MDYzNDc.*_ga_KMF5V28LQG*MTcwODM0NDg1Ni4xMi4xLjE3MDgzNDQ4NTYuNjAuMC4yMDg3Nzg5MzIz*_gcl_au*MjA5MjM3NzI3Ny4xNzA2OTA2MzQ3#why-do-i-have-zero-gpus-assigned-to-my-pod
In that section there is a link to learn how to use them (network volumes), but it leads to a page without any tutorial on how to set one up. Here is that link: https://docs.runpod.io/pods/network-storage/create-network-volumes
...
Start and stop multiple pods
I have a product that will allow users to submit video editing requests that can range anywhere from 0-8 minutes of RTX 4090 GPU processing each to complete. To manage the multiple requests, I wanted to implement a system that turns on and off a group of GPUs all running the same docker image. This way if requests are high at a given time they could all still be handled. However in my experience, when pods are stopped, it can be the case that the GPU attached to it is no longer available when I...
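As a rough sketch of the sizing side of this question only: the number of pods to keep running can be estimated from the backlog and the per-job time. The function name, the 8-minute worst case taken from the post, the 16-minute drain target, and the cap of 10 pods are all illustrative assumptions. The code calls no RunPod API and does not address GPU availability after a stop.

```python
import math


def pods_needed(queued_jobs: int, minutes_per_job: float = 8.0,
                target_drain_minutes: float = 16.0, max_pods: int = 10) -> int:
    """Estimate how many identical pods are needed to drain the queue in time."""
    if queued_jobs <= 0:
        return 0  # nothing queued, so every pod can be stopped
    total_minutes = queued_jobs * minutes_per_job
    wanted = math.ceil(total_minutes / target_drain_minutes)
    # Keep at least one pod while work exists, and never exceed the cap.
    return min(max_pods, max(1, wanted))
```

Using the worst-case job time overestimates the pod count when most jobs are short, which trades some cost for lower wait times.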
`runpodctl send` crawling at <1MB speeds
Hi there! I'm a big fan of RunPod for training SDXL, and have spent a bunch of time (and money!) iterating on fine-tuning models on RunPod using on-demand secure cloud servers. However, I keep running into a blocker: unexpectedly slow speeds with `runpodctl send`. Sometimes it works well, with 40MB/s speeds; other times, it drops down to <1MB/s speeds for no apparent reason, and can take hours to download a single 6GB file.
I'll be honest: paying $4.69/hr for 3-4 hours to train a model is much less appealing when I know it might take me another hour of frustration afterwards just to download the results. Is there a faster / more reliable way to download the results of a training?...
Cannot create pods even though there are available GPUs
When I query as follows,
```graphql
query {
  gpuTypes(input: {id: "NVIDIA L4"}) {
    id...
```
Transfer/Duplicate Network Volume
As per title, can I duplicate my network storage or temporarily move it to a different region so that I can have access to different GPUs more frequently? EU-RO-1 hasn't had an A100 available in a couple of days.
screen spot
I am running a script in a `screen` in a Spot instance. If the instance stops, when it gets available again, is there a way to make the script run again?
/usr/bin/bash: cannot execute binary file
I'm trying to set up a custom template with my own container image. When I try to set up SSH according to the docs (https://docs.runpod.io/pods/configuration/use-ssh), I get errors on container start:
2024-02-18T17:40:55.341955276Z /usr/bin/bash: /usr/bin/bash: cannot execute binary file
2024-02-18T17:41:11.745017647Z /usr/bin/bash: /usr/bin/bash: cannot execute binary file
2024-02-18T17:41:28.059617609Z /usr/bin/bash: /usr/bin/bash: cannot execute binary file...
Solution:
@Arahizzz remove ENTRYPOINT ["/bin/bash"] from image and rebuild image
sudo missing
Why is sudo not available on my secure runpods? How to get it?
Solution:
You don't need sudo, you are already root
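A quick way to confirm this from inside the pod, as a minimal sketch assuming a Linux container with Python available (`running_as_root` is an illustrative name):

```python
import os


def running_as_root() -> bool:
    """True when the effective user ID is 0, i.e. the process is root."""
    return os.geteuid() == 0
```

If this returns `True`, commands that a guide prefixes with `sudo` can be run directly without it.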
Can I watch system utilization in linux terminal?
These are the values shown in one CPU test. The numbers displayed on the RunPod homepage and Linux terminal appear different even if monitored for a long time. When testing on the A100 server, the opposite was also seen. What numbers should I use as a standard when testing the CPU on a GPU server? And can I know the utilization rate per vCPU core I am using?
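For the per-core part of the question, one option is to sample `/proc/stat` twice and compare the tick counters. The sketch below assumes a Linux pod; the function names are illustrative. Inside a container, `/proc/stat` may list the host's cores rather than only the vCPUs allocated to the pod, which could be one reason terminal figures differ from the dashboard.

```python
import time


def parse_proc_stat(text: str) -> dict[str, tuple[int, int]]:
    """Map 'cpuN' -> (idle_ticks, total_ticks) from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with 'cpu0', 'cpu1', ...; bare 'cpu' is the aggregate.
        if len(fields) >= 5 and fields[0].startswith("cpu") and fields[0] != "cpu":
            ticks = [int(x) for x in fields[1:9]]  # user through steal
            idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
            cores[fields[0]] = (idle, sum(ticks))
    return cores


def utilization(before: dict, after: dict) -> dict[str, float]:
    """Percent busy per core between two parse_proc_stat() samples."""
    result = {}
    for core, (idle1, total1) in after.items():
        idle0, total0 = before.get(core, (idle1, total1))
        elapsed = total1 - total0
        result[core] = 0.0 if elapsed <= 0 else 100.0 * (1 - (idle1 - idle0) / elapsed)
    return result


def sample_per_core(interval: float = 1.0) -> dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart, and return utilization."""
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    return utilization(first, second)
```

Calling `sample_per_core(1.0)` returns a dictionary such as `{"cpu0": ..., "cpu1": ...}` with one percentage per listed core.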
Network Storage load issue
My GPU Pod (3090) is not connecting to my network drive correctly (all in CZ zone). It seems like it is loading a default set of data, but my custom data on the network volume is not visible. Trying to reset the pod now, but will probably run into the 'Connection closed' issue on the web terminal... This is becoming more and more frustrating tbh.
How do I edit the pre_start file on a pod and have it persist?
read the title please. Jupyter doesn't persist outside workspace.
Solution:
Thanks @ashleyk I made a new script and call that before I call the start script
Multi GPU
I was conducting an experiment to run LoRAX (https://github.com/predibase/lorax) on multiple GPUs. However, I did not observe any improvement in the results; in fact, the throughput was even worse.
For sequence calls, the throughput for 1x GPU is better than 2x GPU!
...