RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

No CUDA GPU detected

I don't know if this is a general problem or not. Running import torch and then torch.cuda.is_available() gives an error: might not have cuda gpu ...
Solution:
When deploying a pod, click Filters and you will get these options. There is a drop-down for CUDA versions.
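One common cause of torch.cuda.is_available() failing is an image built against a newer CUDA version than the host driver supports, which is what the CUDA version filter helps avoid. The sketch below is a conservative check under stated assumptions: the host's supported version is the "CUDA Version" that nvidia-smi prints, the image's version can be read from its tag (as in runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04), and all helper names are hypothetical.

```python
import importlib.util
import re
from typing import Optional, Tuple


def cuda_from_tag(image_tag: str) -> Optional[str]:
    """Extract 'major.minor' from a tag containing e.g. '-cuda11.8.0-' (hypothetical helper)."""
    match = re.search(r"cuda(\d+)\.(\d+)", image_tag)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def major_minor(version: str) -> Tuple[int, int]:
    """Turn '12.4' or '11.8.0' into (12, 4) or (11, 8)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def host_supports(host_cuda: str, image_cuda: str) -> bool:
    """Conservative rule: the driver's CUDA version must be >= the image's CUDA version."""
    return major_minor(host_cuda) >= major_minor(image_cuda)


def torch_cuda_report() -> str:
    """Summarise what PyTorch sees inside the pod, without crashing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if not torch.cuda.is_available():
        return f"no CUDA GPU visible (torch built for CUDA {torch.version.cuda})"
    return f"CUDA OK: {torch.cuda.get_device_name(0)}"
```

For example, host_supports("12.4", cuda_from_tag(tag)) is True for the cuda11.8.0 tag above, while a host reporting 11.7 would be rejected. Comparing only major.minor ignores CUDA's minor-version compatibility, so this check may reject some hosts that would in fact work.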

Any way to control the restart policy of pods?

By default, it seems like RunPod always restarts the pod after any termination. I am wondering whether there is a flag or other option to control the restart policy. For instance, K8s has the following restart policy: Container restart policy...
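Nothing in this thread names a restart-policy flag. One workaround, offered here as an assumption rather than a RunPod feature, approximates Kubernetes' restartPolicy: OnFailure by making the job itself idempotent: after a successful run it writes a sentinel file to persistent storage, so a restarted container skips the finished job instead of running it again.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_once(cmd: Sequence[str], sentinel: Path) -> int:
    """Run cmd unless a previous run already succeeded.

    On success a sentinel file is written; after a container restart the
    sentinel is found and the job is skipped (exit code 0).
    A failed run leaves no sentinel, so the job is retried on the next start.
    """
    if sentinel.exists():
        return 0  # an earlier run finished successfully; do nothing
    result = subprocess.run(list(cmd))
    if result.returncode == 0:
        sentinel.parent.mkdir(parents=True, exist_ok=True)
        sentinel.touch()
    return result.returncode
```

In a start command this might look like run_once(["python", "train.py"], Path("/workspace/.train_done")) followed by an idle command such as sleep infinity, so the restarted container stays up without rerunning anything. Here train.py is hypothetical, and /workspace is assumed to be the pod's persistent volume mount.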

How can I deploy an instance with a 4070 or 4080 GPU?

When I deploy, I only see the 4090, the 3090, and other data-center GPUs. Why not the other RTX-series GPUs?

CAP_SYS_ADMIN privileges inside container

I am using a PyTorch template and profiling some CUDA kernels. For the profiler to work inside the container, the container needs to be run with the --cap-add=CAP_SYS_ADMIN flag passed to docker run. As far as I can tell, the RunPod platform does not offer control over the flags passed to docker run. Is there any way around this issue? Inside the container I see:...
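Before looking for a workaround, it helps to confirm whether the running container already has the capability. A minimal sketch, assuming a Linux container where /proc/self/status exposes the effective capability mask as the hexadecimal CapEff field; CAP_SYS_ADMIN is bit 21 in <linux/capability.h>. It only reports the current state and cannot grant the capability.

```python
from pathlib import Path

CAP_SYS_ADMIN = 21  # bit index defined in <linux/capability.h>


def cap_eff_from_status(status_text: str) -> int:
    """Parse the effective capability bitmask (hex) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("CapEff line not found")


def has_cap(mask: int, cap: int) -> bool:
    """True if bit `cap` is set in the capability mask."""
    return bool((mask >> cap) & 1)


def has_sys_admin() -> bool:
    """Check the current process (Linux only)."""
    status = Path("/proc/self/status").read_text()
    return has_cap(cap_eff_from_status(status), CAP_SYS_ADMIN)
```

Calling has_sys_admin() inside the pod returns False when the container was started without the extra capability.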

RunPod SD InvokeAI v3.3.0 Unable to import a model

I used this template a year ago, and importing a model was easy enough: just copy the download URL and paste it into the importer to add it. But today I tried it again, and it always says "undefined" no matter which model I pick. I tried URLs from both Civitai and Hugging Face, but it still says "undefined".

Unable to restart pod

When restarting, the error log is as follows:
2024-10-01T09:17:54Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-10-01T09:17:54Z 2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04 Pulling from runpod/pytorch...

Connection refused: SSH over exposed TCP

Every time I try to connect via SSH over exposed TCP, I get "connection refused", while I can connect normally using the basic SSH terminal, which has no support for SCP and SFTP. That seems strange to me. I appreciate your help 🙂
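"Connection refused" usually means the connection reached the pod's public address but nothing accepted it on that port (for example, no SSH daemon running in the container), whereas a timeout more often points at routing or firewalling. A minimal standard-library sketch to tell these cases apart from your own machine; the host and port you pass in are placeholders for the values shown in the pod's connection details.

```python
import socket


def port_status(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP port as 'open', 'refused' or 'unreachable'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # the host answered with a reset: nothing listening
    except OSError:
        return "unreachable"  # timeout, no route, name resolution failure, etc.
```

A result of "refused" for the exposed SSH port, while the web terminal works, is consistent with sshd not listening inside the container.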

Support for terminating pods via SkyPilot

Hi, I want to let my training runs go overnight and terminate the pod once they have finished. To do this, I am currently using SkyPilot. Whenever I try to stop a pod via SkyPilot, I get an error similar to "Stopping is currently not supported for RunPod." Can RunPod please support this feature?
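Until stopping is supported through SkyPilot, one workaround is to have the pod end itself when training finishes. This is a sketch under three assumptions you should verify in your image: the runpodctl CLI is installed, `runpodctl remove pod <id>` terminates a pod, and the pod's ID is available in the RUNPOD_POD_ID environment variable. The helper names are hypothetical.

```python
import os
import subprocess
from typing import Callable, Optional, Sequence


def runpodctl_terminate(pod_id: str) -> None:
    """Terminate a pod via the runpodctl CLI (assumed to be installed)."""
    subprocess.run(["runpodctl", "remove", "pod", pod_id], check=True)


def train_then_terminate(
    train_cmd: Sequence[str],
    terminate: Callable[[str], None] = runpodctl_terminate,
) -> int:
    """Run the training command, then terminate the current pod whatever the outcome."""
    returncode = subprocess.run(list(train_cmd)).returncode
    pod_id: Optional[str] = os.environ.get("RUNPOD_POD_ID")
    if pod_id:
        terminate(pod_id)  # anything not saved to a network volume is lost here
    return returncode
```

The pod is terminated even when training fails, so a crashed run does not keep billing overnight; checkpoints should therefore be written to persistent storage before the function returns. Outside a pod (no RUNPOD_POD_ID) it simply runs the command.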

rsync does not work

Hi, I am running a Docker container in the cloud. Everything is working fine so far: I can connect via SSH with my public keys, everything is great. Except that I can't transfer files using rsync. Every time I try to transfer files via rsync, I am asked for a password that I have never set. Does anybody have a solution for me? Commands I already tried (the paths are changed):
- rsync -avz -e "ssh" ~/documents/example.txt <user>@<host>:/root/example.txt
- rsync -avz -e "ssh -i /path/to/key" ~/documents/example.txt <user>@<host>:/root/example.txt...
Solution:
I got it working. I used a custom Docker container; after reading https://blog.runpod.io/how-to-achieve-true-ssh-on-runpod/ I was able to solve it myself.
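rsync over SSH needs rsync on both ends and a real SSH daemon in the container, and with SSH over exposed TCP the public port usually differs from 22, so the port and key must be spelled out. A sketch of building that call, assuming sshd and rsync are installed in the container (as in the custom-image approach from the linked post); user, host, port, and key path are hypothetical placeholders taken from the pod's connection details.

```python
import os
import shlex
import subprocess
from typing import List


def rsync_command(src: str, user: str, host: str, port: int, key: str, dest: str) -> List[str]:
    """Build an rsync-over-SSH argv that uses a non-default port and an explicit key."""
    remote_shell = f"ssh -p {port} -i {shlex.quote(key)}"
    return ["rsync", "-avz", "-e", remote_shell, src, f"{user}@{host}:{dest}"]


def push(src: str, user: str, host: str, port: int, key: str, dest: str) -> None:
    """Run rsync; '~' is expanded here because no shell is involved."""
    argv = rsync_command(os.path.expanduser(src), user, host, port, os.path.expanduser(key), dest)
    subprocess.run(argv, check=True)
```

If a password prompt still appears, running the same ssh -p ... -i ... command on its own shows whether the key is accepted, independently of rsync.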

Build a docker compose yml file

I just made a RunPod environment with an RTX 4090 GPU, and I have a GitHub repository with a YAML file. I would like to run "docker-compose build", but I can't install Docker properly in the RunPod environment. Any suggestions or help, please?

Unable to Type into Terminal

Total newbie here. Am I losing it, or is there a reason I can't type into the terminal at all? It just flashes where I would type. Thank you

Unable to Open or Delete a Folder

Hello. I am attempting to open a folder, and nothing happens when I try to open it. If I attempt to delete it, the attached error message is generated. This is for this workflow: https://civitai.com/models/790080/inpainting-simple-workflow-flux-or-upscale-or-lora-or-gguf. The workflow is also generating errors as if I don't have the flux 1-dev-q8_0.gguf file, but I'm not sure whether I do, since I can't access the folder in question. Any tips? Thank you!...

Can't create a Pod with an A1111 template

I've tried to create a pod with the "runpod/a1111:1.10.0.post7" template, but it doesn't do anything. It stays on this screen:

Error while deserializing header: MetadataIncompleteBuffer

I'm getting the following error when trying to run my own LoRA model for the first time, and I wondered if anyone can help. It's for the "ComfyUI with Flux.1 dev one-click" model. I used Jupyter to upload my own LoRAs into the files. The models are visible in the workflow, but the attached error appears immediately when I start to generate. Any advice? Thank you! Error:...
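A .safetensors file starts with an 8-byte little-endian header length N, followed by N bytes of JSON header and then the tensor data. MetadataIncompleteBuffer is commonly reported when the file is shorter than that header promises, for example after an interrupted upload. A minimal standard-library sketch that checks a LoRA file's layout; it is a diagnostic for truncation, not a full validator, and the function name is hypothetical.

```python
import json
import os
import struct


def check_safetensors(path: str) -> str:
    """Return 'ok' or a short reason why a .safetensors file looks truncated."""
    size = os.path.getsize(path)
    if size < 8:
        return "truncated: file is shorter than the 8-byte header length field"
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        if 8 + header_len > size:
            return "truncated: JSON header runs past the end of the file"
        header = json.loads(f.read(header_len))
    data_len = size - 8 - header_len
    # Each tensor entry records [begin, end) byte offsets into the data section.
    needed = max(
        (entry["data_offsets"][1] for name, entry in header.items() if name != "__metadata__"),
        default=0,
    )
    if needed > data_len:
        return f"truncated: tensors need {needed} data bytes, file has {data_len}"
    return "ok"
```

Running it over each uploaded LoRA file and getting a "truncated" result would suggest re-uploading that file.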

File transfer via FileZilla is very, very slow

I've installed speedtest-cli, and although it shows a 5 Gbps up and down link on the Pod, uploading files is very slow and extremely inconsistent, ranging from 25 Mbps all the way down to 100 kbps. My internet connection is 1 Gbps up and down.
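An upload can go no faster than the slowest link on the path, so the ceiling here is the 1 Gbps home connection rather than the Pod's 5 Gbps. As a rough worked example, with a hypothetical 6 GB file and protocol overhead ignored, the reported rates translate into these upload times:

```python
def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Idealised transfer time: size in bytes * 8 bits per byte / line rate (no overhead)."""
    return size_bytes * 8 / rate_bits_per_s


if __name__ == "__main__":
    size = 6e9  # hypothetical 6 GB file (decimal gigabytes)
    for label, rate in [("1 Gbps", 1e9), ("25 Mbps", 25e6), ("100 kbps", 100e3)]:
        print(f"{label:>8}: {transfer_seconds(size, rate) / 60:,.1f} min")
```

That is 0.8 minutes at 1 Gbps, 32 minutes at 25 Mbps, and 8,000 minutes (over five days) at 100 kbps, which shows how far below the link capacity the observed rates are.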

How to create a community pod using an API, specifying the network quality?

Some community pods have a very slow network. On the website, you can filter by network quality (see screenshot). How can I create a pod with a specified network quality using an API? runpod-python doesn't seem to have this parameter. Does the GraphQL API? Or another one? Thank you in advance...

urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

This failed dramatically. I installed all the packages, and now it fails like this.
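On Linux with glibc, [Errno -3] is EAI_AGAIN: the system resolver could not get an answer from a DNS server, so the failure happens before any HTTP request is sent. A minimal standard-library sketch to check name resolution from inside the pod; the host names you pass in are only examples.

```python
import socket
from typing import List


def resolve(host: str) -> List[str]:
    """Return the sorted set of addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def dns_report(host: str) -> str:
    """Human-readable resolution result instead of a URLError traceback."""
    try:
        return f"ok: {host} -> {', '.join(resolve(host))}"
    except socket.gaierror as exc:
        # On Linux/glibc, errno -3 is socket.EAI_AGAIN ("Temporary failure in name resolution")
        return f"DNS failure for {host}: [Errno {exc.errno}] {exc.strerror}"
```

If dns_report("localhost") succeeds but an external name such as pypi.org fails, the problem lies in the pod's DNS configuration (see /etc/resolv.conf) rather than in the installed packages.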

Urgent: {'message': 'Something went wrong. Please try again later or contact support.'}

We have been encountering this API error every day for about 3 days, usually between 6:00 and 12:00 (so about 6 hours a day). Could you please check whether the error is on our side or yours? Timestamps of API errors that might be useful:
2024-09-26T07:15:01.021521866Z
2024-09-26T07:15:00.95935792Z
2024-09-26T07:08:59.823314972Z...
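While the cause is being investigated, a common client-side mitigation for intermittent errors like this is to retry with exponential backoff and jitter. This is a generic sketch, not tied to any RunPod client: `call` stands for whatever function issues the API request, and the exception type to retry on is an assumption you should narrow to what your client actually raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on the given exceptions with full-jitter exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt (1 s, 2 s, 4 s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

The random jitter spreads retries from many clients over time instead of having them all hit the API at the same moment, and the injectable `sleep` makes the behaviour easy to test.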