RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⛅｜pods

muddyfootprints.

6/5/2024

Docker Image For RunPod Pytorch 2.0.1 Template

Hello, I'm trying to create a custom template that just adds a daemon to the official RunPod PyTorch 2.0.1 template. How can I find the Docker image that is deployed with this template?...

fireice

6/5/2024

Can I use torch 2.3.0 + CUDA 11.8 on RunPod?

I want to upgrade my torch version to 2.3.0. Will it work on RunPod?

vinh.nguyenphu

6/5/2024

Is CUDA not working?

It gets stuck here forever... Please help

Solution:

Not sure what caused the problem. Solved it by deploying another instance on the community cloud template: runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04...

GokulaKrishna

6/5/2024

Not able to start Nginx

I have logged in via basic SSH and installed nginx, but I'm not able to curl it. Please help me resolve this. ```...

Solution:

This is resolved. The template's default nginx.conf had been changed, so it didn't load the sites-enabled config.
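A minimal sketch for checking this, assuming the stock Debian/Ubuntu nginx package layout (site configs live in /etc/nginx/sites-enabled/ and are pulled in by an include line inside the http { } block of nginx.conf). The function name has_sites_enabled is hypothetical; it only reports whether that include line is present and not commented out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an nginx.conf still loads the sites-enabled configs.
# Assumes the Debian/Ubuntu package layout; pass another path to check a different file.
has_sites_enabled() {
  local conf="${1:-/etc/nginx/nginx.conf}"
  # Look for an uncommented "include /etc/nginx/sites-enabled/*;" line.
  grep -Eq '^[[:space:]]*include[[:space:]]+/etc/nginx/sites-enabled/\*;' "$conf"
}

# Usage inside the pod: validate the config, then reload the running nginx.
#   has_sites_enabled && nginx -t && nginx -s reload
```

If the check fails, restoring that include line and reloading should make the sites-enabled config active again.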

GokulaKrishna

6/5/2024

Not able to SSH via "SSH over exposed TCP"

I am able to log in with basic SSH, but SSH over exposed TCP asks me for a password. This is not working: ``` ⬢ ❯ ssh root@xxx -p 13776 -i ~/.ssh/id_ed25519...

Solution:

You can use OhMyRunPod

vinh.nguyenphu

6/5/2024

Cannot kill processes

GPU pod - Secure Cloud

Solution:

I would just reset the pod to kill the processes

Thalia (HMG)

6/5/2024

container start command

I have created a startup.sh script that I want to use as the start command for my container. The script needs to do two things: 1) start a Python .py file, and 2) keep the container accessible through the web terminal after starting the Python script ...

Solution:

Fabulous, thank you!

justin

6/4/2024

Too Many Open Files Error on CPU Pod - Easy Repro

@flash-singh I think I found an easy repro for the "too many open files" error on CPU Pods: 1) Use the following Docker image (you don't necessarily need to do this, it's just what I am using for an exact repro): justinwlin/runpod_pod_and_serverless:1.0 ...
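For anyone hitting the same error, a common first diagnostic on Linux (a general sketch, not a confirmed fix for this repro) is the per-process open-file limit. The snippet prints the soft and hard limits and raises the soft limit to the hard one for the current shell. Raising the hard limit itself needs privileges the container may not have.

```shell
#!/usr/bin/env bash
# Inspect the per-process open-file limit (RLIMIT_NOFILE) for this shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "open files: soft=$soft hard=$hard"

# Raise the soft limit up to the hard limit. This only affects this shell and
# the processes it starts afterwards, not processes that are already running.
ulimit -Sn "$hard"
echo "soft limit now: $(ulimit -Sn)"

# To see how many descriptors a running process holds (Linux), e.g. PID 1234:
#   ls /proc/1234/fd | wc -l
```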

Patrick

6/4/2024

recipes

Hello, I've been trying to look up some recipes on https://docs.runpod.io/recipes, but the page seems to be down. Does anyone know anything about it? Thanks a lot!...

Solution:

Most of those have been moved here: https://docs.runpod.io/sdks/graphql/manage-pods...

waspinator

6/4/2024

How do you create a compatible Dockerfile?

I want to run a custom Dockerfile, but I'm not sure how to make one that's compatible. For example, when I use this to create an image that's saved to my registry, the pod seems to start but I can't connect to it over SSH. I noticed that if I picked an official PyTorch pod, I had checkmarks for SSH and JupyterLab, but not when I use my custom one. What's the minimal Dockerfile I need to run? ```dockerfile...

ethan

6/4/2024

Strange Unix and/or user-permissions issue with a command in Dockerfile/replacement command

I have a bash script in my pod which, as part of its last command, executes mpirun with some target process. When running this command using bash <script> as the Dockerfile's entrypoint, or using RunPod's replacement command, the following issue occurs: ```2024-06-04T00:27:41.763661289Z Per request, Open MPI attempted to set a system resource 2024-06-04T00:27:41.763672184Z limit to a given value: 2024-06-04T00:27:41.763682241Z ...

fdoelker

6/3/2024

Console for kohya_ss / Stable Diffusion

Is there a way to access the console for running processes in prebuilt pods? I am running kohya_ss and Stable Diffusion and would like to see what’s going on “behind the WebUI layer”. Any help is greatly appreciated. 🙂

key8962

6/3/2024

NVLink support for H100 NVL

When I run nvidia-smi topo -m on the 2 x H100 NVL pod, I see a PIX connection between GPU0 and GPU1. Can I use NVLink to interconnect the H100 NVL GPUs? How does PIX (PCIe bridge) performance differ from NVLink?
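A small helper for reading the link type out of that matrix. Per the legend nvidia-smi prints under it, PIX means the path crosses at most a single PCIe bridge, while NV# means a bonded set of # NVLinks, so only an NV entry between GPU0 and GPU1 indicates NVLink. The function gpu_link is hypothetical and assumes the plain layout: a header row of GPU names first, then one row per GPU starting with its own label.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the link type between two GPUs (e.g. PIX, PXB, SYS, NV12)
# from `nvidia-smi topo -m` output read on stdin.
# Usage: gpu_link SRC_INDEX DST_INDEX
gpu_link() {
  awk -v src="GPU$1" -v dst="GPU$2" '
    # Strip terminal formatting codes, in case the header row contains any.
    { gsub(/\033\[[0-9;]*m/, "") }
    # Header row: locate the destination GPU. Data rows start with an extra
    # label field (GPU0, GPU1, ...), so the matching data column is one further right.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == dst) col = i + 1; next }
    $1 == src && col { print $col; exit }
  '
}

# Usage on a pod:
#   nvidia-smi topo -m | gpu_link 0 1
#   NV<n> -> NVLink between the two GPUs; PIX/PXB/PHB/NODE/SYS -> a PCIe/CPU path
```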

Alex (KMI)

6/1/2024

question

Hello, we have scheduled downtime to remove a machine and reinstall its entire operating system, and I see that a process is running on it. I'm not sure what to do: if I format the machine and reinstall the operating system, the running process will of course lose all its data.

houmie

5/31/2024

How do I raise a support ticket?

I cannot interact with the Email Support button on the website, and I have received no response on Discord either. I submitted feedback a week ago (https://discord.com/channels/912829806415085598/1243604870074732595). We are scheduled to go live in about a week, and the general lack of support is very concerning....

Raios

5/30/2024

Cloud Files Updating Backblaze

After I upload my files to Backblaze, if I later add more files to the workspace, is there a way to update the Backblaze copy with only the new files, without deleting and re-uploading everything?

Solution:

Either run the backup again or upload the new files manually; both work.
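If you go the manual route, one generic way to find only the files added or modified since the last backup is to keep a timestamp file and ask find for anything newer than it. This is a sketch under that assumption: the function name changed_since and the stamp-file paths are hypothetical, and the actual upload step (RunPod's cloud sync, the b2 CLI, or another tool) is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files under DIR modified after STAMP's mtime.
# If STAMP does not exist yet (no previous backup), every file is listed.
changed_since() {
  local stamp="$1" dir="$2"
  if [ ! -e "$stamp" ]; then
    find "$dir" -type f
  else
    find "$dir" -type f -newer "$stamp"
  fi
}

# Usage (example paths):
#   touch /tmp/backup_start                          # mark the start of this backup
#   changed_since /workspace/.last_backup /workspace > /tmp/new_files.txt
#   # ...upload the files listed in /tmp/new_files.txt...
#   mv /tmp/backup_start /workspace/.last_backup     # reference point for next time
```

Touching the stamp before listing, rather than after uploading, means files changed while an upload is running are picked up on the next run instead of being skipped.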

Raios

5/30/2024

Pod GPU keeps disconnecting...

I create a pod, and when I finish my work, the next time I open it the GPU is not available, so I have to reinstall the whole Fooocus setup from the beginning and lose all my downloaded checkpoints and other files. Is there a way to fix this by storing my files somewhere safe and just connecting them to the pod? How should I do that? Please be as specific as possible; I'm a beginner.

ethan

5/28/2024

Container Files Missing in Workspace On Pod Launch

When launching pods (A40) on both Community Cloud and Secure Cloud, using a custom image that populates /workspace as a volume, the expected files and directories don't show up. This worked as of last Friday, and the image has not changed in its GitHub container repository. There is more than enough space on both the network and disk volumes to contain these files.
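One general point that may be relevant here (an assumption, not a confirmed diagnosis of this thread): a volume mounted at /workspace hides whatever the image itself placed at that path. A common workaround is to bake the files into a different directory in the image and copy them into /workspace at startup if they are not already there. The function seed_workspace and the /opt/workspace-seed path below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy files baked into the image at SRC into the mounted
# volume at DST once, using a marker file so later restarts don't overwrite edits.
seed_workspace() {
  local src="${1:-/opt/workspace-seed}" dst="${2:-/workspace}"
  local marker="$dst/.seeded"
  if [ -e "$marker" ]; then
    echo "already seeded: $dst"
    return 0
  fi
  mkdir -p "$dst"
  # "$src/." copies the directory's contents (including dotfiles), not the directory itself.
  cp -a "$src/." "$dst/"
  touch "$marker"
  echo "seeded $dst from $src"
}

# Usage, e.g. at the start of the container's start command:
#   seed_workspace /opt/workspace-seed /workspace
```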

Asad Cognify

5/28/2024

Start the pod with a custom command after the pod finishes startup

How can I add a command that the pod will execute after it has finished starting up? I tried using

bash -c 'apt update;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo "$PUBLIC_KEY" >> authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity'

And replaced sleep infinity with apt install nano -y...

Solution:

You can try the other way around:

apt update && apt -y install nano && /start.sh

...

Soomdong

5/28/2024

How do I upload a 5 GB file and use it in my pod?

I tried to upload it several times, but only 1.2 GB gets uploaded before an error occurs (I used the Jupyter web-based interface). I purchased 150 GB of storage and made a new pod, but the issue is still the same. ...

Solution:

OhMyRunPod --setup-ssh
