Docker Image For RunPod Pytorch 2.0.1 Template
Hello,
I'm trying to create a custom template which just adds a daemon to the official RunPod Pytorch 2.0.1 template.
How can I find the Docker image that is deployed with this template?...
Can I use torch 2.3.0 + CUDA 11.8 on RunPod?
I want to upgrade my torch version to 2.3.0. Will it work on RunPod?
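PyTorch publishes CUDA 11.8 builds of 2.3.0 on its own wheel index, so one way to try this inside a pod is sketched below. The function names are hypothetical, and the sketch assumes python3 and pip are available in the image; it is not an official RunPod procedure.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical; assumes python3 and pip exist in the image.

# Install the CUDA 11.8 build of PyTorch 2.3.0 from the PyTorch wheel index.
install_torch_cu118() {
  python3 -m pip install "torch==2.3.0" --index-url https://download.pytorch.org/whl/cu118
}

# Print the installed torch version, the CUDA version it was built against,
# and whether a GPU is visible to it.
check_torch() {
  python3 -c 'import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())'
}
```

Running `install_torch_cu118` followed by `check_torch` shows whether the upgraded build can see the GPU.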
Is CUDA not working?
It gets stuck here forever... Please help
Solution:
Not sure what caused the problem.
Solved it by deploying another instance on the community cloud
template: runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04...
Not able to start Nginx
I logged in via Basic SSH and installed nginx, but I'm not able to curl.
Please help me resolve this.
```
...
```
Solution:
This is resolved. The template's default nginx.conf had been changed, and it didn't load the sites-enabled config.
Not able to SSH via "Overexposed SSH"
I am able to log in with Basic SSH, but "Overexposed SSH" asks me for a password.
This is not working
```
⬢ ❯ ssh root@xxx -p 13776 -i ~/.ssh/id_ed25519...
```
Solution:
You can use OhMyRunPod
Cannot kill processes
GPU pod - secure cloud
Solution:
I would just reset the pod to kill the processes
Container start command
I have created a startup.sh script that I want to use as the start command for my container. The script needs to do two things:
Start a Python .py file
Keep the container accessible through the web terminal after starting the Python script
...
Solution:
fabulous thank you!
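The accepted answer is not shown in this preview, so the following is only a minimal sketch of a startup.sh that does the two things asked for. The path /workspace/main.py is a placeholder, and ending with `sleep infinity` is an assumption about how to keep the container's main process alive.

```shell
# Write a hypothetical startup.sh; /workspace/main.py is a placeholder path.
cat > startup.sh <<'EOF'
#!/usr/bin/env bash
# Start the Python script in the background and log its output.
python3 /workspace/main.py > /workspace/main.log 2>&1 &
# Keep the container's main process alive so the pod stays reachable.
sleep infinity
EOF
chmod +x startup.sh
```

On images built from an official template, ending the script with `/start.sh` instead of `sleep infinity` may be the better choice, since that script starts the template's own services.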
Too many Open Files Error on CPU Pod - Easy Repro
@flash-singh
I think I found an easy repro for the too many open files on CPU Pod:
1) Use the following Docker image (you don't necessarily need to do this; it's just what I am using for an exact repro):
justinwlin/runpod_pod_and_serverless:1.0
...
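When chasing a "too many open files" error, it can help to compare the descriptor limit with what a process actually holds. This is a rough sketch that assumes a Linux pod with /proc mounted; the helper name `count_fds` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Linux with /proc mounted; count_fds is a hypothetical helper.

# Count the file descriptors currently held by a given PID.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Soft limit on open file descriptors for the current shell.
soft_limit=$(ulimit -n)

echo "soft limit: ${soft_limit}, open in this shell: $(count_fds $$)"
```

Calling `count_fds <pid>` on the process that raises the error shows how close it is to the limit.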
recipes
Hello,
I've been trying to look up some recipes on https://docs.runpod.io/recipes. However, the page seems to be down. Does anyone know anything about it?
Thanks a lot!...
Solution:
How do you create a compatible Dockerfile?
I want to run a custom Dockerfile, but I'm not sure how to make one that's compatible.
For example, when I use this to create an image that's saved to my registry, the pod seems to start but I can't connect to it over SSH. I noticed that if I pick an official PyTorch pod, I get checkmarks for SSH and JupyterLab, but not if I use my custom one. What's the minimal Dockerfile I need to run?
```dockerfile
...
```
Strange unix and/or user perms issue with command in dockerfile/replacement command
I have a bash script in my pod which, as its last command, executes mpirun with some target process. When running this script using bash <script> as the Dockerfile's entrypoint, or using RunPod's replacement command, the following issue occurs:
```
2024-06-04T00:27:41.763661289Z Per request, Open MPI attempted to set a system resource
2024-06-04T00:27:41.763672184Z limit to a given value:
2024-06-04T00:27:41.763682241Z ...
```
Console for kohya_ss / Stable Diffusion
Is there a way to access the console for running processes in prebuilt pods? I am running kohya_ss and Stable Diffusion and would like to see what's going on "behind the WebUI layer". Any help is greatly appreciated. 🙂
NVLink support for H100 NVL
When I run nvidia-smi topo -m on the H100 NVL * 2 pod, I see the PIX topology between GPU0 and GPU1. Can I use an NVLink connection to interconnect the H100 NVL GPUs? How does PIX (PCIe bridge) performance differ from NVLink?
Question
Hello, we have a scheduled downtime to remove a machine and reinstall the entire operating system, and I see that there is a process running on it. I'm not sure what to do if I format the machine and reinstall the operating system. But of course, the running process will lose all data.
How do I raise a support ticket?
I cannot interact with the Email Support button on the website, and I have received no response on Discord either. I submitted feedback a week ago here: https://discord.com/channels/912829806415085598/1243604870074732595
We are scheduled to go live in about a week, and the general lack of support is very concerning....
Cloud Files Updating Backblaze
After I upload my files to Backblaze, if I later add more files to the workspace, is there a way to update the Backblaze cloud with only the new files, without deleting and reuploading everything?
Solution:
Backing them up again or uploading them manually works.
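As an alternative to the approaches in the answer, the Backblaze B2 command-line tool has a `sync` command that uploads only new or changed files. The sketch below assumes the b2 CLI is installed in the pod and already authorized; the function name, bucket name, and prefix are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes the b2 CLI is installed and already authorized.
# The bucket name and prefix are hypothetical placeholders supplied by the caller.

# Upload new or changed files from /workspace to a B2 bucket.
sync_workspace_to_b2() {
  local bucket="${1:?bucket name required}"
  local prefix="${2:-workspace}"
  b2 sync /workspace "b2://${bucket}/${prefix}"
}
```

For example, `sync_workspace_to_b2 my-bucket` would send /workspace to the `workspace` prefix of a bucket named `my-bucket`.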
Pod GPU keeps disconnecting...
I create a pod, and when I finish my work and open it again the next time, the GPU is not available, so I have to reinstall Fooocus from scratch and I lose all my downloaded checkpoints and other files. Is there a way to fix this by storing my files somewhere safe and just connecting them to the pod? How should I do that? Please be as specific as possible; I'm a beginner.
Container Files Missing in Workspace On Pod Launch
When launching pods (A40) on both community and secure cloud, using a custom image that populates /workspace as a volume, the expected files and directories don't show up. This worked as of last Friday, and the image has not changed in its GitHub container repo. There is more than enough space on both the network and disk volumes to contain these files.
Start the pod with a custom command after the pod finishes startup
How can I add a command that the pod will execute after it has finished starting up? I tried using
bash -c 'apt update;DEBIAN_FRONTEND=noninteractive apt-get install openssh-server -y;mkdir -p ~/.ssh;cd $_;chmod 700 ~/.ssh;echo "$PUBLIC_KEY" >> authorized_keys;chmod 700 authorized_keys;service ssh start;sleep infinity'
and replaced
sleep infinity
with apt install nano -y
...
Solution:
You can try the other way around:
...
apt update && apt -y install nano && /start.sh
How do I upload a 5 GB file and use it in my pod?
I tried to upload it several times, but only 1.2 GB gets uploaded before it errors out. (I used the Jupyter web-based interface.)
I purchased 150 GB of storage and made a new pod.
But the issue is still the same.
...
Solution:
OhMyRunPod --setup-ssh
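Once SSH access to the pod is working, a large file can be copied over SSH instead of through the browser. The helper below is a sketch: the function name is hypothetical, the host and port come from the pod's connection details, and the default key path and the /workspace destination are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: upload_to_pod is a hypothetical helper; the default key path and
# the /workspace destination are assumptions.

# Copy a local file to the pod over SSH.
upload_to_pod() {
  if [ "$#" -lt 3 ]; then
    echo "usage: upload_to_pod FILE HOST PORT [KEY]" >&2
    return 2
  fi
  local file="$1" host="$2" port="$3" key="${4:-$HOME/.ssh/id_ed25519}"
  scp -P "$port" -i "$key" "$file" "root@${host}:/workspace/"
}
```

Run from the local machine, `upload_to_pod model.bin <pod-ip> <port>` would place the file in /workspace on the pod.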