RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

I am having trouble finding the location of the model file when trying to use ComfyUI.

I have edited the 'extra_model_paths.yaml' file, but ComfyUI still can't find the model.
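For reference, a minimal sketch of what that file can look like. The paths and subfolder names here are hypothetical and follow the extra_model_paths.yaml.example shipped with ComfyUI; adjust base_path to wherever the models actually live on the pod (ComfyUI needs a restart after editing):

```yaml
# Hypothetical layout: models stored under /workspace/models/ on the pod.
comfyui:
    base_path: /workspace/
    checkpoints: models/checkpoints/
    loras: models/loras/
    vae: models/vae/
```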

Turn on Confidential Computing

Hi! I created a pod using an H100 and tried to run some tests with Confidential Computing, but it turns out CC is in fact disabled. Is it possible to turn it on? It's absolutely necessary for me to have CC on. Here is documentation on how to turn CC on: https://docs.nvidia.com/confidential-computing-deployment-guide.pdf...

"SSH Public Keys" in account settings are completely ignored

Hello, I am trying to access the env variables, as well as the standard PUBLIC_KEY variable, that I specify for my pod from my Python app. However, they are only set when I connect over SSH via the proxy server. The proxy is extremely slow and does not allow scp to be run through it. When I try to connect directly (via public IP), ~/.ssh/authorized_keys is not populated at all with the public key I set in the settings. The env vars that I pass during pod creation are also missing. Two problems:
- Why isn't the ~/.ssh/authorized_keys file created and populated with my public key from account settings?
- Why are env variables missing when connecting directly via public IP to my instance?
I assume the proxy has some .bashrc which is activated when I connect through it, but why aren't the env vars set with the -e parameter in the docker run command for the pod?...
Solution:
In your running pod, xargs -0 -L1 -a /proc/1/environ will list the environment variables given to the process launched on container start. If a PUBLIC_KEY was set for your pod, it will be there. If that process is a bash and doesn't export those variables when starting other processes, it will be the only process that knows about your PUBLIC_KEY.
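To see how that NUL-separated format parses without being on a pod, here is the same command run against a stand-in file (the printf line simulates /proc/1/environ; the key value is a placeholder):

```shell
# /proc/1/environ is a NUL-separated list of KEY=VALUE pairs.
# Simulate it with a temp file, then parse it exactly as you would on the pod:
printf 'PUBLIC_KEY=ssh-ed25519 AAAA... user@host\0PATH=/usr/bin\0' > /tmp/environ_demo
xargs -0 -L1 -a /tmp/environ_demo
# On the pod itself: xargs -0 -L1 -a /proc/1/environ
```

Each KEY=VALUE pair comes out on its own line, which makes it easy to grep for PUBLIC_KEY.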

Is there an instance type that cannot be taken from you even if you stop the pod?

I'd like to have the comfort of knowing I can spin up whenever I want, without worrying that my GPU has been taken from me while I wasn't using it.
Solution:
Nope, you would either need to run the pod 24/7 or use network storage.

Kill a pod from the inside?

Last weekend I started a community pod for a large workload and went to bed once it confirmed it was starting the work properly. Unfortunately, the pod was on a very slow connection to my cloud storage, and so it spent about 14 out of the 16 hours of run time just downloading the job files… I only just realised it after noticing how much faster things went on other runs and analysing my cloud egress logs. I've rewritten my code to report current download speeds so I can kill pods by hand, but is there any way to do it from a running Python app? Ideally, if it detected slow disk or downloads it would kill itself, so that at least I'd know. My alternative is to have it send me a Discord message, but that's not as useful!...
Solution:
Thanks, I can see that, combined with some pre-set environment variables (https://docs.runpod.io/pods/references/environment-variables), the podStop command is what I'd need. I didn't realise that the pod knew who it was (so to speak).
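A minimal sketch of that self-stop idea. It assumes the runpod Python SDK's stop_pod call and the documented RUNPOD_POD_ID variable; the speed threshold and the check itself are illustrative, not part of any SDK:

```python
import os

def should_self_stop(mb_per_s: float, floor_mb_per_s: float = 20.0) -> bool:
    """Decide whether the measured download speed is too slow to keep paying for."""
    return mb_per_s < floor_mb_per_s

def stop_current_pod() -> None:
    """Stop the pod we are running on. RUNPOD_POD_ID is set by RunPod;
    runpod.stop_pod is assumed from the runpod SDK (check your version)."""
    import runpod  # imported lazily so the helper above stays testable offline
    runpod.api_key = os.environ["RUNPOD_API_KEY"]
    runpod.stop_pod(os.environ["RUNPOD_POD_ID"])

# In the download loop:
# if should_self_stop(measured_speed_mb_s):
#     stop_current_pod()
```

Note that a stopped pod still bills for its volume; terminating instead would free everything, including the data.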

Performance A100-SXM4-40GB vs A100-SXM4-80GB

Hello! I have one GPU, an NVIDIA A100-SXM4-40GB, on Google Colab Pro, and one GPU, an NVIDIA A100-SXM4-80GB, on RunPod. My notebook successfully fine-tunes Whisper-Small on Google Colab (40GB) with batch size 32....
Solution:
It could be different things, like the CUDA version, Python version, etc.
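To compare the two environments on exactly those axes, a small diagnostic sketch; the torch import is optional so it also runs where PyTorch isn't installed:

```python
import sys

def env_report() -> dict:
    """Collect version info that commonly explains perf gaps between setups."""
    report = {"python": sys.version.split()[0]}
    try:
        import torch  # optional: only present on ML images
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # CUDA version torch was built with
        if torch.cuda.is_available():
            report["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        report["torch"] = "not installed"
    return report

print(env_report())
```

Run it in both the Colab and RunPod notebooks and diff the output.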

API problem

After having been away from RunPod for a couple of weeks, I am greeted by a fancy new GUI! But also a problem (see attached). I tried several different community pods but got the same result. It seems to happen as soon as I try to change the template. Please advise.
Solution:
It's a false positive.

Why are there no indicators of file transfer operations? Am I supposed to guess when they're done?

No file transfer indicators, no zip extraction progress reporting, nothing. Am I supposed to just magically guess when file operations on RunPod are done?!

data didn't persist

Which folder should I put my files in if I want them to persist across deployments? I started my pod from Storage > Select existing disk > Deploy and left my files in /workspace. Now that folder is not listed over SSH (dir), or in VS Code...
Solution:
Oh, it's /workspace for pods.
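One way to confirm persistence is to leave a timestamped marker file on the volume before tearing the pod down, then look for it after redeploying. A demo directory stands in for /workspace here:

```shell
# On the pod, point WORKDIR at /workspace (the persistent mount) instead.
WORKDIR=/tmp/demo_workspace
mkdir -p "$WORKDIR"
echo "persist-check $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$WORKDIR/marker.txt"
cat "$WORKDIR/marker.txt"
```

If the marker survives a redeploy, files in that folder persist; anything outside it lives on the ephemeral container disk.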

Tailscale on Pod

Hello, all. I need to set up Tailscale VPN in a Pod in order to allow access to our DB. The issue is that /dev/net/tun is not available, and using a SOCKS5 proxy as described in this article https://tailscale.com/kb/1112/userspace-networking is not an option for us. Are there any recommendations for how I can run Tailscale? ...

Confidential Computing Support

Hi! I'm looking to do experiments with the H100 and Confidential Computing (CC). I saw in a talk from NVIDIA GTC that in order for the H100 to support CC it needs to run alongside a CPU with support for virtualization-based TEEs ("Confidential VM"); some supported CPUs are AMD Milan or later, and Intel SPR or later. Are your H100s running with CPUs that support Confidential VM? Are they in an environment suited for Confidential Computing?...

Ollama on RunPod

Hey all, I am attempting to set up Ollama on an NVIDIA GeForce RTX 4090 pod. The commands for that are pretty straightforward (link to the article: https://docs.runpod.io/tutorials/pods/run-ollama#:~:text=Set%20up%20Ollama%20on%20your%20GPU%20Pod%201,4%3A%20Interact%20with%20Ollama%20via%20HTTP%20API%20). All I do is run the following two commands in the pod's web terminal after it starts up, and I'm good to go: 1) (curl -fsSL https://ollama.com/install.sh | sh && ollama serve > ollama.log 2>&1) & 2) ollama run [model_name]...
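Once `ollama serve` is running, step 2 can also be done over Ollama's HTTP API (default port 11434, endpoint /api/generate per the Ollama docs) instead of the interactive CLI. A sketch, with the model name as a placeholder:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_payload(model: str, prompt: str) -> bytes:
    """JSON body for /api/generate; stream=False returns one complete response."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(model: str, prompt: str) -> str:
    # Only works on the pod once `ollama serve` is up and the model is pulled.
    req = request.Request(OLLAMA_URL, data=build_generate_payload(model, prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Exposing port 11434 on the pod would let you call the same API from outside.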

Runpod Python API problem while trying to list pods

Hi! Since this morning, I have had difficulty listing my pods via runpod.get_pods(). Apart from rare successes, an error usually pops up within graphql.py saying 'Something went wrong' and recommending to try again later. Are there any known issues? I am using version 1.6.2. Thanks in advance!
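While the intermittent errors last, a retry wrapper can smooth them over. This is a generic sketch (the backoff numbers are arbitrary), with the actual runpod.get_pods() call left commented since it needs an API key:

```python
import time

def with_retry(fn, attempts: int = 5, delay_s: float = 2.0):
    """Call fn(); on failure, retry with linear backoff, re-raising the last error."""
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as err:  # the SDK raises on 'Something went wrong'
            last_err = err
            time.sleep(delay_s * (i + 1))
    raise last_err

# Usage on a machine with the runpod SDK installed and api_key configured:
# import runpod
# pods = with_retry(runpod.get_pods)
```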

Multiple SSH keys via Edit Pod option

I understand I need to separate public keys with newlines. However, pasting in SSH keys separated by newlines via Edit Pod -> Environment Variables doesn't seem to allow two people to connect simultaneously. Sorry if this has been answered elsewhere; thanks in advance!...
Solution:
Either that, or manually add the SSH keys to the authorized_keys file, which is much simpler.
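Manually, that is just appending each public key on its own line and tightening permissions (sshd refuses keys with loose permissions). A demo directory stands in for the pod's /root/.ssh, and the keys are placeholders:

```shell
# On the pod this would be /root/.ssh; keys below are placeholders.
SSH_DIR=/tmp/demo_ssh
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
printf '%s\n' \
  'ssh-ed25519 AAAA... user1@laptop' \
  'ssh-ed25519 AAAA... user2@laptop' >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"
wc -l < "$SSH_DIR/authorized_keys"   # one line per key
```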

L40S aren't available

Hello. On the Community Cloud, the website shows that L40S GPUs are available at $0.5/hr, but when I try to create a pod, it says they aren't available.

Is it possible to reserve GPUs for use at a later time?

To ensure that there is a GPU available at a planned time of use, is it possible to reserve GPUs?

Is there a way to scale pods?

I would like to scale up the number of pods in order to meet demand. Is there a way to do that?

Build with Dockerfile or mount image from tar file

Is there a way to build an image from a Dockerfile through RunPod, or to mount my tar file as an image?

Performance of Disk vs Network Volume

Is there a significant trade-off in performance between the pod's local volume and a network volume? How should I think about this?
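The practical way to answer this is to benchmark both mounts yourself. A rough sequential-write check with dd; conv=fdatasync forces the data to disk so the reported speed isn't just page cache, and the target path here is illustrative:

```shell
# Point TARGET at each mount in turn (local container disk vs network volume;
# the exact mount paths depend on your pod setup).
TARGET=/tmp/ddtest
dd if=/dev/zero of="$TARGET" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1
rm -f "$TARGET"
```

The last line of dd's output includes the throughput. Expect the network volume to be noticeably slower, especially for many small files, since every operation crosses the network.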