RunPod•11mo ago

My pod has randomly crashed several times today, and I've received emails from RunPod about issues.

Today, my pod has crashed a few times, to the point where I'm receiving emails from RunPod about the issues. How can I fix this?

39 Replies

Madiator2011•11mo ago

Could you provide some more information?

nerdylive•11mo ago

Maybe some logs or screenshot on your pod tab would help

Brian BullockOP•11mo ago

Here's a snapshot of the audit logs, where I'm stopping and starting pods that have disconnected in the middle of processes.

Madiator2011•11mo ago

@rethinkstudios#001 I deleted your post because it leaked your email. Are you running Comfy from the web terminal?

Brian BullockOP•11mo ago

Thank you! Didn't know that would be an issue. Yes, I'm launching Comfy either through Jupyter's terminal or the native terminal, and in the middle of creating, I'll get a connection closed and several hours of work will crash. SUPER frustrating. What can we do to keep a solid connection?

Madiator2011•11mo ago

Use normal SSH and run the process in tmux.
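A minimal sketch of the tmux suggestion above, so the process keeps running when the connection drops. The helper name `run_in_tmux`, the session name `comfy`, and the `/workspace/ComfyUI` path are assumptions for illustration, not something the thread specifies.

```shell
# Start a command in a detached tmux session so it survives a dropped
# SSH or web-terminal connection.
# $1 = session name, remaining arguments = the command line to run.
run_in_tmux() {
  session="$1"
  shift
  tmux new-session -d -s "$session" "$*"
}

# Example usage on the pod (session name and path are assumptions):
#   run_in_tmux comfy "cd /workspace/ComfyUI && python main.py --listen"
#   tmux attach -t comfy    # reattach later; detach again with Ctrl-b then d
#   tmux ls                 # list running sessions
```

After a disconnect, logging back in and running `tmux attach -t comfy` should bring back the same session with its output intact, provided the process itself was not killed.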

Brian BullockOP•11mo ago

Are there tutorials available on how to set that up? And what's the difference in the user experience?

Madiator2011•11mo ago

Usually you want to set up SSH keys on your machine and add the public key on the RunPod settings page.

Brian BullockOP•11mo ago

Something like this??

Brian BullockOP•11mo ago

https://blog.runpod.io/how-to-set-up-terminal-access-on-runpod/

RunPod Blog

How to Configure Basic Terminal Access on RunPod

The fastest way to get access to a custom pod is to use our basic terminal access feature. This works with any custom container that you want to run on RunPod, whether or not it has a built in SSH daemon or exposed ports. Do be aware that there are

Brian BullockOP•11mo ago

Or more like this? https://www.youtube.com/watch?v=_qjd6UAHaRg

TreeCityWes

YouTube

How To Setup SSH Public/Private Key Pair for Vast.ai and Runpod. Se...

nerdylive•11mo ago

Yes, that video shows the same thing: generating a public key using WSL, adding it to the platform, then connecting with SSH.

Madiator2011•11mo ago

But you don't need to use WSL, as Windows has a built-in SSH client.
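The key setup described above could look roughly like this. It is a sketch only: the helper name `make_key`, the key path, the `runpod` comment, and the empty passphrase (`-N ""`) are assumptions chosen for brevity, and the `<port>` and `<pod-ip>` placeholders have to come from the pod's connect details.

```shell
# Create an ed25519 key pair at the given path unless one already exists,
# then print the public half so it can be pasted into the RunPod settings page.
make_key() {
  key="$1"
  mkdir -p "$(dirname "$key")"
  # -N "" sets an empty passphrase (an assumption; drop it to be prompted).
  [ -f "$key" ] || ssh-keygen -q -t ed25519 -N "" -C "runpod" -f "$key"
  cat "$key.pub"
}

# Example usage on your own machine:
#   make_key "$HOME/.ssh/id_ed25519"
#   ssh -i "$HOME/.ssh/id_ed25519" -p <port> root@<pod-ip>
```

The same `ssh-keygen` and `ssh` commands are available in the OpenSSH client that ships with Windows, so WSL is optional.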

Brian BullockOP•11mo ago

Hey guys... My pod just disconnected again, and I was using SSH via Terminus..

digigoblin•11mo ago

Your pod ran out of system memory (RAM, not VRAM) and the Linux kernel killed the process. Your pod was not disconnected. Try using the filter at the top of the page to ensure that your pod gets more system memory assigned to it. Which template is this, by the way? You can load tcmalloc to try to improve memory management; that's what A1111 and Forge do, because they ran out of memory when switching models too frequently.
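One way to watch how close the pod is to its system RAM limit is to read the container's cgroup memory files. This is a sketch that assumes the pod exposes them at the standard cgroup v2 or v1 paths; the helper name `mem_report` is hypothetical.

```shell
# Print current memory usage and the memory limit (in bytes) for this container.
# $1 = cgroup root directory (defaults to /sys/fs/cgroup).
mem_report() {
  root="${1:-/sys/fs/cgroup}"
  if [ -r "$root/memory.max" ]; then
    # cgroup v2 layout; the limit may be the literal string "max"
    echo "used=$(cat "$root/memory.current") limit=$(cat "$root/memory.max")"
  elif [ -r "$root/memory/memory.limit_in_bytes" ]; then
    # cgroup v1 layout
    echo "used=$(cat "$root/memory/memory.usage_in_bytes") limit=$(cat "$root/memory/memory.limit_in_bytes")"
  else
    echo "used=unknown limit=unknown"
  fi
}

mem_report
```

If `used` keeps climbing toward `limit` while the workflow runs, that points to the out-of-memory kill described above, not a network disconnect.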

Brian BullockOP•11mo ago

Hmm, are you sure? The template is PyTorch 2.01 on an A40 with 48G of RAM, 48G of VRAM

digigoblin•11mo ago

Yes, that is exactly what your error means, so obviously I am sure. Google it. Why do you come here asking for help if you know better than everyone here? The PyTorch template does not include libtcmalloc, so install it and implement it as I suggested. Without libtcmalloc, Stable Diffusion eventually runs out of memory.

Brian BullockOP•11mo ago

I didn't mean any disrespect, and genuinely asked. 🙂

Brian BullockOP•11mo ago

This is what you mean, yes? https://github.com/comfyanonymous/ComfyUI/issues/1462

GitHub

Possible memory leak with lora usage · Issue #1462 · comfyanonymous...

I use comfy as a backend in my app, and especially after using many loras, the CPU RAM usage gradually climbs. The weird part is that the RAM usage exceeds the total size of all models/loras/vaes etc.

Brian BullockOP•11mo ago

And would the command to install be:
pip install libtcmalloc-minimal4
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"

nerdylive•11mo ago

Yep, try it.

Brian BullockOP•11mo ago

pip install libtcmalloc-minimal4
ERROR: Could not find a version that satisfies the requirement libtcmalloc-minimal4 (from versions: none)
ERROR: No matching distribution found for libtcmalloc-minimal4

Brian BullockOP•11mo ago

Hmm, would I need to specify a version from this list? https://launchpad.net/ubuntu/focal/+package/libtcmalloc-minimal4

Launchpad

libtcmalloc-minimal4 : Focal (20.04) : Ubuntu

The gperftools, previously called google-perftools, package contains some utilities to improve and analyze the performance of C++ programs. This is a part of that package, and includes an optimized thread-caching malloc.

nerdylive•11mo ago

Try looking for other scripts, like the setup scripts in RunPod workers or RunPod templates. There should be some examples of a working tcmalloc install. I'm not on my PC right now, so I can't help much, sorry.

digigoblin•11mo ago

You install it with apt not with pip.

Brian BullockOP•11mo ago

Trying this today and will let you know how it goes.. Fingers crossed!

Solution

Madiator2011•11mo ago

@rethinkstudios#001 apt-get install google-perftools

Madiator2011•11mo ago

This is the correct way to install TCMalloc.

nerdylive•11mo ago

oof

Brian BullockOP•11mo ago

Hmm. Got this error when trying to install:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package google-perftools

Madiator2011•11mo ago

Run apt update first.

digigoblin•11mo ago

Don't install that; install libtcmalloc-minimal4. And the Google one is called libgoogle-perftools4.

apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4

Brian BullockOP•11mo ago

So the final command would look like this?
apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"

digigoblin•11mo ago

You don't need to mess with the environment variable; A1111 handles it for you as long as it's installed.

Brian BullockOP•11mo ago

OK. Was using it for Comfy.

digigoblin•11mo ago

Oh sorry, my bad. Then yeah, do that.
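For Comfy, the steps from this thread could be put together roughly as below. This is a sketch, not a confirmed recipe: the helper name `pick_tcmalloc` is hypothetical, and the pattern is slightly broader than the one quoted above so that it also matches `libtcmalloc_minimal.so.4`, the library that libtcmalloc-minimal4 installs.

```shell
# Print the first tcmalloc library name found in `ldconfig -p` output on stdin.
pick_tcmalloc() {
  grep -Eo 'libtcmalloc(_minimal)?\.so\.[0-9]+' | head -n 1
}

# Install step from the thread (run once on the pod, as root):
#   apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4

# Preload tcmalloc for everything started from this shell, if it is installed.
TCMALLOC="$(ldconfig -p 2>/dev/null | pick_tcmalloc || true)"
if [ -n "$TCMALLOC" ]; then
  export LD_PRELOAD="$TCMALLOC"
  echo "Preloading $TCMALLOC"
else
  echo "tcmalloc not found; run the apt install step first"
fi

# Then start Comfy from the same shell (the path is an assumption):
#   cd /workspace/ComfyUI && python main.py --listen
```

`LD_PRELOAD` only affects processes launched from the shell where it was exported, so the export and the Comfy launch need to happen in the same session, for example inside the same tmux window.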
