RunPod•10mo ago
Brian Bullock

My pod has randomly crashed several times today, and I've received emails about Runpod issues.

Today, my pod has crashed a few times, to the point where I'm receiving emails from Runpod about the issues. How can I fix this?
39 Replies
Madiator2011
Madiator2011•10mo ago
Could you provide some more information?
nerdylive
nerdylive•10mo ago
Maybe some logs or a screenshot of your pod tab would help.
Brian Bullock
Brian BullockOP•10mo ago
[Attached image; no description available]
Brian Bullock
Brian BullockOP•10mo ago
[Attached image; no description available]
Brian Bullock
Brian BullockOP•10mo ago
Here's a snapshot of the audit logs, where I'm stopping and starting pods that have disconnected in the middle of processes.
Madiator2011
Madiator2011•10mo ago
@rethinkstudios#001 I deleted your post because it leaked your email. Are you running Comfy from the web terminal?
Brian Bullock
Brian BullockOP•10mo ago
Thank you! Didn't know that would be an issue. Yes, I'm launching Comfy either through Jupyter's terminal or the native terminal, and in the middle of creating, the connection gets closed and several hours of work are lost. SUPER frustrating. What can we do to keep a solid connection?
Madiator2011
Madiator2011•10mo ago
Use normal SSH and run the process in tmux.
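A minimal sketch of the tmux suggestion above, assuming tmux is available on the pod (it may need installing with apt first). The helper name run_in_tmux and the session name are made up for illustration. The idea is that a process started inside a detached tmux session keeps running when the SSH or web terminal connection closes.

```shell
# Hypothetical helper: start a command in a detached tmux session so that it
# survives a dropped terminal connection.
# Usage: run_in_tmux <session-name> <command> [args...]
run_in_tmux() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_in_tmux <session-name> <command> [args...]" >&2
    return 2
  fi
  session="$1"
  shift
  if ! command -v tmux >/dev/null 2>&1; then
    echo "tmux not found; install it first (e.g. apt update && apt -y install tmux)" >&2
    return 1
  fi
  if tmux has-session -t "$session" 2>/dev/null; then
    echo "session '$session' already exists; attach with: tmux attach -t $session"
    return 0
  fi
  # "$*" joins the remaining arguments into a single command string for tmux.
  tmux new-session -d -s "$session" "$*"
  echo "started '$session'; attach with: tmux attach -t $session"
}
```

For example, run_in_tmux comfy python main.py would start a session named comfy (the launch command is a placeholder for however you start Comfy). After reconnecting, tmux attach -t comfy brings the session back, and Ctrl-b followed by d detaches again without stopping the process.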
Brian Bullock
Brian BullockOP•10mo ago
Are there tutorials available on how to set that up? And what's the difference in the user experience?
Madiator2011
Madiator2011•10mo ago
Usually you want to set up SSH keys on your machine and add the public key on the RunPod settings page.
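A rough sketch of the key setup described above, run on your own machine rather than on the pod. The helper name make_ssh_key, the default key path, and the "runpod" comment are assumptions for illustration. ssh-keygen itself ships with OpenSSH.

```shell
# Hypothetical helper: create an ed25519 key pair at the given path (only if
# it does not exist yet) and print the public half.
# Usage: make_ssh_key [path-to-private-key]
make_ssh_key() {
  key="${1:-$HOME/.ssh/id_ed25519}"
  mkdir -p "$(dirname "$key")" || return 1
  if [ ! -f "$key" ]; then
    # -N "" creates the key without a passphrase; drop it to be prompted.
    ssh-keygen -q -t ed25519 -N "" -C "runpod" -f "$key" || return 1
  fi
  # This is the line to paste into the public key field in the settings.
  cat "$key.pub"
}
```

Calling make_ssh_key prints a line starting with ssh-ed25519. Once that line is saved in the RunPod settings, connect with the SSH command shown for your pod, which should look roughly like ssh root@<pod-ip> -p <port> -i ~/.ssh/id_ed25519 (the address and port are placeholders).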
Brian Bullock
Brian BullockOP•10mo ago
Something like this??
Brian Bullock
Brian BullockOP•10mo ago
RunPod Blog
How to Configure Basic Terminal Access on RunPod
The fastest way to get access to a custom pod is to use our basic terminal access feature. This works with any custom container that you want to run on RunPod, whether or not it has a built in SSH daemon or exposed ports. Do be aware that there are
Brian Bullock
Brian BullockOP•10mo ago
TreeCityWes
YouTube
How To Setup SSH Public/Private Key Pair for Vast.ai and Runpod. Se...
Using WSL or Windows Subsystem for Linux to Setup SSH Public/Private Key Pair for Vast.ai and Runpod. Secure your XenBlocks Cloud Miner! SSH Key Pair Guide: https://github.com/TreeCityWes/VastSSHKeyPair/blob/main/VastSSHKeyPair.md Vast.ai GPU Rental: https://cloud.vast.ai/?ref_id=88736 Xen.game: https://xen.game/treecitywes GDXen: https://w...
nerdylive
nerdylive•10mo ago
Yes, what he did is the same thing: generating a public key using WSL, adding it to the platform, then connecting with SSH.
Madiator2011
Madiator2011•10mo ago
But you don't need to use WSL, as Windows has a built-in SSH client.
Brian Bullock
Brian BullockOP•10mo ago
Hey guys... My pod just disconnected again, and I was using SSH via Terminus..
Brian Bullock
Brian BullockOP•10mo ago
[Attached image; no description available]
digigoblin
digigoblin•10mo ago
Your pod ran out of system memory (RAM, not VRAM) and the Linux kernel killed the process. Your pod was not disconnected. Try using the filter at the top of the page to ensure that your pod gets more system memory assigned to it. Which template is this, by the way? You can load tcmalloc to try to improve memory management; that's what A1111 and Forge do, because they ran out of memory when switching models too frequently.
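One way to check the out-of-memory diagnosis from inside the pod is to look at the container's cgroup files. This is only a sketch: the function names are made up, and it assumes the pod exposes the standard Linux cgroup paths, which may not be the case on every host.

```shell
# Print the memory limit (in bytes, or "max") that the container's cgroup
# imposes; prints "unknown" when neither cgroup layout is readable.
container_mem_limit() {
  if [ -r /sys/fs/cgroup/memory.max ]; then                        # cgroup v2
    cat /sys/fs/cgroup/memory.max
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then    # cgroup v1
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes
  else
    echo "unknown"
  fi
}

# Print how many times the kernel OOM killer fired in this cgroup (v2 only).
oom_kill_count() {
  if [ -r /sys/fs/cgroup/memory.events ]; then
    awk '$1 == "oom_kill" { print $2 }' /sys/fs/cgroup/memory.events
  else
    echo "unknown"
  fi
}
```

If oom_kill_count goes up after a crash, the process was killed for exceeding the memory limit rather than lost to a dropped connection. Running free -h while Comfy is working shows how close system RAM usage is to that limit.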
Brian Bullock
Brian BullockOP•10mo ago
Hmm, are you sure? The template is PyTorch 2.0.1 on an A40 with 48G of RAM and 48G of VRAM.
digigoblin
digigoblin•10mo ago
Yes, that is exactly what your error means, so obviously I am sure. Google it. Why do you come here asking for help if you know better than everyone here? Also, the PyTorch template does not include libtcmalloc, so install it and implement it as I suggested. Without libtcmalloc, Stable Diffusion eventually runs out of memory.
Brian Bullock
Brian BullockOP•10mo ago
I didn't mean any disrespect, and genuinely asked. 🙂
Brian Bullock
Brian BullockOP•10mo ago
GitHub
Possible memory leak with lora usage · Issue #1462 · comfyanonymous...
I use comfy as a backend in my app, and especially after using many loras, the CPU RAM usage gradually climbs. The weird part is that the RAM usage exceeds the total size of all models/loras/vaes etc.
Brian Bullock
Brian BullockOP•10mo ago
And would the command to install be:
pip install libtcmalloc-minimal4
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"
nerdylive
nerdylive•10mo ago
Yep, try it.
Brian Bullock
Brian BullockOP•10mo ago
pip install libtcmalloc-minimal4
ERROR: Could not find a version that satisfies the requirement libtcmalloc-minimal4 (from versions: none)
ERROR: No matching distribution found for libtcmalloc-minimal4
Brian Bullock
Brian BullockOP•10mo ago
Hmm, would I need to specify a version from this list? https://launchpad.net/ubuntu/focal/+package/libtcmalloc-minimal4
Launchpad
libtcmalloc-minimal4 : Focal (20.04) : Ubuntu
The gperftools, previously called google-perftools, package contains some utilities to improve and analyze the performance of C++ programs. This is a part of that package, and includes an optimized thread-caching malloc.
nerdylive
nerdylive•10mo ago
Try looking for other scripts, like the setup scripts in the RunPod workers or RunPod templates. There should be some examples of a working tcmalloc install. I'm not on my PC right now, so I can't help much, sorry.
digigoblin
digigoblin•10mo ago
You install it with apt not with pip.
Brian Bullock
Brian BullockOP•10mo ago
Trying this today and will let you know how it goes.. Fingers crossed!
Solution
Madiator2011
Madiator2011•10mo ago
@rethinkstudios#001 apt-get install google-perftools
Madiator2011
Madiator2011•10mo ago
This is the correct way to install tcmalloc.
nerdylive
nerdylive•10mo ago
oof
Brian Bullock
Brian BullockOP•10mo ago
Hmm.. Got this error when trying to install:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package google-perftools
Madiator2011
Madiator2011•10mo ago
Run apt update first.
digigoblin
digigoblin•10mo ago
Don't install that, install libtcmalloc-minimal4. And the Google one is called libgoogle-perftools4.
apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4
Brian Bullock
Brian BullockOP•10mo ago
So the final command would look like this?
apt update
apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4
TCMALLOC="$(ldconfig -p | grep -Po "libtcmalloc.so.\d" | head -n 1)"
export LD_PRELOAD="${TCMALLOC}"
digigoblin
digigoblin•10mo ago
You don't need to mess with the environment variable; A1111 handles it for you as long as it's installed.
Brian Bullock
Brian BullockOP•10mo ago
OK. Was using it for Comfy.
digigoblin
digigoblin•10mo ago
Oh sorry, my bad. Then yeah, do that.
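For Comfy, where nothing sets the variable for you, the preload step from the thread can be sketched as below. The function names are made up for illustration. The pattern is widened slightly from the one quoted above so that it also accepts libtcmalloc_minimal.so.N, on the assumption that only libtcmalloc-minimal4 might be installed. The packages still have to be installed first with apt update && apt -y install libtcmalloc-minimal4 libgoogle-perftools4.

```shell
# Read `ldconfig -p` output on stdin and print the first tcmalloc library
# name found (either the full or the minimal variant).
pick_tcmalloc() {
  grep -Eo 'libtcmalloc(_minimal)?\.so\.[0-9]+' | head -n 1
}

# Hypothetical helper: export LD_PRELOAD for the current shell if a tcmalloc
# library is registered with the dynamic linker.
preload_tcmalloc() {
  lib="$(ldconfig -p 2>/dev/null | pick_tcmalloc)"
  if [ -n "$lib" ]; then
    export LD_PRELOAD="$lib"
    echo "LD_PRELOAD=$LD_PRELOAD"
  else
    echo "tcmalloc not found; install it with apt first" >&2
    return 1
  fi
}
```

Run preload_tcmalloc in the same shell (or tmux session) that launches Comfy, since LD_PRELOAD only affects processes started after it is exported.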
