Web terminal keeps closing connection for no reason
I have an on-demand GPU pod deployed and I'm running a shell script that trains a model through the web terminal. Consistently, roughly every 1h40m, the web terminal dies with the message "Connection closed", for seemingly no reason. This is very frustrating, as I'm paying for on-demand specifically because I want to be able to leave it training unattended for a long period. What can be done to fix this?
9 Replies
Check your container logs, is the pod constantly restarting?
Doesn't seem like it
Pro tip: do not use the web terminal
What is the better solution?
SSH proxy, true SSH, or use the terminal in Jupyter
Right, I saw several people reporting issues with SSH connections crapping out, so I wanted to try another way
Also, the Jupyter notebook ran out of memory
Pro tip: if you want a more stable SSH session, use tmux, so even if SSH disconnects it will keep things running
thanks for that
Do you work for RunPod?
Also, to make sure I understand correctly: do I run tmux on my computer, or once I'm connected to the pod? Because when I'm connected with SSH I get "bash: tmux: command not found"
Connect to the pod, then start tmux there
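For anyone landing here with the same "tmux: command not found" error, here is a rough sketch of what running the training inside tmux on the pod could look like. It assumes the pod image is Debian/Ubuntu-based with apt-get available and that you are root; the session name `train`, the script path `./train.sh`, and the log file `train.log` are placeholders, not anything specific to this setup.

```shell
# Run these on the pod (after connecting over SSH), not on your local machine.

SESSION="train"        # hypothetical session name
SCRIPT="./train.sh"    # hypothetical path to the training script

# Install tmux if it is missing.
# Assumption: Debian/Ubuntu-based image with apt-get, running as root.
ensure_tmux() {
  if ! command -v tmux >/dev/null 2>&1; then
    apt-get update && apt-get install -y tmux
  fi
}

# Start the training script in a detached tmux session so it keeps
# running after the SSH or web terminal connection drops.
start_training() {
  ensure_tmux || return 1
  if tmux has-session -t "$SESSION" 2>/dev/null; then
    echo "session '$SESSION' already exists"
    return 0
  fi
  # Output is also copied to train.log so it can be read without attaching.
  tmux new-session -d -s "$SESSION" "bash $SCRIPT 2>&1 | tee train.log"
}
```

After defining these, run `start_training`, then `tmux attach -t train` to watch the output. Press Ctrl-b and then d to detach without stopping the job. If the connection drops, reconnect to the pod and run `tmux attach -t train` again to pick up where you left off.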