Unable to connect to JupyterLab
It seems JupyterLab has crashed on my pod after a job had been running for around 2 days. This is unfortunate. Is there any way I can restart JupyterLab so that I can resume training? Is it also possible that my process is still running even though JupyterLab has crashed?
3 Replies
From container logs:
The Linux kernel killed the process because the pod ran out of system memory (RAM, not VRAM). Your training process probably stopped as well when Jupyter was killed. It's probably better to SSH into the pod and start a screen or tmux session for your training rather than running it inside Jupyter.
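If it helps, here is a rough Python sketch of the same idea, for the case where you would rather launch from a script than from screen or tmux. It is not what the pod template does and it is a swapped-in alternative to a terminal multiplexer: it starts the command in its own session with output going to a log file, and it includes a small check for whether a given PID is still alive. The names `launch_detached`, `is_running`, `train.py` and `train.log` are made up for illustration, and it assumes a Linux pod. Note that detaching does not protect the job from the out-of-memory killer; the job still has to fit in system memory.

```python
import os
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd in its own session, appending stdout/stderr to log_path.

    Returns the PID of the new process. Assumes a POSIX system (Linux pod).
    """
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=log,
            stderr=subprocess.STDOUT,
            stdin=subprocess.DEVNULL,
            start_new_session=True,  # new session, separate from the launcher's terminal
        )
    # The child holds its own copy of the log file descriptor,
    # so closing ours here does not interrupt its output.
    return proc.pid


def is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Hypothetical usage (train.py and train.log are placeholder names):
#   pid = launch_detached([sys.executable, "train.py"], "train.log")
#   print(pid, is_running(pid))
```

After reconnecting over SSH, calling `is_running` with the saved PID (or simply looking at whether the log file is still growing) is one way to tell whether training survived.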
I would just reset the pod to get Jupyter to work again.
Nice one, thanks a lot