Vitali
Services Stopped
Hi team,
Could somebody help me with an issue?
My pod is running: runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu 4 RTX 4090
To start my AI training program, I type the commands on the command line and the process starts. After 1-4 hours, though, the process stops for no apparent reason, and I have to retype all the commands to start it again.
What could be stopping the process? Why do I need to restart everything 3-4 times per day?