Help with constantly crashing GPU pods
Hello, I’ve been struggling for the past few days to get a Docker image up and running on a GPU pod. I had success with a template I made (Docker image mcgillrobotics/mujoco:cuda118) and managed to connect and get things running, but since then I have not been able to connect to a pod. The Docker image pulls, but when I click “Connect to web terminal” nothing happens, and when I try to SSH in, it says the container is not running and kicks me out instantly. I’ve tried different rocket images, CUDA versions, GPUs, and template overrides, but have had no luck. I reached out to support on the website and was told I would receive an email, but it’s been a few days and I’ve heard nothing. I’d be super grateful if anyone has any input!
Sounds like your Docker image isn't keeping the container alive.
What do you mean by “keeping it alive”? Is there something special I need to do so the container doesn’t die, or is it crashing?
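Yes, you need to add
sleep infinity
at the end of the container's start command so its main process never exits. A container stops as soon as that process finishes.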
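Here is a minimal sketch of that fix. It assumes the pod runs a custom image built on the one from the original post and that the image's default command exits right away, which would explain the "container is not running" message. A Dockerfile can make sleep infinity the container's main process:

```dockerfile
# Sketch only: extend the image from the original post so the container
# keeps running after startup. Assumes the base image's default command
# exits immediately, which stops the container.
FROM mcgillrobotics/mujoco:cuda118

# Make a never-ending process the container's main command so the pod
# stays up long enough to connect via the web terminal or SSH.
# Exec (JSON) form runs sleep directly instead of through /bin/sh -c.
CMD ["sleep", "infinity"]
```

If the base image defines an ENTRYPOINT, Docker passes this CMD to it as arguments. In that case the entrypoint script must exec its arguments, or be cleared with ENTRYPOINT [], for sleep to actually run. If the pod template sets its own start command, that likely replaces this CMD, so sleep infinity would need to go there instead. Keep in mind that sleep infinity only keeps the container up. Anything else you rely on, such as an SSH server, still has to be started before it. This also assumes GNU coreutils sleep, which accepts infinity as a duration.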
gotcha, thanks!