can’t run my own init script
Hi guys, after a couple of hours I give up and am asking for help.
I use the standard runpod/pytorch image.
My init script works on DigitalOcean test instances.
I put this string in the Docker start command override setting:
bash -c 'apt update && apt install -y wget && wget -O init-script.sh http://path.txt && chmod +x init-script.sh && ./init-script.sh'
The downloaded script is as follows:
echo "-----BEGIN RSA PRIVATE KEY---.... " > /root/.ssh/githubkey
chmod 600 /root/.ssh/githubkey
apt install -y screen
eval "$(ssh-agent -s)"
ssh-add /root/.ssh/githubkey
GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no" git clone [email protected]:momentum100/runpod-trainer-deploy.git /root/runpod-trainer-deploy
cd /root/runpod-trainer-deploy
chmod +x start.sh
screen -L -S deploy -dm bash -c './start.sh'
The pod clones the repository, then something fails and it restarts the script again and again. I can only look at the logs in the web UI. SSH is not working either.
Please advise.
P.S. My repository contains a dataset and 2 scripts: one to download the model and one to start training.
The idea is also to run screen for monitoring.
7 Replies
Are you using Secure Cloud? I'm only having similar issues on Secure. If I run on Community Cloud, no issues.
Yes, I'm using Secure Cloud because I have storage with RunPod. Would you advise moving to Amazon, for example, plus Community Cloud?
BTW, if you are deploying private keys, be sure to build your Docker image with a multi-stage build.
No, I don't advise that. I just posted a similar issue asking them to review the Secure pods.
It seems to be hardware-based.
I have the keys in environment variables, which should be set by their own /start.sh. But even with that, my init script keeps restarting.
I'm frustrated 🙂
Yes, me too. From what you are saying, it's a pod problem, similar to mine. Try Community Cloud.
I just read your topic and it seems similar. Thanks for the reply.
@AC_pill I managed to solve it by adding sleep infinity to my init script. RunPod's start script has it. Try /start.sh & init.sh, adding sleep infinity to the end of your init script too.
I already have sleep infinity set up.
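For anyone landing here later, here is a minimal sketch of the sleep infinity fix mentioned above. The likely reason it helps is that screen -dm detaches and returns immediately, so the init script finishes, the container's main process exits, and the pod starts it again. The snippet writes a shortened, hypothetical init-script.sh that keeps only the last steps from the original post; the key setup and clone steps are omitted, and it assumes the image ships GNU sleep, which accepts infinity.

```shell
#!/usr/bin/env bash
# Sketch only: generate a shortened init-script.sh whose last command blocks,
# so the container's main process stays alive after screen detaches.
# The path and the screen session name are taken from the original post;
# the key setup and git clone steps are intentionally left out here.
cat > init-script.sh <<'EOF'
#!/usr/bin/env bash
cd /root/runpod-trainer-deploy
chmod +x start.sh
# -dm starts the session detached and returns immediately
screen -L -S deploy -dm bash -c './start.sh'
# Block forever so the script (and the container) does not exit
sleep infinity
EOF
chmod +x init-script.sh
```

With this in place the script never returns, so the training keeps running inside the deploy screen session and you can attach to it with screen -r deploy once SSH works.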