RunPod · 10mo ago
Ercan

GPU Pod was down all night

Hi, we just woke up to a production issue where all our APIs were down because our pod shut down and apparently restarted for some reason. When we looked into it, we saw a "Maintenance Scheduled" notice for next week. Can someone help us understand what the issue was and why the pod went down by itself? Pod ID: clxu7lem3ph9xu
13 Replies
Ercan (OP) · 10mo ago
@Madiator2011 Could you help us with this issue?
Madiator2011 · 10mo ago
Usually, even when a pod restarts, it should start the last running app automatically. Make sure to check the pod logs.
Ercan (OP) · 10mo ago
We have 3 different services running in the pod, and when it restarted, all of them had to be restarted. Also, we cannot see what happened in the pod or why it restarted; the only thing we see now is "Maintenance Scheduled".
Madiator2011 · 10mo ago
Maintenance means the pod is going to be down for upgrades or fixes.
Ercan (OP) · 10mo ago
What do you suggest we do in cases where the pod restarts for some reason or the machine has problems? When it restarts, how could we automate bringing all the services back up? Is there any API available on RunPod where we can see whether a pod is down or active, or that we can use to trigger something?
Madiator2011 · 10mo ago
Make a bash script to run all the services on pod start. I'm not sure what you are running, so I can't tell.
Ercan (OP) · 10mo ago
We are running sd-web-ui (for its API), text generation web ui for the LLM, and our custom FastAPI service on another port, etc.
Madiator2011 · 10mo ago
All in a single pod?
Ercan (OP) · 10mo ago
2x4090
Madiator2011 · 10mo ago
I mean a single pod with 2x4090, or two pods with a single 4090 each?
Ercan (OP) · 10mo ago
Single pod with 2x4090
Madiator2011 · 10mo ago
You will probably need to make your own custom startup script, like this: https://github.com/runpod/containers/blob/main/container-template/start.sh
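Editor's note: the suggestion above is a bash startup script. As a rough sketch of the same idea in Python, the launcher below starts several services and relaunches any that exit. Everything in `SERVICES` is an assumption made for illustration: the paths, ports, and command lines are hypothetical and do not come from the thread, and the function names are invented for this sketch.

```python
import subprocess
import time

# Hypothetical launch commands. The paths, ports, and flags are placeholders
# for illustration only; replace them with the real commands for your services.
SERVICES = {
    "sd-webui": ["python", "/workspace/stable-diffusion-webui/launch.py", "--api"],
    "textgen": ["python", "/workspace/text-generation-webui/server.py", "--api"],
    "custom-api": ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"],
}


def start_all(services):
    """Launch every service and return a name -> Popen mapping."""
    return {name: subprocess.Popen(cmd) for name, cmd in services.items()}


def restart_dead(services, procs):
    """Relaunch every service whose process has exited.

    Returns the names of the services that were relaunched.
    """
    restarted = []
    for name, proc in procs.items():
        if proc.poll() is not None:  # poll() returns None while still running
            procs[name] = subprocess.Popen(services[name])
            restarted.append(name)
    return restarted


def supervise(services, interval=5.0):
    """Start all services, then check them forever, relaunching any that died."""
    procs = start_all(services)
    while True:
        time.sleep(interval)
        for name in restart_dead(services, procs):
            print(f"restarted {name}", flush=True)


# To use it, call supervise(SERVICES) from whatever the container runs on start.
```

If the pod's start command runs a script that calls `supervise(SERVICES)`, the three services should come back after a pod restart without manual steps, and a crash of one service does not take the others down. This only covers process exits; a service that hangs while still running would need a separate health check.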
Ercan (OP) · 10mo ago
I see, makes sense. I will have a look.
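Editor's note: the question of how to detect that the pod or its services are down is not answered in the thread. One option, sketched below, is to poll each service's HTTP port from a machine outside the pod. It does not use any RunPod API, and the URLs in `ENDPOINTS` are hypothetical placeholders, as are the function names.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical endpoints: replace POD_HOST and the ports with the addresses
# your services are actually exposed on.
ENDPOINTS = {
    "sd-webui": "http://POD_HOST:3000/",
    "textgen": "http://POD_HOST:5000/",
    "custom-api": "http://POD_HOST:8000/",
}


def is_up(url, timeout=5.0):
    """Return True if the URL answers with a non-5xx HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered. A 4xx (such as 404 on "/") still means the
        # process is alive; a 5xx is treated as unhealthy.
        return err.code < 500
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, or a broken response.
        return False


def check_all(endpoints, timeout=5.0):
    """Return a name -> bool mapping of which endpoints are reachable."""
    return {name: is_up(url, timeout) for name, url in endpoints.items()}
```

Running `check_all(ENDPOINTS)` on a schedule (cron, for example) and alerting when any value is `False` would have flagged the overnight outage. Counting 4xx responses as "up" is a deliberate choice here, since the goal is to detect a dead pod rather than to validate each API.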