any way to control the restart policy of pods?
By default, it seems like RunPod always restarts the pod after any termination. I'm wondering whether there is a flag or other option to control the restart policy.
For instance, K8s has the following restart policy options:
Container restart policy
The spec of a Pod has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always.
The restartPolicy for a Pod applies to app containers in the Pod and to regular init containers. Sidecar containers ignore the Pod-level restartPolicy field: in Kubernetes, a sidecar is defined as an entry inside initContainers that has its container-level restartPolicy set to Always. For init containers that exit with an error, the kubelet restarts the init container if the Pod level restartPolicy is either OnFailure or Always:
Always: Automatically restarts the container after any termination.
OnFailure: Only restarts the container if it exits with an error (non-zero exit status).
Never: Does not automatically restart the terminated container.
When the kubelet is handling container restarts according to the configured restart policy, that only applies to restarts that create replacement containers inside the same Pod, running on the same node. After containers in a Pod exit, the kubelet restarts them with an exponential backoff delay (10s, 20s, 40s, …), capped at 300 seconds (5 minutes). Once a container has executed for 10 minutes without any problems, the kubelet resets the restart backoff timer for that container. Sidecar containers and Pod lifecycle explains the behaviour of init containers when a restartPolicy field is specified on them.
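For context, this is how that field is declared in Kubernetes — a minimal sketch using the official kubernetes Python client, where the pod name, image, and namespace are just placeholders. I'm basically looking for the RunPod equivalent of this one setting:

```python
# Minimal sketch: create a Pod with an explicit restart policy via the
# official kubernetes Python client. Names, image, and namespace are
# placeholders for illustration only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a cluster

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="demo-job"),
    spec=client.V1PodSpec(
        restart_policy="OnFailure",  # Always (default) | OnFailure | Never
        containers=[
            client.V1Container(
                name="worker",
                image="busybox",
                command=["sh", "-c", "echo working; exit 1"],
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```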
5 Replies
You can manually restart/terminate a container... but yes, I guess resources get deallocated from the host server when a container is terminated
What are you hoping to do?
Haven't found one. So in order to avoid crash-loops, I just wrap all my containers in an init script that execs into a "wait" process and launches all its actual work in sub-processes. That way I can see any errors in the logs, and debug and fix stuff that is broken without the frigging container vanishing in a puff of smoke a second after an error happens.
The always-restart policy is only really useful for stable production workloads, not for R&D or experimental setups, which is all I'm using RunPod for.
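Roughly, the entrypoint idea looks like this — a minimal Python sketch, where the workload command is just a placeholder for whatever the container is actually supposed to run:

```python
#!/usr/bin/env python3
# Minimal "keep the container alive" entrypoint: run the real workload as a
# child process, log how it exited, then block forever so the container does
# not die (and get restarted/terminated) and you can still exec in to debug.
import signal
import subprocess
import sys

WORKLOAD = ["python", "train.py"]  # placeholder for the actual work

def main() -> None:
    try:
        result = subprocess.run(WORKLOAD)
        print(f"workload exited with code {result.returncode}", file=sys.stderr)
    except Exception as exc:  # keep the container up even if launching fails
        print(f"failed to start workload: {exc}", file=sys.stderr)
    # Block indefinitely (the "wait" process) instead of letting PID 1 exit.
    signal.pause()

if __name__ == "__main__":
    main()
```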
also I don't get it, why would you want to restart a pod after it's "terminated"?
I interpreted OP to mean they want to avoid expensive and unproductive infinite crash-loops. Or perhaps your remark, nerdylive, was also questioning the current behavior?
Yeah, I'm not entirely sure what the OP wants, but from what I know, terminating is like deleting a resource, so it's gone, right?