RunPod•14mo ago

Worker is very frequently killed and replaced

I have an endpoint configured with 1 active worker and 2 max workers (24GB PRO). The requests are being handled by an asynchronous handler. For some unknown reason (I can't see any errors or other failures in the logs), the worker restarts every 30 min to 2 h, sometimes less, sometimes more. It is the same worker according to the ID, but the container is restarted. What could be the reason for this? The system logs look like this:

2024-03-01T17:08:05Z start container
2024-03-01T17:18:07Z stop container
2024-03-01T17:18:08Z remove container
2024-03-01T17:18:08Z remove network
2024-03-01T18:20:22Z create pod network
2024-03-01T18:20:22Z create container XXXXX
2024-03-01T18:20:22Z start container
2024-03-01T18:30:17Z stop container
2024-03-01T18:30:17Z remove container
2024-03-01T18:30:17Z remove network
2024-03-01T18:38:56Z create pod network
2024-03-01T18:38:56Z create container XXXXX
2024-03-01T18:38:56Z start container
2024-03-01T18:57:44Z stop container
2024-03-01T18:57:45Z remove container
2024-03-01T18:57:45Z remove network
2024-03-01T19:04:17Z create pod network
2024-03-01T19:04:17Z create container XXXXXX
2024-03-01T19:04:17Z start container
2024-03-01T19:19:58Z stop container
2024-03-01T19:20:00Z remove container
2024-03-01T19:20:00Z remove network
2024-03-01T19:20:24Z create pod network
2024-03-01T19:20:24Z create container XXXXXXXX
2024-03-01T19:20:26Z start container
2024-03-01T19:21:05Z stop container
2024-03-01T19:21:07Z remove container
2024-03-01T19:21:07Z remove network
2024-03-01T19:21:34Z create pod network
2024-03-01T19:21:34Z create container XXXXXXXX
2024-03-01T19:21:35Z start container

3 Replies

flash-singh•14mo ago

What's the endpoint ID?

JorgeGOP•14mo ago

1hdfqkkbw41swp
Thanks for looking into it.

flash-singh•14mo ago

Those logs are normal; that happens when your workers scale up and down.
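
To check whether the restarts line up with scaling rather than crashes, it can help to measure how long each container actually ran. The following is only a rough sketch, not an official RunPod tool: it pairs "start container" and "stop container" lines from an excerpt of the system logs quoted in the question. The function names `parse_events` and `container_runs` are made up for illustration, and the log format is assumed to be exactly "timestamp, space, action" as shown above.

```python
from datetime import datetime

# Excerpt of the system log lines quoted in the question (assumed format:
# ISO timestamp with trailing Z, one space, then the action text).
LOG = """\
2024-03-01T18:20:22Z start container
2024-03-01T18:30:17Z stop container
2024-03-01T18:38:56Z start container
2024-03-01T18:57:44Z stop container
2024-03-01T19:04:17Z start container
2024-03-01T19:19:58Z stop container
"""


def parse_events(text):
    """Turn log text into a list of (datetime, action) tuples."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        stamp, _, action = line.partition(" ")
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        events.append((when, action))
    return events


def container_runs(events):
    """Pair each 'start container' with the next 'stop container'."""
    runs = []
    started = None
    for when, action in events:
        if action == "start container":
            started = when
        elif action == "stop container" and started is not None:
            runs.append((started, when))
            started = None
    return runs


runs = container_runs(parse_events(LOG))
for start, stop in runs:
    print(f"{start:%H:%M:%S} -> {stop:%H:%M:%S}  ran {stop - start}")
```

Comparing these run times and the idle gaps between them against request traffic and the endpoint's scaling settings would show whether the stops coincide with periods of no work.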
