RunPod · 10mo ago
JorgeG

Worker is very frequently killed and replaced

I have an endpoint configured with 1 active worker and 2 max workers (24GB PRO). The requests are handled by an asynchronous handler. For some unknown reason (I can't see any errors or other failures in the logs), the worker restarts every 30 min to 2 h (sometimes less, sometimes more). It is the same worker (according to the ID), but the container is restarted. What could be the reason for this?

The system logs look like this:

2024-03-01T17:08:05Z start container
2024-03-01T17:18:07Z stop container
2024-03-01T17:18:08Z remove container
2024-03-01T17:18:08Z remove network
2024-03-01T18:20:22Z create pod network
2024-03-01T18:20:22Z create container XXXXX
2024-03-01T18:20:22Z start container
2024-03-01T18:30:17Z stop container
2024-03-01T18:30:17Z remove container
2024-03-01T18:30:17Z remove network
2024-03-01T18:38:56Z create pod network
2024-03-01T18:38:56Z create container XXXXX
2024-03-01T18:38:56Z start container
2024-03-01T18:57:44Z stop container
2024-03-01T18:57:45Z remove container
2024-03-01T18:57:45Z remove network
2024-03-01T19:04:17Z create pod network
2024-03-01T19:04:17Z create container XXXXXX
2024-03-01T19:04:17Z start container
2024-03-01T19:19:58Z stop container
2024-03-01T19:20:00Z remove container
2024-03-01T19:20:00Z remove network
2024-03-01T19:20:24Z create pod network
2024-03-01T19:20:24Z create container XXXXXXXX
2024-03-01T19:20:26Z start container
2024-03-01T19:21:05Z stop container
2024-03-01T19:21:07Z remove container
2024-03-01T19:21:07Z remove network
2024-03-01T19:21:34Z create pod network
2024-03-01T19:21:34Z create container XXXXXXXX
2024-03-01T19:21:35Z start container
3 Replies
flash-singh · 10mo ago
What's the endpoint ID?
JorgeG (OP) · 10mo ago
1hdfqkkbw41swp. Thanks for looking into it.
flash-singh · 10mo ago
Those logs are normal; it happens when your workers scale up and down.
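For anyone who wants to check how long each container actually ran between a start and a stop event, here is a minimal Python sketch. It assumes the system log has one event per line in the `<timestamp> <event>` form quoted in the question. The names `SAMPLE_LOG`, `parse_ts`, and `container_runs` are made up for illustration and are not part of any RunPod tooling; the sample lines are copied from the question.

```python
from datetime import datetime

# Sample lines copied from the system log quoted in the question.
SAMPLE_LOG = """\
2024-03-01T17:08:05Z start container
2024-03-01T17:18:07Z stop container
2024-03-01T18:20:22Z create container XXXXX
2024-03-01T18:20:22Z start container
2024-03-01T18:30:17Z stop container
2024-03-01T18:38:56Z start container
2024-03-01T18:57:44Z stop container
"""


def parse_ts(stamp):
    """Parse a UTC timestamp such as 2024-03-01T17:08:05Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def container_runs(log_text):
    """Return (start, stop, seconds) for each start/stop pair in the log."""
    runs = []
    started = None
    for line in log_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        stamp, event = parts
        if event == "start container":
            started = parse_ts(stamp)
        elif event == "stop container" and started is not None:
            stopped = parse_ts(stamp)
            seconds = int((stopped - started).total_seconds())
            runs.append((started, stopped, seconds))
            started = None  # wait for the next start event
    return runs


if __name__ == "__main__":
    for start, stop, secs in container_runs(SAMPLE_LOG):
        print(f"{start:%H:%M:%S} -> {stop:%H:%M:%S}: ran {secs // 60}m {secs % 60}s")
```

A start event with no matching stop (like the last line of the log in the question) is simply ignored, since that container is presumably still running.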