superuser
RunPod
Created by superuser on 10/1/2024 in #⚡|serverless
Serverless instances die when concurrent
Thanks all. Will try these things. I think if CUDA OOM or other exceptions happen, my exception handler will catch them now and email/notify me with a stack trace. So let's see. If it silently dies, it may be something else. Will report back.
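A minimal sketch of the kind of exception handler described here, assuming the failure surfaces as an ordinary Python exception inside the job handler. The names `with_failure_notification`, `handler`, and `notify` are hypothetical and not part of any RunPod API; `notify` stands in for whatever email or messaging call is actually used.

```python
import traceback
from typing import Any, Callable


def with_failure_notification(
    handler: Callable[[Any], Any],
    notify: Callable[[str], None],
) -> Callable[[Any], Any]:
    """Wrap a job handler so any exception is reported with its stack trace.

    `handler` and `notify` are hypothetical placeholders: `handler` is the
    function that processes one job, `notify` sends a message somewhere
    (email, chat webhook, etc.).
    """

    def wrapped(job: Any) -> Any:
        try:
            return handler(job)
        except Exception as exc:
            # Assumption: jobs are dicts carrying an "id" key; fall back otherwise.
            job_id = job.get("id", "unknown") if isinstance(job, dict) else "unknown"
            notify(
                f"job {job_id} failed with {type(exc).__name__}: {exc}\n"
                f"{traceback.format_exc()}"
            )
            # Re-raise so the platform still sees the job as failed.
            raise

    return wrapped
```

A wrapper like this only fires when Python gets the chance to raise. If the process is killed from outside, no notification is sent, which would match the silent failures described in this thread.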
8 replies
I understand that it stops after completing its task, but this is premature stopping. I haven't been able to capture error messages because I was away; I'm trying to catch it as it happens. RunPod doesn't persist logs either, so that doesn't help. The serverless instance has a side effect: it updates state in a database when it completes, and in these cases that update never happens. This only happens occasionally. I catch exception conditions and notify myself, but that doesn't happen in these failing cases either. So I suspect the container is just being shut down somehow.
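One way to test the shutdown suspicion is to log termination signals, since an external stop bypasses ordinary exception handling. The sketch below assumes the container is stopped with SIGTERM first, which is not confirmed anywhere in this thread. `install_shutdown_logger` and `log` are hypothetical names; `log` should write somewhere that outlives the container, such as the database or the notifier already in use.

```python
import signal
import time
from typing import Callable


def install_shutdown_logger(log: Callable[[str], None]):
    """Record SIGTERM before exiting, so an external stop leaves a trace.

    Must be called from the main thread. `log` is a hypothetical callback
    that persists the message outside the container.
    """

    def _on_signal(signum, frame):
        name = signal.Signals(signum).name
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        log(f"{stamp} received {name}; worker is being stopped from outside")
        # Exit with the conventional 128 + signal number status.
        raise SystemExit(128 + signum)

    signal.signal(signal.SIGTERM, _on_signal)
    return _on_signal
```

SIGKILL cannot be caught, so a hard kill (for example by an out-of-memory killer) would still leave no trace. In that case the absence of both an exception notification and a SIGTERM log entry is itself a hint.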
There is a shared network volume but nothing is written to it.