RunPod
Created by gekko on 9/2/2024 in #⚡|serverless
Random CUDA Errors
Hello! About once every two weeks, the following error appears for a few hours and then resolves on its own: "CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect." I can't figure out what is going on, especially since the error disappears after a few hours and then comes back 1-2 weeks later. Can anyone help me find out what is causing this?
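Since the message itself warns that the stack trace may point at the wrong call, one way to narrow this down is to force synchronous kernel launches and record which worker raised the error. The following is only a rough sketch under assumptions: `describe_worker` and `with_cuda_diagnostics` are hypothetical helper names, the handler takes a single `job` argument, and the `RUNPOD_POD_ID` variable is assumed to identify the worker (substitute whatever identifier your environment actually exposes).

```python
import logging
import os
import socket

# Must be set before the first CUDA call in the process. With this set,
# kernel launches run synchronously, so an error is raised at the call
# that actually failed (at the cost of slower execution).
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

logger = logging.getLogger("cuda_diagnostics")


def describe_worker():
    """Collect identifiers that help tie an error to a specific machine."""
    return {
        "hostname": socket.gethostname(),
        # Assumption: the platform exposes a worker id in this variable.
        "pod_id": os.environ.get("RUNPOD_POD_ID", "unknown"),
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES", "unset"),
    }


def with_cuda_diagnostics(handler):
    """Wrap a handler so CUDA-related RuntimeErrors are logged with worker info."""
    def wrapped(job):
        try:
            return handler(job)
        except RuntimeError as exc:
            # Only log errors whose message mentions CUDA; always re-raise.
            if "CUDA" in str(exc):
                logger.error("CUDA error on %s: %s", describe_worker(), exc)
            raise
    return wrapped
```

If the logged hostname or worker id turns out to be the same every time the error shows up, that would suggest a problem with specific hardware rather than with the code.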