job timed out after 1 retries
I'm getting this message, along with a FAILED state, on roughly 10% of the jobs sent to this endpoint.
The affected jobs usually also show a delay time of 2-3 minutes.
Where should I start looking to figure out what the issue could be?
Job id : 2c7d3249-31e6-46e8-94bc-d07c7df38956-e1
Worker id : cpruj3ruz61wmq
Endpoint id : 74jm2u3liu0pcy
1 Reply
Try adding more logs for the input and for each step of the process:
https://docs.runpod.io/serverless/workers/handlers/handler-error-handling
Handling Errors | RunPod Documentation
Learn how to handle exceptions and implement custom error responses in your RunPod SDK handler function, including how to validate input and return customized error messages.
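For example, here is a rough sketch of what that could look like in a Python handler. It is only an illustration, not your actual worker: `process()` and the `prompt` input field are made-up placeholders, and it assumes the usual RunPod SDK pattern of returning a dict with an `error` key to report a failure and starting the worker with `runpod.serverless.start`. The idea is to log the job id, the input keys, and the time each step takes, so the slow or failing jobs can be matched against the worker logs.

```python
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("handler")


def process(prompt):
    """Hypothetical placeholder for the real work (model call, download, etc.)."""
    return prompt.upper()


def handler(job):
    # Log the job id and the input keys first, so every job leaves a trace
    # even if it later hangs or times out.
    job_id = job.get("id", "unknown")
    job_input = job.get("input") or {}
    log.info("job %s: received input keys=%s", job_id, sorted(job_input))

    # Validate the input up front ("prompt" is an assumed field name).
    prompt = job_input.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        log.warning("job %s: missing or empty 'prompt'", job_id)
        return {"error": "missing or empty 'prompt' in input"}

    # Time each step so slow jobs show where the time went.
    start = time.perf_counter()
    try:
        result = process(prompt)
    except Exception as exc:
        log.exception("job %s: process() failed", job_id)
        return {"error": f"{type(exc).__name__}: {exc}"}
    elapsed = time.perf_counter() - start
    log.info("job %s: process() finished in %.3fs", job_id, elapsed)

    return {"result": result}


if __name__ == "__main__":
    # Imported here so the handler can be tested locally without the SDK.
    import runpod

    runpod.serverless.start({"handler": handler})
```

With logging like this in place, you can look up a failed job id in the worker logs and see whether the handler ever received the job and which step it was in when the timeout hit.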