job timed out after 1 retries
Hello! I'm getting this on every job now on the 31py4h4d9ytybu serverless endpoint. My logs have zero messages or any indication of where this is happening; from the outside it looks as if the workers are totally paused or non-responsive. This silently hung work for over an hour. I'm on runpod 1.7.4. This is having a significant impact on production work, with no clear remediation (see screenshots: no logs for many, many minutes despite work happening constantly, and errors on every job). Would love some help!!


If you want logs to show up, you need to print from the main Python process, or from the process you run from your Dockerfile.
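Something like this, a minimal sketch assuming the standard runpod serverless handler pattern (`runpod.serverless.start` is the documented entrypoint; the echo payload is just a placeholder for your real workload):

```python
import runpod

def handler(job):
    # This runs in the main Python process started by runpod.serverless.start,
    # so prints here appear in the endpoint logs; flush=True avoids buffering.
    print(f"got job {job.get('id')}", flush=True)
    payload = job["input"]          # the job's input payload
    result = {"echo": payload}      # placeholder for your actual work
    print("job done", flush=True)
    return result

runpod.serverless.start({"handler": handler})
```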
Got any code or logs that would help?
Is it the "kill worker & finished" message, or the error below? The error below looks like an input error.
@nerdylive lol so it ended up being that I needed to bump my runpod package from 1.7.4 to 1.7.7. Very frustrating that a patch-level release fixes this. Like, how would I ever have found that out without spending three days blaming myself while trying to fix it, and then reaching out to customer service lol
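For anyone else hitting this: you can check which version is actually installed with the standard library, then pin the fixed version in your image (the `runpod>=1.7.7` pin is my assumption of the fixed range based on this thread):

```python
# Standard-library only: prints the installed runpod SDK version.
from importlib.metadata import version

print(version("runpod"))  # this thread's fix was 1.7.4 -> 1.7.7
# Then pin it, e.g. runpod>=1.7.7 in requirements.txt, so your image
# doesn't silently build with an older release.
```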
Ooh what
Hahah yeah, worth reporting it to RunPod too so they can check for bugs.