(Flux) Serverless inference crashes without logs.

Hi all! I've built a FLUX inference container on RunPod serverless. It works (sometimes), but I get a lot of random failures and RunPod does not return the error logs. For example, this is the response:

```json
{
  "delayTime": 151019,
  "error": "job timed out after 1 retries",
  "executionTime": 102002,
  "id": "64de56ee-4af2-4c64-ab84-02d4a7e81593-u1",
  "retries": 1,
  "status": "FAILED",
  "workerId": "1qjtmj861f1278"
}
```

No error log is reported, either in the console or in the response, about what made the job retry the first time. Also, the timeout should be one hour, but I get this message after a few minutes. I have also added a Telegram bot for logging, but it doesn't capture any exception either. Did the machine just crash?
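To check the "few minutes versus one hour" observation, you can convert the timing fields in the response above. This is a small sketch that assumes `delayTime` and `executionTime` are reported in milliseconds. The response does not say whether `executionTime` covers both attempts or only the last one.

```python
import json

# Response body quoted in the post above (values copied verbatim).
RESPONSE = """
{ "delayTime": 151019, "error": "job timed out after 1 retries",
  "executionTime": 102002, "id": "64de56ee-4af2-4c64-ab84-02d4a7e81593-u1",
  "retries": 1, "status": "FAILED", "workerId": "1qjtmj861f1278" }
"""


def summarize_timing(body: str) -> dict:
    """Convert the job's timing fields to seconds and total minutes.

    Assumption: delayTime and executionTime are in milliseconds.
    """
    job = json.loads(body)
    delay_s = job["delayTime"] / 1000      # time spent waiting before execution
    exec_s = job["executionTime"] / 1000   # time spent executing
    total_s = delay_s + exec_s
    return {
        "delay_s": delay_s,
        "execution_s": exec_s,
        "total_min": round(total_s / 60, 2),
    }


print(summarize_timing(RESPONSE))
```

Under that assumption, the job waited about 2.5 minutes and ran for about 1.7 minutes, roughly 4.2 minutes in total. That is far below a one-hour limit, which suggests the failure was not the configured execution timeout expiring.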
Have you experienced the same?
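One way to make in-process failures visible in the worker logs is to wrap the handler so that every exception is written to stdout with its full traceback. The sketch below is a generic wrapper, not a RunPod API. The inner `handler` body is a hypothetical stand-in for the FLUX call.

With the `runpod` SDK, you would register the wrapped function with `runpod.serverless.start({"handler": handler})`. I believe the SDK treats a returned dict with an `"error"` key as a failed job, but verify that against your SDK version.

```python
import functools
import logging
import sys
import traceback

# Log to stdout so the lines end up in the container's log stream.
logging.basicConfig(stream=sys.stdout, level=logging.INFO, force=True)
log = logging.getLogger("flux-worker")


def log_failures(handler):
    """Wrap a job handler so any exception is logged with its traceback
    and returned as an error payload instead of disappearing silently."""

    @functools.wraps(handler)
    def wrapper(job):
        job_id = job.get("id", "<unknown>")
        log.info("job %s started", job_id)
        try:
            result = handler(job)
        except Exception as exc:
            tb = traceback.format_exc()
            log.error("job %s failed:\n%s", job_id, tb)
            return {"error": f"{type(exc).__name__}: {exc}", "traceback": tb}
        log.info("job %s finished", job_id)
        return result

    return wrapper


@log_failures
def handler(job):
    # Hypothetical stand-in for the FLUX inference call.
    prompt = job["input"]["prompt"]
    if not prompt:
        raise ValueError("empty prompt")
    return {"image": f"<generated for {prompt!r}>"}
```

If the process is killed from outside, no Python exception is raised. Examples include a SIGKILL from the kernel's OOM killer or the container being stopped. In that case neither this wrapper nor a Telegram hook will see anything. A "started" line with no matching "finished" or "failed" line is a hint that this is what happened.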
Madiator2011 (Work)
What version of the SDK?
deepblhe (OP), 2 weeks ago
Do you mean which version of the RunPod SDK I have in the Docker image?
nerdylive, 2 weeks ago
yes
deepblhe (OP), 2 weeks ago
1.7.0. I'm bumping it up to 1.7.4, which appears to be the latest.

Hey, I just saw that the worker exited with exit code 0, which is better than nothing but still not very informative.
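To confirm which SDK version actually ended up in the Docker image, you can query the installed `runpod` package at runtime. This is a minimal sketch. It assumes a plain `X.Y.Z` version string, and it uses 1.7.4 as the minimum only because that is the version mentioned in this thread.

```python
from importlib.metadata import PackageNotFoundError, version

MIN_RUNPOD = (1, 7, 4)  # version mentioned in this thread


def parse_version(text: str) -> tuple:
    """Parse a plain 'X.Y.Z' release string into a tuple of ints.

    Deliberately simple: pre-release suffixes such as '1.8.0rc1' are not handled.
    """
    return tuple(int(part) for part in text.split(".")[:3])


def check_runpod_sdk(minimum=MIN_RUNPOD) -> str:
    """Report whether the installed runpod package meets the minimum version."""
    try:
        installed = version("runpod")
    except PackageNotFoundError:
        return "runpod SDK is not installed in this image"
    if parse_version(installed) < minimum:
        wanted = ".".join(map(str, minimum))
        return f"runpod {installed} is older than {wanted}; consider upgrading"
    return f"runpod {installed} OK"


if __name__ == "__main__":
    print(check_runpod_sdk())
```

Running this once at worker start-up puts the version in the logs. Pinning the package explicitly in the image's requirements, for example `runpod==1.7.4`, avoids silently building with an older release.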
yhlong00000, 2 weeks ago
Exit code 0 means successful execution: the process completed without error.
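To tell a clean exit apart from a crash, it helps to know the common exit-status conventions. Python's `subprocess` reports a child killed by signal N as return code `-N`. Shells and container runtimes usually report it as `128 + N`, so a SIGKILL from the OOM killer typically appears as 137. This sketch applies those conventions. Whether RunPod's "worker exited" message reports the raw status this way is an assumption.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a process exit status using common POSIX/container conventions."""
    if code == 0:
        return "exited normally (success)"
    signum = None
    if code < 0:
        signum = -code          # subprocess convention: -N means killed by signal N
    elif code > 128:
        signum = code - 128     # shell/container convention: 128 + N
    if signum is not None:
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            pass                # not a valid signal number; fall through
    return f"exited with error status {code}"
```

By these conventions, an exit code of 0 points away from an OOM kill or other signal-based crash of the worker process itself.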
deepblhe (OP), 2 weeks ago
Updating the SDK worked, thank you!