(Flux) Serverless inference crashes without logs.
Hi All!
I've built a FLUX inference container on RunPod serverless.
It works (sometimes), but I get a lot of random failures and RunPod does not return the error logs.
E.g. this is the response:
'''
{
"delayTime": 151019,
"error": "job timed out after 1 retries",
"executionTime": 102002,
"id": "64de56ee-4af2-4c64-ab84-02d4a7e81593-u1",
"retries": 1,
"status": "FAILED",
"workerId": "1qjtmj861f1278"
}
'''
But no error log is reported, either in the console or in the response, about what caused the job to retry the first time.
Also, the timeout should be one hour, but I get this message after a few minutes.
I have also added a Telegram bot for logging, but no exception is captured there either. Did the machine just crash?
Have you experienced the same?
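For anyone hitting the same thing, here is a minimal sketch of one way to make handler exceptions visible. It wraps the handler so any Python exception is printed to stderr (flushed immediately) and also returned under an "error" key. The names `report_errors` and `handler` are hypothetical, the handler body is only a stand-in for the real FLUX inference, and it is an assumption that the platform surfaces a returned "error" key in the job response. Note that this can only catch Python exceptions: if the worker process is killed from outside (for example, by running out of memory), neither this wrapper nor a Telegram logger would see anything.

'''
import functools
import sys
import traceback


def report_errors(handler):
    """Wrap a serverless handler so exceptions are logged and returned."""
    @functools.wraps(handler)
    def wrapper(job):
        try:
            return handler(job)
        except Exception as exc:
            # Print the full traceback and flush so it reaches the worker
            # logs even if the process stops shortly afterwards.
            traceback.print_exc(file=sys.stderr)
            sys.stderr.flush()
            # Assumption: a returned "error" key is shown in the job response.
            return {"error": f"{type(exc).__name__}: {exc}"}
    return wrapper


@report_errors
def handler(job):
    # Hypothetical stand-in for the real FLUX inference call.
    prompt = job["input"]["prompt"]
    return {"prompt": prompt}


# In the actual worker this would be registered with the SDK, e.g.:
# runpod.serverless.start({"handler": handler})
'''

Calling `handler({"input": {}})` with this sketch returns a dict with an "error" entry describing the missing key instead of raising.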
6 Replies
what version of SDK?
You mean which version of the RunPod SDK I have in the Docker image?
yes
1.7.0
Bumping it up to 1.7.4, which appears to be the latest
Hey, now I just see "worker exited with exit code 0"
Which is better than nothing but still not very informative
exit code 0 means successful execution: the process completed without error
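To illustrate the exit code point, here is a small sketch using only the Python standard library (the helper name `exit_code_of` is made up for this example). It runs two snippets in child processes: one that finishes normally and one that dies from an uncaught exception. This is plain Python behaviour, not anything specific to the worker; it only suggests that an exit code of 0 means the process itself did not end through an unhandled Python exception.

'''
import subprocess
import sys


def exit_code_of(snippet: str) -> int:
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,  # keep the child's output off our console
    )
    return result.returncode


# A script that runs to completion exits with code 0.
clean = exit_code_of("print('done')")

# An uncaught exception makes the interpreter exit with a non-zero code.
failed = exit_code_of("raise RuntimeError('boom')")

print("clean run:", clean)
print("uncaught exception:", failed)
'''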
Updating the SDK worked, thank you!