Job retry after successful run

My endpoint started retrying every request even though the first run succeeds without any errors. I don't understand why this is happening. This is what I see in the logs when the first run finishes and the retry starts:
2024-10-10T11:51:52.937738320Z {"requestId": null, "message": "Jobs in queue: 1", "level": "INFO"}
2024-10-10T11:51:52.972812780Z {"requestId": "e5746a57-2af3-4849-84d1-b58d24480627-e1", "message": "Finished.", "level": "INFO"}
2024-10-10T11:51:52.972908181Z {"requestId": null, "message": "Jobs in progress: 1", "level": "INFO"}
2024-10-10T11:51:52.973024343Z {"requestId": "e5746a57-2af3-4849-84d1-b58d24480627-e1", "message": "Started.", "level": "INFO"}
6 Replies
vitalik (2w ago)
Turning off FlashBoot seems to have solved the problem, but I'm not sure; it may just be a coincidence.
Mihály (2w ago)
For me, upgrading the SDK from 1.7.1 to 1.7.2 got rid of the retries.
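For anyone unsure which SDK version their worker image actually has, here is a minimal sketch for checking it from Python. It assumes the SDK in question is the `runpod` Python package installed inside the container image (the thread does not name the package explicitly), and `installed_version` is just an illustrative helper name.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the worker SDK is distributed as the "runpod" package.
print(installed_version("runpod"))
```

Running this inside the image (or at the top of the handler script) shows whether the version matches the one mentioned above. If it prints `None`, the package is not installed under that name in that environment.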
vitalik (2w ago)
Thanks, I'll try.
xuanyu (2w ago)
Same issue here. How do I resolve this? I am using 1.7.2 and have turned off FlashBoot.
furkan.huudle (5d ago)
Do you mean the runpod CLI SDK? I ask because I'm not using any CLI. I just deploy to serverless from the dashboard, and I don't see any SDK selection there (my container image: nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04).
Brandon (2d ago)
Has anyone found a fix to the issue? I also get successful runs, but immediately after, the job retries and the worker subsequently fails:
2024-10-20T18:33:35.201959405Z {"requestId": "1b732766-f006-4825-8d71-ba4908d01a78-e1", "message": "Finished.", "level": "INFO"}
2024-10-20T18:33:35.792234482Z {"requestId": null, "message": "Jobs in queue: 1", "level": "INFO"}
2024-10-20T18:33:35.792291243Z {"requestId": null, "message": "Jobs in progress: 1", "level": "INFO"}
2024-10-20T18:33:35.806365607Z {"requestId": "1b732766-f006-4825-8d71-ba4908d01a78-e1", "message": "Started.", "level": "INFO"}
2024-10-20 19:24:33.267 [cc1g0dj5wo63pu] [error] Failed to return job results. | 400, message='Bad Request'
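To confirm the pattern across a longer log, a small script can flag any request ID that logs "Started." again after it has already logged "Finished.". This is only a sketch: it assumes every worker log line is a timestamp followed by a JSON payload, as in the excerpts above, and `find_restarted_jobs` is a hypothetical helper, not part of any SDK.

```python
import json


def find_restarted_jobs(log_text):
    """Return request IDs that log "Started." after they already logged "Finished."."""
    finished = set()
    restarted = []
    for line in log_text.splitlines():
        # Assumed line shape: "<timestamp> <json payload>"; anything else is skipped.
        _, _, payload = line.strip().partition(" ")
        if not payload.startswith("{"):
            continue
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue
        request_id = entry.get("requestId")
        if request_id is None:
            continue  # queue/progress counters carry no request ID
        message = entry.get("message")
        if message == "Finished.":
            finished.add(request_id)
        elif message == "Started." and request_id in finished:
            if request_id not in restarted:
                restarted.append(request_id)
    return restarted
```

Passing the worker log text to `find_restarted_jobs` returns the affected request IDs in the order they were first seen restarting. Lines in other formats, such as the `[error] Failed to return job results` line, are ignored.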