r1
RunPod
Created by r1 on 1/3/2024 in #⚡|serverless
How to retire a worker and retry its job?
We're noticing that every so often a worker gets corrupted and stops producing correct output. It's easy enough for us to detect this inside the handler when it happens. Is there a built-in way to tell RunPod that the job failed, that the worker is bad, and that the worker should be refreshed and the job requeued? Or should I do this manually with "refresh_worker" and use the API to requeue?
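For reference, the manual approach mentioned above might look roughly like the sketch below. It assumes the serverless SDK honors a `refresh_worker` key in the dict a handler returns and treats an `error` key as a failed job; both should be checked against the SDK docs. `run_model` and `looks_corrupted` are hypothetical stand-ins, and requeuing is left to the client.

```python
def run_model(job_input):
    """Hypothetical stand-in for the real work done by the worker."""
    return {"ok": True, "result": job_input.get("prompt", "").upper()}


def looks_corrupted(output):
    """Hypothetical check; replace with the real corruption detection."""
    return output is None or output.get("ok") is not True


def handler(job, run=run_model):
    """Run the job and flag the worker for refresh if the output looks bad."""
    output = run(job["input"])
    if looks_corrupted(output):
        # Assumption: the platform reads "refresh_worker" from the returned
        # dict and recycles this worker once the job finishes.
        return {"error": "worker produced corrupted output", "refresh_worker": True}
    return {"output": output}
```

The client would then see the failed job and could resubmit the same input through the API.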
14 replies
RunPod
Created by r1 on 12/21/2023 in #⚡|serverless
Serverless: any way to figure out what GPU type a job ran on?
I'm trying to get data on speeds across GPU types for our jobs, and I'm wondering whether the API exposes this anywhere and, if not, what the best way to work it out would be.
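One possible workaround, if the API turns out not to expose it, is to have the handler report the GPU itself. The sketch below assumes `nvidia-smi` is available inside the worker image; `gpu_name` and `with_gpu_info` are hypothetical helper names, not part of any SDK.

```python
import subprocess
import time


def gpu_name():
    """Return the first GPU's name as reported by nvidia-smi, or "unknown"."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=10,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        # nvidia-smi is missing, failed, or timed out.
        return "unknown"
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else "unknown"


def with_gpu_info(result, started):
    """Attach the GPU name and elapsed seconds to a job result dict."""
    elapsed = round(time.monotonic() - started, 3)
    return {**result, "gpu": gpu_name(), "seconds": elapsed}
```

A handler would record `started = time.monotonic()` on entry and return `with_gpu_info(result, started)`, so each job's output carries the data needed to compare GPU types.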
22 replies
RunPod
Created by r1 on 12/19/2023 in #⚡|serverless
When will the status endpoint for a serverless function return 429s?
(see title)
2 replies