RunPod 2mo ago
guru

Failed to return job results

My serverless endpoint is timing out after the client-configured timeout of 30 seconds, even though the request is processed in under 10 seconds. I am using the Python client (runpod==1.4.2). This happens only on non-active workers. Below is one sample request from the logs. I have submitted more details in support request 3922.
- sync-c4927049-99df-480e-89d5-c95d599653bd-u1
- 2024-05-13T04:43:46.246143796Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Started.", "level": "INFO"}
- 2024-05-13T04:43:54.355899018Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Failed to return job results. | 404, message='Not Found', url=URL('https://api.runpod.ai/v2/[REDACTED]/job-done/w481rezhgny06k/sync-c4927049-99df-480e-89d5-c95d599653bd-u1?gpu=NVIDIA+RTX+6000+Ada+Generation')", "level": "ERROR"}
- 2024-05-13T04:43:54.355976289Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Finished.", "level": "INFO"}
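The claim that the request finishes in under 10 seconds can be checked directly from the log timestamps above. The following is a minimal sketch, assuming each log line has the form `<timestamp> <JSON>` as shown; the helper name `parse_log_line` is made up for illustration and is not part of the runpod client.

```python
import json
from datetime import datetime

# The three log lines from the post, copied as-is.
LOG = """\
2024-05-13T04:43:46.246143796Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Started.", "level": "INFO"}
2024-05-13T04:43:54.355899018Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Failed to return job results. | 404, message='Not Found', url=URL('https://api.runpod.ai/v2/[REDACTED]/job-done/w481rezhgny06k/sync-c4927049-99df-480e-89d5-c95d599653bd-u1?gpu=NVIDIA+RTX+6000+Ada+Generation')", "level": "ERROR"}
2024-05-13T04:43:54.355976289Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Finished.", "level": "INFO"}
"""


def parse_log_line(line):
    """Split '<timestamp> <json>' into (datetime, dict). Hypothetical helper."""
    stamp, payload = line.split(" ", 1)
    # The timestamps carry nanoseconds; datetime stores microseconds,
    # so keep only the first six fractional digits.
    head, frac = stamp.rstrip("Z").split(".")
    ts = datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")
    return ts, json.loads(payload)


records = [parse_log_line(line) for line in LOG.splitlines()]
started = next(ts for ts, rec in records if rec["message"] == "Started.")
finished = next(ts for ts, rec in records if rec["message"] == "Finished.")
errors = [rec for _, rec in records if rec["level"] == "ERROR"]

# Time between "Started." and "Finished." on the worker.
elapsed = (finished - started).total_seconds()
print(f"execution took {elapsed:.3f}s, errors logged: {len(errors)}")
```

This only measures execution time on the worker. It says nothing about how long the job waited for a worker before "Started." was logged, which is the part that matters on non-active workers.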
9 Replies
Solution
guru 2mo ago
This is solved. I incorrectly assumed from the docs that TTL is the maximum delayTime to allow, but it looks like it covers delayTime + executionTime.
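If TTL really does cover delayTime + executionTime, then the value has to be budgeted for both. The sketch below shows one way to do that arithmetic. The numbers are made up, the helper names are hypothetical, and the `policy.ttl` field in milliseconds is my recollection of the RunPod request schema, so check the current docs before relying on it.

```python
def min_ttl_ms(max_delay_s, max_execution_s, margin_s=5.0):
    """Smallest TTL in milliseconds that covers queue delay plus execution.

    Hypothetical helper: assumes TTL counts from job submission until the
    result is returned, as observed in this thread.
    """
    return round((max_delay_s + max_execution_s + margin_s) * 1000)


def build_request(job_input, ttl_ms):
    """Build a request body. The 'policy.ttl' field name is an assumption."""
    return {"input": job_input, "policy": {"ttl": ttl_ms}}


# Example budget: up to 20s of cold-start delay and up to 10s of execution.
ttl = min_ttl_ms(max_delay_s=20, max_execution_s=10)
payload = build_request({"prompt": "hello"}, ttl)
print(payload)
```

The margin is there because cold-start delay on non-active workers tends to vary. A TTL sized only for the average case would still drop the slow ones.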
digigoblin 2mo ago
TTL is how long to keep the job in the queue before it is automatically deleted. There shouldn't really be a reason to change the default unless you have regulatory concerns. So according to the docs, your assumption is correct, but maybe the docs are wrong if you found it to be incorrect in practice.
digigoblin 2mo ago
"TTL (Time-to-Live): Defines the maximum time a job can remain in the queue before it's automatically terminated. This parameter ensures that jobs don't stay in the queue indefinitely."
guru 2mo ago
Yeah, I thought that once the job is being executed, it's no longer in the "queue".
digigoblin 2mo ago
Yeah, maybe @PatrickR needs to fix the docs if they are wrong.
guru 2mo ago
In architectures where a message broker holds jobs and workers pick them up from queues, a "TTL" is sometimes configured to define a "max age" for a job in the queue. I thought this was similar.
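The two readings of TTL in this thread can be put side by side in a small model. This is only an illustration of the difference, not RunPod's implementation. The 8.1s execution time comes from the logs in the original post, while the 25s delay and the 30s TTL are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Job:
    delay_s: float      # time spent waiting for a worker (cold start included)
    execution_s: float  # time the handler spends running


def expired_queue_only(job, ttl_s):
    """Broker-style 'max age': only time waiting in the queue counts."""
    return job.delay_s > ttl_s


def expired_total_lifetime(job, ttl_s):
    """Behaviour reported in this thread: delay plus execution counts."""
    return job.delay_s + job.execution_s > ttl_s


# A cold-start job: hypothetical 25s delay, 8.1s execution, hypothetical 30s TTL.
cold_job = Job(delay_s=25.0, execution_s=8.1)
ttl_s = 30.0

print("queue-only reading expires the job:", expired_queue_only(cold_job, ttl_s))
print("total-lifetime reading expires the job:", expired_total_lifetime(cold_job, ttl_s))
```

Under the queue-only reading this job survives, because it was picked up within the TTL. Under the total-lifetime reading it expires while the worker is still running, which would be consistent with the worker getting a 404 when it tries to return the result.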
PatrickR 2mo ago
Based on your experience, it seems that TTL includes both the delay time and the execution time, rather than just the maximum time a job can spend in the queue before execution. Could you please confirm if this understanding is correct?
guru 2mo ago
Yes, that’s my understanding.