RunPod•12mo ago

Failed to return job results

My serverless endpoint times out after the client-configured timeout of 30 seconds, even though the request is processed in under 10 seconds. I am using the Python client (runpod==1.4.2). This happens only on non-active workers. Below is one sample request from the logs. I have submitted more details in support request 3922.

- sync-c4927049-99df-480e-89d5-c95d599653bd-u1
    - 2024-05-13T04:43:46.246143796Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Started.", "level": "INFO"}
    - 2024-05-13T04:43:54.355899018Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Failed to return job results. | 404, message='Not Found', url=URL('https://api.runpod.ai/v2/[REDACTED]/job-done/w481rezhgny06k/sync-c4927049-99df-480e-89d5-c95d599653bd-u1?gpu=NVIDIA+RTX+6000+Ada+Generation')", "level": "ERROR"}
    - 2024-05-13T04:43:54.355976289Z {"requestId": "sync-c4927049-99df-480e-89d5-c95d599653bd-u1", "message": "Finished.", "level": "INFO"}

9 Replies

Solution

guru•12mo ago

This is solved. I incorrectly assumed from the docs that TTL is the maximum delayTime to allow, but it looks like it covers delayTime + executionTime.

digigoblin•12mo ago

TTL is how long a job is kept in the queue before it is automatically deleted. There shouldn't really be a reason to change the default unless you have regulatory concerns. So according to the docs your assumption is correct, but the docs may be wrong if you found otherwise in practice.

digigoblin•12mo ago

https://docs.runpod.io/serverless/endpoints/send-requests#execution-policies

Send a request | RunPod Documentation

The method in which jobs are submitted and returned.

digigoblin•12mo ago

"TTL (Time-to-Live): Defines the maximum time a job can remain in the queue before it's automatically terminated. This parameter ensures that jobs don't stay in the queue indefinitely."

guruOP•12mo ago

Yeah, I thought that once the job is being executed, it's no longer in the "queue".

digigoblin•12mo ago

Yeah, maybe @PatrickR needs to fix the docs if they are wrong.

guruOP•12mo ago

In architectures where a message broker holds jobs and workers pick them up from queues, a "TTL" is sometimes configured to define the "max age" of a job in the queue. I thought this was similar.

PatrickR•12mo ago

Based on your experience, it seems that TTL includes both the delay time and the execution time, rather than just the maximum time a job can spend in the queue before execution. Could you please confirm if this understanding is correct?

guruOP•12mo ago

Yes, that’s my understanding.
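For anyone landing here later, the conclusion of this thread can be sketched as a small helper that sizes the TTL to cover queue delay plus execution time, rather than queue delay alone. This is only a sketch under assumptions: the `policy`, `executionTimeout`, and `ttl` field names and their millisecond units reflect a reading of the execution-policies docs linked above and should be checked against the current version, and `build_request` and `expires_before_finishing` are hypothetical names that are not part of the runpod client. The numbers are illustrative, not taken from the logs.

```python
def expires_before_finishing(delay_ms: int, execution_ms: int, ttl_ms: int) -> bool:
    """Return True if the job would outlive its TTL.

    Models the behaviour reported in this thread: the TTL clock covers
    delayTime + executionTime, not just the time spent waiting in the queue.
    """
    return delay_ms + execution_ms > ttl_ms


def build_request(job_input: dict, max_delay_ms: int, max_execution_ms: int,
                  margin_ms: int = 5_000) -> dict:
    """Build a request body whose TTL covers queue delay AND execution.

    Assumption: the endpoint accepts a "policy" object with "executionTimeout"
    and "ttl" in milliseconds; verify the field names before relying on this.
    """
    ttl_ms = max_delay_ms + max_execution_ms + margin_ms
    return {
        "input": job_input,
        "policy": {
            "executionTimeout": max_execution_ms,  # cap on execution only
            "ttl": ttl_ms,                         # cap on delay + execution
        },
    }


# Illustrative numbers: a cold worker takes 25 s to pick up the job,
# which then runs for 8 s.
payload = build_request({"prompt": "hello"}, max_delay_ms=25_000, max_execution_ms=10_000)
print(payload["policy"])
print(expires_before_finishing(delay_ms=25_000, execution_ms=8_000, ttl_ms=30_000))
```

With a TTL sized only for the expected queue delay, a job that starts late on a cold worker can expire while it is still running, which would be consistent with the 404 on the `job-done` call in the logs above.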
