RunPod (6mo ago)
wmute

Extremely slow Delay Time

We are using 2 serverless endpoints on RunPod, and the "Delay Time" (which I assume measures end-to-end time) varies drastically between them. Both use the same hardware (the A5000 option), but one has sub-second delay times while the other ranges from ~50 seconds up to 180s. On the slow endpoint, the worst cold start time is reported as 13s and the execution time is ~2s, which don't add up to the delay time: there are ~50 seconds unaccounted for. The other endpoint, on the same hardware, does not show such drastic delay times.
Solution:
Delay time is NOT end-to-end time. It is the cold start time plus the time your request spends in the queue before a worker picks it up. Delay time can increase dramatically if all of your workers are throttled.
5 Replies
wmute (6mo ago)
My question would be: how is the delay time measured? Is our bad timing due to throttling, or do we not have enough workers to handle our traffic?
Solution
ashleyk (6mo ago)
Delay time is NOT end-to-end time. It is the cold start time plus the time your request spends in the queue before a worker picks it up. Delay time can increase dramatically if all of your workers are throttled.
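To make that definition concrete with the numbers from the question, here is a rough sketch of the arithmetic. The function names are hypothetical and this is not a RunPod API; it only assumes delay time = cold start + queue wait, as described above, and (as a further assumption) that the time a client sees is roughly delay time + execution time.

```python
def queue_wait_s(delay_s: float, cold_start_s: float) -> float:
    """Queue wait implied by: delay = cold start + queue wait (hypothetical helper)."""
    # Clamp at zero in case the reported cold start exceeds the reported delay.
    return max(0.0, delay_s - cold_start_s)


def approx_total_s(delay_s: float, execution_s: float) -> float:
    """Assumption: time seen by the client is roughly delay + execution."""
    return delay_s + execution_s


# Figures reported for the slow endpoint: worst cold start 13s, execution ~2s,
# delay time between ~50s and 180s.
for delay in (50.0, 180.0):
    wait = queue_wait_s(delay, cold_start_s=13.0)
    total = approx_total_s(delay, execution_s=2.0)
    print(f"delay={delay:.0f}s  queue wait >= {wait:.0f}s  approx total={total:.0f}s")
```

Since 13s is the worst reported cold start, the computed queue wait is a lower bound. Most of the delay on the slow endpoint would then be time spent waiting for a free worker, which fits with throttled workers or too few workers for the traffic.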
ashleyk (6mo ago)
You can improve slow delay times by not using Network Storage on your endpoint, or by selecting a GPU tier that doesn't have low availability.
ssssteven (6mo ago)
I found the term "delay" very confusing. Shouldn't it just mean the time spent before the handler is called?
ashleyk (6mo ago)
Not sure what's confusing about it; that's exactly what delay means.