Extremely slow Delay Time
We are using 2 serverless endpoints on runpod and the "Delay Time" (which I assume measures end to end time) varies drastically between the endpoints. They both use the same hardware (the A5000 option) and one of them has sub-second delay times and the other ~50 seconds up to 180s.
On the slow endpoint, the worst cold start time is reported as 13s and the execution time is ~2s, which don't add up to the delay time. There are ~50 seconds unaccounted for.
The other endpoint, on the same hardware, does not show such drastic delay times.
My question would be: how is the delay time measured?
is our bad timing due to throttling, or do we not have enough workers to handle our traffic?
Solution
Delay time is NOT end to end time. It is the cold start time + the time that your request is in the queue for before a worker picks it up. Delay time can be dramatically impacted if all of your workers are throttled.
You can improve slow delay time by not using Network Storage on your endpoint, or by selecting a GPU tier that doesn't have low availability.
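Given the breakdown above (delay time = queue time + cold start), the unaccounted gap in the question can be attributed to queue wait. A minimal sketch of that decomposition, assuming hypothetical timestamp names (RunPod does not expose metrics under these names):

```python
# Sketch: decomposing RunPod's reported "Delay Time" per the explanation above.
# The timestamp parameters are hypothetical, not actual RunPod API fields.

def decompose_delay(t_submitted, t_worker_pickup, t_handler_start):
    """Split delay time into queue wait and cold start (seconds)."""
    queue_time = t_worker_pickup - t_submitted       # request waiting for a worker
    cold_start = t_handler_start - t_worker_pickup   # worker boot + model load
    delay_time = queue_time + cold_start             # what the console reports
    return queue_time, cold_start, delay_time

# With the numbers from the question: a 13 s cold start but a 50 s
# reported delay implies ~37 s spent queued (e.g. all workers throttled).
queue, cold, delay = decompose_delay(0.0, 37.0, 50.0)
```

This makes the question's arithmetic explicit: the 13 s cold start and 2 s execution are not the whole story; the remaining time is queueing, which grows when workers are throttled or unavailable.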
I found that term delay very confusing. Shouldn't it just mean the time spent before the handler is called?
Not sure what's confusing about it, that's exactly what delay means.