Is it normal for the first request to a

Is it normal for the first request to a Worker to have a higher latency (~200ms) while subsequent requests are 2-3x as fast? I just implemented a KV store layer on top of a public API that is hit via XHR requests from various websites, and while testing with production network traffic I am noticing the response time to be higher than anticipated.
15 Replies
Derek Cavaliero (OP)•3y ago
I also notice this same behavior when hitting the worker route via Postman, compared to hitting the API directly (bypassing the worker).
zegevlier•3y ago
~200ms is likely due to KV. KV stores the data in two central locations, one in the EU and one in the US. Once a key is requested, it is cached for at least 60 seconds, sometimes longer if you allow it, in whatever colo it was requested in.
Derek Cavaliero (OP)•3y ago
ah i thought the KV was distributed across more than 2 locations via the global CDN
zegevlier•3y ago
If you're on the free plan, Workers also need to start up, but that's a couple of ms.
That's only if it's been requested recently, not in general 🙂
If your data is fairly static, you could increase the cache TTL to make it stay at individual colos longer.
Derek Cavaliero (OP)•3y ago
Yeah, we're on a paid plan so startup time shouldn't be an issue.
I actually don't have my keys set to expire automatically at all due to the way the system works. I have some bulk purge logic written into the application to dump any stale KV entries if particular database models are changed.
zegevlier•3y ago
Cache TTL is different from expiry, https://developers.cloudflare.com/workers/runtime-apis/kv/#cache-ttl
It defines the length of time in seconds that a KV result is cached in the edge location that it is accessed from. This can be useful for reducing cold read latency on keys that are read relatively infrequently,
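As a minimal sketch of what that looks like in a Worker, the read below passes `cacheTtl` as an option to the KV `get` call. The helper name `readWithEdgeCache`, the `KVLike` interface, and the 3600 second default are assumptions made for illustration; only `get(key, { cacheTtl })` comes from the KV runtime API.

```typescript
// Minimal structural type for the one KV method used here, so the sketch
// type-checks without extra type packages. A real KV binding satisfies it.
interface KVLike {
  get(key: string, options?: { cacheTtl?: number }): Promise<string | null>;
}

// Hypothetical helper: read a key and ask the edge location that serves the
// request to keep the result cached for `cacheTtlSeconds`.
// The 3600 default is only an example value for fairly static data.
async function readWithEdgeCache(
  kv: KVLike,
  key: string,
  cacheTtlSeconds: number = 3600,
): Promise<string | null> {
  return kv.get(key, { cacheTtl: cacheTtlSeconds });
}
```

Inside a Worker this would be called with the namespace binding, for example `await readWithEdgeCache(env.MY_KV, "some-key")`, where `MY_KV` is a placeholder binding name. Note that this setting is independent of any expiration set on the key itself.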
Derek Cavaliero (OP)•3y ago
Ah, very good to know, thank you. One final related question: if I set my Cache TTL to, say, 3600 and then purge a key from the namespace before those 3600 seconds elapse, does the API purge the cache too, or does that only happen naturally? I am using https://api.cloudflare.com/#workers-kv-namespace-delete-multiple-key-value-pairs to purge the KV as needed at the moment.
zegevlier•3y ago
The cached copy will stay until it expires naturally. Similarly, it will not pick up updates until the cache expires naturally.
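To make that concrete, here is a toy in-memory model, not Cloudflare's implementation: one central store and one edge cache whose entries live for the TTL given at read time. The class name `ToyEdgeCache`, the key `user:1`, and the explicit `now` argument are all made up for the illustration.

```typescript
// Toy model only: a central store plus a single edge cache with a TTL.
class ToyEdgeCache {
  private cache = new Map<string, { value: string | null; expiresAt: number }>();
  private central: Map<string, string>;

  constructor(central: Map<string, string>) {
    this.central = central;
  }

  // `now` is passed in (in seconds) so the behaviour is deterministic.
  get(key: string, cacheTtl: number, now: number): string | null {
    const hit = this.cache.get(key);
    if (hit !== undefined && hit.expiresAt > now) {
      return hit.value; // served from the edge, central store not consulted
    }
    const value = this.central.get(key) ?? null; // slow central read
    this.cache.set(key, { value, expiresAt: now + cacheTtl });
    return value;
  }
}

const central = new Map<string, string>([["user:1", "cached-response"]]);
const edge = new ToyEdgeCache(central);

const first = edge.get("user:1", 3600, 0);        // fills the edge cache
central.delete("user:1");                         // purge from the central store
const duringTtl = edge.get("user:1", 3600, 1800); // still the old value
const afterTtl = edge.get("user:1", 3600, 3601);  // entry expired, purge is visible

console.log(first, duringTtl, afterTtl); // cached-response cached-response null
```

The model only covers TTL expiry at a single location. It says nothing about background revalidation or about other colos, which the real service handles in ways this sketch does not attempt to reproduce.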
Derek Cavaliero (OP)•3y ago
Is there a method/endpoint to dump the cache via API?
zegevlier•3y ago
There is not, but you can request the data again (in the same colo) with a different cache TTL and it will use the updated one.
If it helps, changes are immediately visible at the location they're made. https://developers.cloudflare.com/workers/learning/how-kv-works/ also mentions "While reads are periodically revalidated in the background", but it gives no indication or guarantee as to how often that is.
e111g•3y ago
In my testing, adding Workers into the network path costs ~50ms in latency. In other words, very roughly, if hitting my API directly typically takes ~100ms, routing the same requests through a Worker (as a reverse proxy) will increase that to ~150ms (end-to-end).
zegevlier•3y ago
Does the path without worker include being proxied by cloudflare?
e111g•3y ago
No, the two paths are:
1. Internet -> Worker -> Cloudflare Tunnel -> [my server]
2. Internet -> AWS Global Accelerator -> AWS ALB -> [my server]
So not exactly apples-to-apples, but useful for my case of "what if we add workers". The additional network hop/cost is well worth it for the benefits, but it ain't free.
But you're right that a more isolated measure would be to compare "Internet -> Cloudflare -> (Worker | Direct to Origin) -> [my backend]".
zegevlier•3y ago
I would imagine the difference would be minuscule if the record is already proxied by Cloudflare and all you do is add a Worker. Still good to know though. 🙂
e111g•3y ago
I would expect that too, but my gut tells me it'll still be closer to 50ms than to 0ms. This gives me an idea to try to set up a fair benchmark, but it's tricky to do correctly due to how Workers limit to 6 concurrent connections, so a naive benchmark might overwhelm that limit if it's re-using the same TCP connection.
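One way to sketch such a comparison is to send requests strictly one at a time, so concurrency never enters the picture, and compare medians rather than single samples. The function names `timeSequential` and `median` and the URLs in the usage note are placeholders, and this is an assumption about how a benchmark could be structured, not a validated methodology.

```typescript
// Median of a list of latency samples (milliseconds).
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run `runs` requests sequentially and record how long each one took.
// `doRequest` performs one full request; `now` is injectable for testing.
async function timeSequential(
  doRequest: () => Promise<unknown>,
  runs: number,
  now: () => number = () => performance.now(),
): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    await doRequest(); // wait for completion before starting the next one
    samples.push(now() - start);
  }
  return samples;
}
```

Usage could look like `median(await timeSequential(() => fetch("https://worker.example.com/ping").then((r) => r.arrayBuffer()), 20))`, repeated for the direct-to-origin URL. Given the first-request behaviour discussed at the top of the thread, it probably makes sense to look at the first sample separately from the rest.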