How does tiered caching work in the CF worker other than enabling it?

This is what I understand in terms of control: Workers have an in-memory cache of x MB, so you can define a memory cache via new Map, for example. Workers can also use the Cache API via caches.default.put. Both of these exist locally at the colocation, so the in-memory cache is going to be fastest, and the colo cache second fastest in terms of latency for data retrieval.
If your data does not exist in either of those, tiered caching comes into play, and from my understanding it's tied to the fetch API. What does this mean exactly? Would love a code example of the breakdown for:
1. check in-memory cache (db response)
2. check local cache (db response)
3. check tiered cache (db response)
4. if none of those have the data needed, fetch the data from the database (origin)
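Roughly, that breakdown could look like the hedged sketch below, as a module Worker (assuming @cloudflare/workers-types for the TypeScript types; the origin URL, the Map-based cache, and the TTL values are illustrative, not a definitive implementation):

```ts
// Per-isolate memory cache; counts toward the isolate's 128 MB limit.
const memoryCache = new Map<string, string>();

export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const key = new URL(request.url).toString();

    // 1. In-memory cache: fastest, but scoped to this one isolate.
    const inMemory = memoryCache.get(key);
    if (inMemory !== undefined) return new Response(inMemory);

    // 2. Local (per-colo) cache via the Cache API.
    const cache = caches.default;
    const cached = await cache.match(request);
    if (cached) return cached;

    // 3 + 4. A fetch() rides the CDN path: colo cache, then the upper
    // tier (if Tiered Cache is enabled on the zone), and only on a
    // full miss does it reach the origin (here, a stand-in DB endpoint).
    const originResponse = await fetch("https://db.example.com/response", {
      cf: { cacheEverything: true, cacheTtl: 60 },
    });

    // Populate the faster layers for subsequent requests.
    const body = await originResponse.clone().text();
    memoryCache.set(key, body);
    ctx.waitUntil(cache.put(request, originResponse.clone()));

    return originResponse;
  },
};
```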
Chaika
Chaika•12mo ago
👋
Both of these exist locally at the colocation, so the in-memory cache is going to be fastest, and the colo cache second fastest in terms of latency for data retrieval.
Workers have 128 MB of memory they can use. This isn't per colocation, or even per machine; it's per script isolate. I believe under high load there can even be multiple isolates per Cloudflare machine, so the in-memory cache is much less likely to hit. When you connect to a Worker/CF website, you establish a connection that gets reused and hits the same machine until the connection ends, so you might see it hitting for a bit if you were testing from just your PC. The Cache API is per colocation. Tiered Caching is only available via the fetch API.
You might be overthinking this though. If you just want a simple cache, use the Cache API. If you want anything longer lived, like caching something globally that's expensive, you could consider KV. If it's OK for it to be slow, you want strong consistency, or something cheaper, use R2. https://developers.cloudflare.com/workers/platform/storage-options/
It may be worth pointing out as well that these are the Worker-side abstractions. The actual CDN just checks the colo cache and then the tiered cache if it doesn't hit, which you also get by just doing a fetch(), and fetch would also hit the origin if the cache misses entirely.
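To make that last point concrete, here's a hedged sketch: a plain fetch() from a Worker rides the CDN cache path just described. The cf options are real Workers fetch extensions, though the URL and values are illustrative, and whether an upper tier is consulted depends on the zone's Tiered Cache setting, not on code:

```ts
export default {
  async fetch(): Promise<Response> {
    const res = await fetch("https://origin.example.com/data", {
      cf: {
        cacheEverything: true, // cache even without origin Cache-Control headers
        cacheTtl: 300,         // edge cache TTL in seconds
      },
    });
    // The cf-cache-status response header (HIT, MISS, EXPIRED, ...)
    // shows whether the edge cache answered, though it does not say
    // which tier served the hit.
    console.log(res.headers.get("cf-cache-status"));
    return res;
  },
};
```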
Lu
LuOP•12mo ago
Gotcha. I'm trying to figure out global latency reduction for one of my routes (in regards to script isolates); I am assuming each route (Hono usage as an example) is the 'script'?
Cached values consist of some data from the database. Currently I have D1 set up and PlanetScale (PS is currently outperforming D1 for P75-P99, P99 being a 300ms difference, the others around 100-200ms). So when a request comes in for data, I check the in-memory cache first; if the worker doesn't have it, I check the local cache (Cache API); and finally, if neither has it, I fetch from the DB, then store that result in the worker's in-memory cache as well as the local cache (Cache API).
I am not sure if I'm overthinking it or not, but the only sub-request I do within that route is to fetch from either D1 or PS (I have two different routes to test performance). I was thinking tiered cache would lower the latency by checking upper and lower tiers for data if the colocated cache does not exist (assuming these upper and lower tiers are datacenters nearby that might have it). It seems like tiered cache is done automatically via the fetch API; I'm just not sure how to check if it's using tiered cache or not.
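The store-back flow described here could look roughly like this hedged sketch; queryDb and the synthetic cache-key URL are hypothetical stand-ins for the D1/PlanetScale call:

```ts
// Placeholder for the real D1/PlanetScale query.
async function queryDb(): Promise<unknown[]> {
  return [];
}

export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    // Cache API keys must be full URLs; this one is synthetic.
    const cacheKey = "https://cache.internal/users";
    const cache = caches.default;

    let response = await cache.match(cacheKey);
    if (!response) {
      const rows = await queryDb();
      response = new Response(JSON.stringify(rows), {
        headers: {
          "Content-Type": "application/json",
          "Cache-Control": "max-age=60", // how long this colo keeps it
        },
      });
      // Write back to the per-colo cache without blocking the response.
      ctx.waitUntil(cache.put(cacheKey, response.clone()));
    }
    return response;
  },
};
```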
Chaika
Chaika•12mo ago
I am assuming each route (Hono usage as an example) is the 'script'?
The script is your entire worker, not each route
Currently I have D1 set up and PlanetScale (PS is currently outperforming D1 for P75-P99, P99 being a 300ms difference, the others around 100-200ms)
D1 will eventually have read replicas which would be really nice and might even render this pointless, not yet though
So when a request comes in for data, I check the in-memory cache first
I can't imagine the hit ratio for that is very high other than for clients who are already connected. Also, you may eventually run into issues with putting too much into memory, causing the request to fail or the isolate to reset.
I was thinking tiered cache would lower the latency by checking upper and lower tiers for data if the colocated cache does not exist (assuming these upper and lower tiers are datacenters nearby that might have it)
The only tiered caching topology you get for free is Smart Tiered Caching, which selects a location nearest the origin to check. You're talking about the Generic and/or Regional topology, which is Enterprise-only.
If clients run a lot of the same queries multiple times over the same connection, maybe some sort of local in-memory cache would be helpful? Otherwise I think just using the Cache API is your best bet there. If you want this query to be cached for a while, like ~60s, and to insulate your DB from requests, you could shove it into KV with an expirationTtl. KV is a bit pricey though, and slow on cold hits. KV 2.0 is supposed to be just like the tiered caching upper/lower tiers you were thinking of, but it was rolled back for now sadly, and it'll be a bit till we see it again I imagine.
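A hedged sketch of that KV approach, assuming a KV namespace bound as CACHE_KV in wrangler.toml; queryDb and the key name are hypothetical:

```ts
interface Env {
  CACHE_KV: KVNamespace;
}

// Placeholder for the expensive DB query.
async function queryDb(): Promise<unknown[]> {
  return [];
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // KV is globally readable, unlike the per-colo Cache API.
    const hit = await env.CACHE_KV.get("expensive-query", "json");
    if (hit) return Response.json(hit);

    const rows = await queryDb();
    // expirationTtl (minimum 60s) evicts the key automatically,
    // insulating the DB from repeated reads.
    await env.CACHE_KV.put("expensive-query", JSON.stringify(rows), {
      expirationTtl: 60,
    });
    return Response.json(rows);
  },
};
```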
Chaika
Chaika•12mo ago
I think I explained the "script isolate" kind of poorly, sorry; there's a doc here that goes over how Workers works: https://developers.cloudflare.com/workers/learning/how-workers-works/
When you do npx wrangler deploy, Wrangler uses esbuild to bundle everything into a single JS file/script which gets uploaded; that is your "Worker script". On the edge, when a request is being handled by a specific Cloudflare machine/metal, it runs V8, the same JavaScript engine as Chrome. It spins up an isolate, a "context" which runs your user code. Chrome uses isolates per tab, for example. That isolate is just for your script; isolates are spun up when needed and evicted based on usage/resource limits, etc.
Each isolate has its own memory. Your in-memory cache is tied to that isolate, on that machine, in that colocation. A user with an active connection to your site (which the browser manages) will have all of that connection's requests handled by the same metal, and likely the same isolate, until the connection dies. Any other users or new connections will be randomly routed to a new machine, which would spin up its own isolate (if there isn't one there already) with its own memory.
How Workers works · Cloudflare Workers docs
The difference between the Workers runtime versus traditional browsers and Node.js.
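To make the per-isolate memory behavior concrete, a tiny hedged sketch; the counter is illustrative:

```ts
// Module scope runs once per isolate, so `counter` persists across
// requests served by that isolate, is not shared across machines,
// and is reset to 0 whenever the isolate is evicted.
let counter = 0;

export default {
  async fetch(): Promise<Response> {
    counter++;
    // Repeated requests over one kept-alive connection will often hit
    // the same isolate and show an increasing count; a fresh connection
    // may land on a different isolate that starts from 1 again.
    return new Response(`requests seen by this isolate: ${counter}`);
  },
};
```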
Lu
LuOP•12mo ago
Thanks for the doc (re-read it again) and the explanation. So… isolates are essentially instances of the bundled Worker script. CF spins up as many isolates of that script as needed, and each one is tied to its own memory limits? Each isolate will store different in-memory key/value pairs (if you implement a global worker cache through new Map, as an example). Users may or may not hit the same isolate. And if an isolate is evicted (I'm assuming when that worker has not received requests in a while), everything dies with it, including that memory cache?
Chaika
Chaika•12mo ago
That sounds about right. There are a few reasons isolates can be evicted, which are listed in that doc. The point is just that in-memory objects aren't per colocation and are short-lived, plus they're more difficult to manage: if you're not careful, you could overuse the memory, which would error and reset the isolate.
Lu
LuOP•12mo ago
Gotcha, so I think I was overthinking it a bit. I think in my case the in-memory and local caches are the only things I need; the rest is caching actual DB queries (through PlanetScale or hooking up Hyperdrive), and then obviously the most important is DB read replicas.