What storage model should I use to validate that an API key is only used X times each month?

I want to vend API keys to users who can call the API e.g. 100 times each month. I'm considering using KV namespaces with $API_KEY=$BUDGET and decrementing the budget by 1 each call, but I'm afraid of the cost of reading and writing to KV for every API call.

To improve performance and cost, I'm considering writing API calls to an Analytics Engine dataset, then using cron (e.g. every 5 minutes) to read from the dataset and update the KV namespace accordingly. Additionally, to improve performance and cost, I'm considering duplicating the KV data to the Cache API and using that as long as it's less than e.g. 5 minutes old. This assumes that up to 5 minutes of inaccuracy is cheaper than the KV reads and writes would have been.

Am I thinking about this right? Or is there a simpler solution?

User story: Given the API key abcdefg, when you call /my-api/?key=abcdefg the first 100 times, you get an OK response code. When you call the API the 101st time, you get an error response.

I understand this question is very subjective and could be solved in a lot of different ways. I'd really just like help from someone smarter than me or who has solved this problem before.
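For reference, the naive version I'm describing looks roughly like this (a sketch only; BUDGETS is a hypothetical KV binding whose values are seeded with the monthly limit elsewhere):

```ts
// Hypothetical Worker showing the per-call KV read + decrement I'm considering.
// BUDGETS is an assumed KV namespace binding; keys are API keys, values are calls remaining.
export default {
  async fetch(request: Request, env: { BUDGETS: KVNamespace }): Promise<Response> {
    const key = new URL(request.url).searchParams.get("key");
    if (!key) return new Response("Missing key", { status: 400 });

    const remaining = Number(await env.BUDGETS.get(key)); // KV read on every call
    if (!remaining || remaining <= 0) {
      return new Response("Monthly budget exhausted", { status: 429 });
    }

    await env.BUDGETS.put(key, String(remaining - 1)); // KV write on every call
    return new Response("OK");
  },
};
```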
5 Replies
Chaika•3w ago
>I'm considering using KV namespaces with $API_KEY=$BUDGET and decrementing the budget by 1 each call, but I'm afraid of the cost of reading and writing to KV for every API call.
Not a good idea. KV is eventually consistent and cached with a minimum TTL of 60s. So, for example, if the API key was used in two locations, both locations could decrement it 10 times within the same minute and you'd end up with 90 remaining, not 80.
>I'm considering writing API calls to an Analytics Engine dataset, then using cron (e.g. every 5 minutes) to read from the dataset and update the KV namespace accordingly.
That would mean a slow shutdown, but if that's fine with you, it's a better idea.
>Additionally, to improve performance and cost, I'm considering duplicating the KV data to the Cache API and using that as long as it's less than e.g. 5 minutes old. This assumes that up to 5 minutes of inaccuracy is cheaper than the KV reads and writes would have been.
https://flareutils.pages.dev/betterkv/old already exists! You can indeed save a fair bit
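The rough shape of that trick, if you wanted to roll it yourself instead (a sketch, not BetterKV's actual implementation; the synthetic cache-key URL and the 5-minute TTL are made up):

```ts
// Sketch: read a KV value through the Cache API so repeat lookups in the same colo
// within ~5 minutes are served from cache instead of hitting KV again.
async function cachedBudget(
  apiKey: string,
  env: { BUDGETS: KVNamespace },
  ctx: ExecutionContext
): Promise<string | null> {
  const cache = caches.default;
  const cacheKey = new Request(`https://budget-cache.invalid/${apiKey}`); // synthetic cache key

  const hit = await cache.match(cacheKey);
  if (hit) return hit.text();

  const value = await env.BUDGETS.get(apiKey); // fall back to a real KV read
  if (value !== null) {
    ctx.waitUntil(
      cache.put(
        cacheKey,
        new Response(value, { headers: { "Cache-Control": "max-age=300" } })
      )
    );
  }
  return value;
}
```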
>User story: Given the API key abcdefg, when you call /my-api/?key=abcdefg the first 100 times, you get an OK response code. When you call the API the 101st time, you get an error response.
None of what you've described so far would cut off exactly on the 101st call; all of them would have some delay.
>Am I thinking about this right? Or is there a simpler solution?
A Durable Object per user storing the count. Durable Objects live in one location and have persistent durable storage; they're 5x cheaper than KV for writes and ~2.5x cheaper for reads. The only real downside is that they exist in one location (although that's what makes the consistency possible), so if an API key is used globally it could be a bit slow, and Durable Objects are just single-threaded isolates under the hood, so there's a peak requests per second they can handle. It really depends on how expensive the work per request is, but the soft limit is 1,000 requests per second.

Cloudflare Workers do have a rate limiting binding: https://developers.cloudflare.com/workers/runtime-apis/bindings/rate-limit/, but it's only stored per colo/Cloudflare location and currently has restricted time periods.

If you wanted something faster globally/eventually consistent, you could use a Durable Object to keep track and (via ctx.waitUntil) write to KV when a user hits their limit, then eventually clear it. KV would be in the hot path/blocking requests, but the DO wouldn't be.

It really just depends on your requirements. Does it have to be an instant shutdown or could it be eventual? Is latency super important or not so much? Is each user global or from a specific area? If you have super strict requirements (e.g. latency-critical, instant shutdown, global), then at some point it makes more sense to just do it on your origin, since you're already there with a database connection on each request anyway, and then if you really wanted to, you could propagate the shutdown to KV (or have other, looser global API rate limits like Discord does for safety) to try to lower origin load.
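A minimal sketch of the per-user DO idea, in case it helps (the class/binding names and the hard-coded 100 limit are placeholders, and the monthly reset via an alarm is left out):

```ts
// Sketch: one Durable Object per API key holds the authoritative call count.
export class BudgetCounter {
  constructor(private state: DurableObjectState) {}

  async fetch(_request: Request): Promise<Response> {
    const limit = 100; // monthly budget (placeholder; a real version would reset via alarms)
    const used = (await this.state.storage.get<number>("used")) ?? 0;

    if (used >= limit) return new Response("Budget exhausted", { status: 429 });

    await this.state.storage.put("used", used + 1); // durable, strongly consistent
    return new Response(JSON.stringify({ remaining: limit - used - 1 }));
  }
}

// Worker side: route each request to the DO named after the API key.
export default {
  async fetch(request: Request, env: { COUNTER: DurableObjectNamespace }): Promise<Response> {
    const key = new URL(request.url).searchParams.get("key") ?? "";
    const stub = env.COUNTER.get(env.COUNTER.idFromName(key));
    const check = await stub.fetch(request); // one DO round trip per call
    if (check.status !== 200) return check;  // over budget
    return new Response("OK");               // do the real API work here
  },
};
```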
quisi.do•3w ago
>Does it have to be an instant shutdown or could it be eventual?
Right now, I'm thinking each API call would be so cheap (fractions of a penny) that letting an API key run over the limit is cheaper than enforcing accuracy. i.e. for a 100-call limit, I think it may be cheaper to just let them call it 110 times than to enforce 100 strictly.

>Is latency super important or not so?
I think latency would be important for reading the budget to determine whether the API call is allowed or not. Since that happens before every call, I think it would really add up if I skimp here. It's not a "this API needs to respond within 100ms" business requirement, so there's flexibility if the cost justifies it. I just fear for the scalability of making the API hang on the budget lookup if other latency problems are introduced later.

>Is each user global or from a specific area?
Global.
Chaika•3w ago
Hmm yeah, your idea of writing to Analytics Engine and then a cron pulling it out to KV for blocks/etc. would fit that. The only meh thing about KV is that it has two central stores (EU, US) and is then cached, which means higher latency in APAC/South America/Africa; it benefits most where you can cache aggressively.

If it helps to break it down:
- R2/D1/DOs all run on Durable Objects, which get placed in a specific region (by default the one closest to the user, or the locationHint you provide): https://where.durableobjects.live/
- KV has two central locations (US, EU) that everything pulls from: https://developers.cloudflare.com/kv/reference/how-kv-works/
- Analytics Engine lives in the US West, not that its latency matters much: the data points you add are pushed lazily after the request has finished.
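For concreteness, a rough sketch of that pipeline (the binding names, secrets, blocked: key format, and especially the SQL are all assumptions; the query is only illustrative of the Analytics Engine SQL API):

```ts
// Sketch: log each call to Analytics Engine (non-blocking), then a cron handler
// queries the AE SQL API and writes a "blocked" flag into KV for keys over budget.
interface Env {
  USAGE: AnalyticsEngineDataset; // assumed AE binding
  BUDGETS: KVNamespace;          // assumed KV binding
  CF_ACCOUNT_ID: string;
  CF_API_TOKEN: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).searchParams.get("key") ?? "";
    if (await env.BUDGETS.get(`blocked:${key}`)) {
      return new Response("Monthly budget exhausted", { status: 429 });
    }
    // Data points are flushed lazily after the response, so this doesn't add latency.
    env.USAGE.writeDataPoint({ indexes: [key], doubles: [1] });
    return new Response("OK");
  },

  async scheduled(_controller: ScheduledController, env: Env, ctx: ExecutionContext): Promise<void> {
    // Illustrative query; AE samples data, so SUM(_sample_interval) approximates the raw count,
    // and a real version would bound the window to the current billing month.
    const sql = `SELECT index1 AS apiKey, SUM(_sample_interval) AS calls
                 FROM USAGE
                 WHERE timestamp > NOW() - INTERVAL '30' DAY
                 GROUP BY apiKey
                 FORMAT JSON`;
    const resp = await fetch(
      `https://api.cloudflare.com/client/v4/accounts/${env.CF_ACCOUNT_ID}/analytics_engine/sql`,
      { method: "POST", headers: { Authorization: `Bearer ${env.CF_API_TOKEN}` }, body: sql }
    );
    const { data } = (await resp.json()) as { data: { apiKey: string; calls: number }[] };
    for (const row of data) {
      if (row.calls >= 100) {
        ctx.waitUntil(env.BUDGETS.put(`blocked:${row.apiKey}`, "1"));
      }
    }
  },
};
```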
quisi.do•3w ago
Do you have a summary of what the best choice is for reading an API key's budget on every request? Between cheapest and fastest? Is it simply going to be "if you want lower latency, you need to pay more per request"? One downside to Durable Objects is that I'm not on a paid plan yet. 😢
Chaika•3w ago
There's no perfect solution, just ones with different trade-offs. FWIW, unkey (a platform for verifying API keys) has budgets like what you're looking for, with options for fast/not consistent and slow/globally consistent, and it's built entirely on Cloudflare and Durable Objects; it might be interesting to look over: https://github.com/unkeyed/unkey/blob/main/apps/api/src/pkg/ratelimit/durable_object.ts

The simplest and fastest solution (and a pretty cheap one too) is probably just Durable Objects. The issue is that you said each user is "global" and not necessarily from a specific area, and Durable Objects only exist in one region, so requests from other regions could be rather slow.

AE (Analytics Engine) + KV isn't a bad idea either; KV can just be a bit slow overall, especially with a low cache TTL like you want and in some locations like the Asia Pacific region. That part depends more on your audience, though.

Unkey's fast mode seems to use a Durable Object per location syncing back to an external DB and pulling the result down without blocking the request. It would be a lot of setup, but it could help with some requirements/ensure requests are always both fast and in sync globally.
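The non-blocking flavour of that pattern looks roughly like this (a sketch of the general idea only, not unkey's code; the binding names and blocked: flag are made up, and the DO that tallies counts and flips the flag is assumed to exist separately):

```ts
// Sketch: decide from an eventually consistent KV flag, respond immediately,
// and record the hit against the authoritative Durable Object off the hot path.
export default {
  async fetch(
    request: Request,
    env: { BUDGETS: KVNamespace; COUNTER: DurableObjectNamespace },
    ctx: ExecutionContext
  ): Promise<Response> {
    const key = new URL(request.url).searchParams.get("key") ?? "";

    // Fast path: a flag the DO (or a cron) propagated to KV; may lag by a few minutes.
    if (await env.BUDGETS.get(`blocked:${key}`)) {
      return new Response("Budget exhausted", { status: 429 });
    }

    // The DO tallies usage and eventually writes the blocked flag; the request doesn't wait for it.
    const stub = env.COUNTER.get(env.COUNTER.idFromName(key));
    ctx.waitUntil(stub.fetch("https://counter.invalid/increment"));

    return new Response("OK");
  },
};
```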