How are CPU limits enforced in the free tier?

Dear support team,

I'd like to know how CPU time limits are enforced, particularly in the free tier. Empirically, an endpoint I am benchmarking responds successfully despite a p50 CPU time of 96.4ms. In a previous discussion [1], it was stated that there is some leeway with the limit, but when the limits are actually enforced remains unpredictable based on the information I found on both the website and the community Discord. I also found [2] from 2018, in which @kenton describes how CPU limiting works in practice, but that answer doesn't match the data I'm getting empirically.

Can I better understand when to expect a worker exceeding the 10ms CPU limit at p50 to be rate-limited? Additionally, why are the median and p50 values reported differently (100ms and 96.4ms, respectively)?

Thank you,

[1] https://discord.com/channels/595317990191398933/1128739572898091008/1128739986318041180
[2] https://community.cloudflare.com/t/how-is-cpu-time-per-request-measured-in-cloudflare-workers/49964/3
Chaika · 11mo ago
That is weird that the median and the p50 are different. I think you just don't have enough requests to trigger enforcement yet; exceed the limit across more requests and it should snap down on you. There are certain exceptions too: for example, on startup you get an extra ~400ms of CPU time for startup work on the first execution of that worker.
miguelff (OP) · 11mo ago
Thanks @Chaika, I am aware that eventually the limit is triggered and I'm blocked, but that's not my question. In link [2] above, @kenton describes how the limits are applied, but the process does not seem to behave as explained there. I'd like to understand, as exactly as possible, how the limits are enforced, so that I can answer questions like the following:
Can I safely have a worker on the free tier that on p50 and p75 consistently runs under 10ms, but on p99 is above the limit?
This is important for me to know, because if that's the case, I can have some confidence in the availability of said worker.
kenton · 11mo ago
Each isolate has a "rollover bank" of CPU time. If you use less than your limit on one request, the leftover time is added to the bank. If you use more than your limit on another request, time is taken out of the bank. Only when the bank reaches zero do you actually get an error.

Additionally, the bank does not start out empty. When an isolate first starts up, its bank is initialized with some extra time. This is meant to make up for the fact that the first few requests are likely to run slower as the isolate "warms up" (e.g. JIT-compiling code).

Actually, we probably give isolates too much initial time. There was a bug some years ago where we gave too much initial time by accident, and then it was hard to fix the bug without breaking people who came to rely on it. We kind of just said, ok whatever, and left it.

Because of the initial time, if your worker commonly executes only a small number of requests per isolate, it is indeed possible to exceed the limit regularly. But if you get enough traffic that individual isolates are commonly handling many requests, you'll start to see errors.
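The rollover-bank behavior described above can be sketched in a few lines. This is a hypothetical model for intuition only, not the actual runtime implementation: the 10ms per-request limit and the ~400ms initial grant are values mentioned in this thread, and the choice to clamp the bank at zero after an error is an assumption.

```typescript
// Sketch of a "rollover bank" CPU accounting model (illustrative, not
// Cloudflare's real internals). Each isolate gets a bank of CPU time;
// under-limit requests deposit leftover time, over-limit requests withdraw.
class CpuBank {
  private bankMs: number;

  constructor(
    private readonly limitMs: number = 10, // per-request CPU allowance
    initialGrantMs: number = 400,          // warm-up credit for a fresh isolate
  ) {
    this.bankMs = initialGrantMs;
  }

  // Returns true if the request succeeds, false if this request would
  // hit a CPU-limit error (i.e. the bank is exhausted).
  recordRequest(cpuMs: number): boolean {
    this.bankMs += this.limitMs - cpuMs; // deposit leftover or withdraw overage
    if (this.bankMs < 0) {
      this.bankMs = 0; // assumption: bank clamps at zero; the request errors
      return false;
    }
    return true;
  }
}

// A fresh isolate absorbs slow warm-up requests out of its initial grant...
const isolate = new CpuBank();
console.log(isolate.recordRequest(96.4)); // over the 10ms limit, but allowed

// ...but sustained overruns drain the bank, and errors eventually appear.
let errored = 0;
for (let i = 0; i < 100; i++) {
  if (!isolate.recordRequest(20)) errored++;
}
console.log(errored); // most of the 100 requests error once the bank empties
```

This model matches the observation above: an isolate handling only a few requests may never drain its initial grant, while a busy isolate running over the limit on every request will start erroring quickly.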
miguelff (OP) · 10mo ago
Thanks @kenton, this is much clearer and makes sense with the behavior I'm seeing through experimentation.