The wording around how sampling is done
The wording around how sampling is done is unclear: which row from a sampled region gets stored? What if I'm using this to count interactions with a service, and a user spams low-tier requests to "hide" high-tier requests, which then get dropped by sampling?
There are example queries on the docs page to "account for" sampling, and other queries aren't really affected, but if this were communicated more openly, with examples, I'm sure people would understand and accept it more readily.
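To show what I mean by "accounting for" sampling, here is a rough sketch of how I read those example queries: each stored row carries a weight saying how many original events it stands for, and counts are estimated by summing the weights instead of counting rows. The field names `tier` and `sample_interval` are my own placeholders, not necessarily the real schema, and the data is made up.

```python
from collections import defaultdict

def estimate_counts(rows):
    """Estimate original event counts per tier from sampled rows.

    Each row is a dict with hypothetical keys 'tier' and 'sample_interval',
    where sample_interval is how many original events the stored row stands for.
    """
    totals = defaultdict(int)
    for row in rows:
        # Sum the weights rather than counting stored rows.
        totals[row["tier"]] += row["sample_interval"]
    return dict(totals)

# Made-up data: 1000 low-tier requests stored as 10 rows (weight 100 each),
# plus 3 high-tier requests that happened to be stored unsampled.
stored = [{"tier": "low", "sample_interval": 100}] * 10 \
       + [{"tier": "high", "sample_interval": 1}] * 3
counts = estimate_counts(stored)
```

This only works if the high-tier rows are actually stored, or stored with a correct weight, which is exactly the part the docs don't make clear to me.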
3 Replies
Sure, basically I'm interested to hear how the following scenarios would get handled:
- Index 1 has a consistent ~10 rps, and Index 2 overwhelms a server (basically, is it guaranteed that below 100 rps I get no sampling?)
- Is the 100 rps limit calculated strictly per second, or are bursts allowed? Is it 100/1s, 300/3s, or something else?
- Which lines would be removed if I go 1 line over the sampling threshold? Is it the first one, the last one, or a random one? Potentially the one with the lowest entropy?
- What if two lines are sampled out? Are they both taken from the end, or spread evenly across the sampling period?
These are the ones I could come up with so far
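To make the burst question concrete, here is a small sketch of why 100/1s and 300/3s are not the same limit. It is not how the service measures rate (that is what I'm asking); `max_in_window` is a helper I made up that finds the busiest sliding window in a list of timestamps.

```python
def max_in_window(timestamps, window):
    """Return the largest number of events in any half-open window [t, t + window)."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ts)):
        # Advance the left edge until the span fits inside the window.
        while ts[end] - ts[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best

# 150 events packed into the first second, then silence.
burst = [i / 150 for i in range(150)]

over_strict = max_in_window(burst, 1.0) > 100    # limit read as 100 per 1 s
over_averaged = max_in_window(burst, 3.0) > 300  # limit read as 300 per 3 s
```

The same burst is over the limit under the strict reading and under it when averaged over 3 seconds, so the answer changes whether any of my rows get sampled.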
---
About burst rates
If bursts are handled strictly, allowing only 100 lines per second, would there be some mechanism (e.g. a worker trigger) to handle bursts externally? Could I create a queue that spreads out rates that would otherwise reach sampling territory, keeping them just below the threshold, with AE popping items as fast as it can?
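Roughly, the queue idea would look like the sketch below. It only covers the pacing logic on my side; `pace` and the 90 per second budget are my own assumptions, and I don't know whether any built-in mechanism like this exists.

```python
def pace(arrivals, max_per_second):
    """Assign each arrival a release time, in arrival order, so that
    consecutive releases are at least 1 / max_per_second seconds apart."""
    gap = 1.0 / max_per_second
    releases = []
    next_free = float("-inf")
    for t in sorted(arrivals):
        # Release immediately if the slot is free, otherwise wait for it.
        release = max(t, next_free)
        releases.append(release)
        next_free = release + gap
    return releases

# 250 lines arriving at the same instant, paced at 90 per second
# to stay under an assumed 100 per second threshold.
arrivals = [0.0] * 250
releases = pace(arrivals, 90)
worst_delay = releases[-1] - arrivals[-1]  # a bit under 3 seconds
```

The cost is latency: the last line of the burst is written a few seconds late, which would matter if rows are timestamped at write time.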