App statistic recording strategies
Hi gang,
I'm building an app that does 'stuff'. Things in the app can be on or off. I need to create historical graphs of the totals of 'on' things over time. Simple enough. The app is built with Laravel (PHP/MySQL).
My strategy at the moment is to have a daily runner script that selects the count of rows in the tables I'm interested in where status=1 (1 being 'on'). That yields an integer which I dump in a stats table along with the current date. That stats table is then harvested for relevant data the results being passed to the graphing system (Apex) in the main app and a simple datetime graph is produced. All that works.
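For reference, that nightly runner boils down to something like this - a sketch using an in-memory SQLite database in place of MySQL, with made-up table and column names:

```php
<?php
// Sketch of the daily runner: count the 'on' rows and snapshot the
// total with the current date. SQLite stands in for MySQL here;
// 'things', 'stats', and the column names are illustrative.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE things (id INTEGER PRIMARY KEY, status INTEGER)');
$db->exec('CREATE TABLE stats (recorded_on TEXT, counter TEXT, total INTEGER)');

// Seed a few things: three on, one off.
$db->exec('INSERT INTO things (status) VALUES (1), (1), (0), (1)');

// The nightly job: count rows where status = 1 and store the snapshot.
$total = (int) $db->query('SELECT COUNT(*) FROM things WHERE status = 1')->fetchColumn();
$stmt = $db->prepare('INSERT INTO stats (recorded_on, counter, total) VALUES (?, ?, ?)');
$stmt->execute([date('Y-m-d'), 'things_on', $total]);

echo $total; // 3
```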
My issue is that the graph has a potential 24-hour latency before it's updated with correct figures - up to a full day could pass with the graph showing stale numbers. How is this dealt with in 'real' apps?
As I see it, I could:
1/ Put a disclaimer on the graph saying they're updated daily at midnight. Provide the accurate number on-demand in a small panel keeping the historicals separate.
2/ Increase the frequency of the runner. This would increase the granularity of the graph but greatly increase the rows in (and load on) the database (the example I gave above is a simplified version of what I'm actually doing - there's potentially hundreds of these counters). As time rolls on, the volume of data being lobbed about will keep growing.
3/ Implement some sort of RRD (round-robin database) wizardry. Still potentially heavy on the recording side, and granularity is lost over time.
4/ Something else I'm not aware of or thinking of?
5/ I've just had a thought - a bunch of the systems I use at work have up-to-the-minute accurate graphs of the stuff they report on. How do they do that? Do they do what I'm doing already but tack on a quick point-in-time count before the graph is displayed? That might be the way..
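Option 5 might look something like this - a plain-PHP sketch where the stored daily snapshots get today's live count appended at render time (the data shapes and names are made up):

```php
<?php
// Take the stored daily snapshots and append today's live count so the
// graph's last point is always current. $historical would come from the
// stats table; $liveCount from a SELECT COUNT(*) ... WHERE status = 1
// run when the graph is rendered.
function seriesWithLivePoint(array $historical, int $liveCount): array
{
    $today = date('Y-m-d');
    // Drop any snapshot already taken today so we don't plot it twice.
    $series = array_values(array_filter(
        $historical,
        fn ($point) => $point['date'] !== $today
    ));
    $series[] = ['date' => $today, 'total' => $liveCount];
    return $series;
}

$historical = [
    ['date' => '2024-01-01', 'total' => 10],
    ['date' => '2024-01-02', 'total' => 12],
];
$series = seriesWithLivePoint($historical, 15);
echo end($series)['total']; // 15
```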
Well, I'll post this anyway - always keen to learn!
cheers!
e
13 Replies
I think that the answer is "it depends", as it usually is. What are your needs? Can you live with that potential 24-hour latency? If yes, then there's nothing to be done. What is the maximum latency that you're comfortable having to deal with? If you need something as close to real-time as possible, then you could consider some sort of database or caching system like Redis, which runs in-memory and is fast to read/write. In Laravel, you can trigger events after an operation has taken place in the database; that would be the ideal place to update the Redis layer.
https://laravel.com/docs/10.x/eloquent#events
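Roughly the shape of it, sketched in plain PHP - here an array stands in for Redis; in Laravel you'd hook the model's created/updated events (or an observer) and call Redis::incr()/decr() instead:

```php
<?php
// Keep a live counter that is adjusted whenever a thing flips on or
// off, instead of re-counting the whole table. In Laravel this logic
// would sit in a model event handler, and $cache would be Redis.
$cache = ['things_on' => 0];

function onStatusChanged(array &$cache, int $old, int $new): void
{
    if ($old === $new) {
        return; // nothing actually changed
    }
    // Turned on: +1. Turned off: -1.
    $cache['things_on'] += ($new === 1) ? 1 : -1;
}

onStatusChanged($cache, 0, 1); // a thing turned on
onStatusChanged($cache, 0, 1); // another turned on
onStatusChanged($cache, 1, 0); // one turned off
echo $cache['things_on']; // 1
```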
thanks - needs are just to store stats like I say so I can produce a graph. I'll look into the Redis thing. Ideally, I'd like to keep it all within the Laravel/mysql realm though. Don't want to introduce extra components if I can avoid it.
Even without Redis, you could still emit an event when the database updates and store the most up-to-date result somewhere else, like in a text file or another table in the database. I'd assume that the bulk of the work for this report is querying the entire database and calculating some result. If you can avoid doing that and spread the workload over larger periods of time, you will likely make things more efficient.
I have to say though, avoiding Redis to keep components to a minimum makes sense... but you're also not taking advantage of Laravel's potential. One of the reasons for using Laravel (any framework, really) is to leverage its features. Laravel has excellent adapters for Redis and other 3rd-party tools, and you can even use it for caching, which is something you will most certainly want to implement at some point.
I think a similar situation might be faced by banks. How do they provide you with your balance on demand? Do they calculate all transactions on your account since forever, or do they maintain a running total somewhere?
My needs are nothing like those of a bank (in terms of scale!) but I do want to be as efficient as possible. Good point about frameworks. I'm slowly learning and feeling my way around this thing (loving it so far!)
i'm pretty sure they have a running total
makes no sense to calculate 5 years of transactions every time you need to see your account balance
if i were you, i would have a running total and a running average
also, a running count
anytime a new record is added, increment all 3
this way, you might even be able to reduce some of the work you need to do
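e.g. something like this (a plain-PHP sketch; the names are made up):

```php
<?php
// Running aggregates: keep count, total, and average, and update all
// three whenever a record arrives, rather than re-scanning the table.
$stats = ['count' => 0, 'total' => 0.0, 'average' => 0.0];

function record(array &$stats, float $value): void
{
    $stats['count'] += 1;
    $stats['total'] += $value;
    // The average falls out of the other two for free.
    $stats['average'] = $stats['total'] / $stats['count'];
}

record($stats, 10);
record($stats, 20);
record($stats, 30);
echo $stats['average']; // 20
```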
Banks work with ledgers that store each transaction, so there's no need to re-calculate everything from scratch. You probably want to do something similar, and the most efficient way of doing it would probably be with Redis as a form of in-memory database. Not that reading/writing to disk would be terribly slow either, but it's just not as efficient. Plus, should your app ever need to scale, it'd be much easier to distribute instances of Redis across multiple servers or locations than to share a file system.
iirc banks also store the before and after balance for each transaction
so each row will have the balance before, the amount for that transaction, and the balance after
yes, they do
at least, mine does
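For illustration, a ledger row along those lines might look like this (a sketch; the field names are made up):

```php
<?php
// Each transaction stores the balance before, the amount, and the
// balance after, so the current balance is just the last row's
// balance_after - no replaying of history needed.
$ledger = [];
$balance = 0;

function post(array &$ledger, int &$balance, int $amount): void
{
    $ledger[] = [
        'balance_before' => $balance,
        'amount'         => $amount,
        'balance_after'  => $balance + $amount,
    ];
    $balance += $amount;
}

post($ledger, $balance, 100); // deposit
post($ledger, $balance, -30); // withdrawal
echo end($ledger)['balance_after']; // 70
```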
Interesting. Thanks guys!
Why not just store when a value changes?
You can then clump together changes that happened in quick succession later on, replacing them with a marker that says it was changed 5 times or something
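Something like this, sketched in plain PHP - the 5-minute window is just an example value:

```php
<?php
// Change-log compaction: record a row only when a value changes, then
// fold bursts of changes within a time window into one marker row that
// keeps the latest value and a count of how many changes it replaced.
function compactChanges(array $changes, int $windowSeconds = 300): array
{
    $out = [];
    foreach ($changes as $change) {
        $lastIdx = count($out) - 1;
        if ($lastIdx >= 0 && $change['ts'] - $out[$lastIdx]['ts'] < $windowSeconds) {
            // Same burst: bump the count and keep the latest value.
            $out[$lastIdx]['count'] += 1;
            $out[$lastIdx]['value'] = $change['value'];
        } else {
            $out[] = ['ts' => $change['ts'], 'value' => $change['value'], 'count' => 1];
        }
    }
    return $out;
}

$changes = [
    ['ts' => 0,   'value' => 1],
    ['ts' => 60,  'value' => 2],
    ['ts' => 120, 'value' => 3],
    ['ts' => 900, 'value' => 4],
];
$compacted = compactChanges($changes, 300);
echo count($compacted); // 2
```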
that's what i said, but i was more specific about which values to store
Ah missed that
it's fine, but you were right in what you said