✅ object composition: a database perspective
composition vs inheritance is abundantly discussed, but there are issues when it gets to store composited objects
i can't really seem to find many discussions about this, so maybe this is an unsolved problem (even considering implementation depends on the requisites) or i am trying to do the wrong thing (eventually not too unsurprising)
there are discussions about what structure to use for writing trees in databases, but this is kinda a specialized one: it's not really deep, but it can be pretty wide, there can be 5, 10, 20 fields for a single object
and some of them, even the most of them, could be collections
so writing every field as a table means a join for everyone, and having a query with 20 joins for an object will never happen
there are other options, for example some of those could be owned objects, but this may bring the table to have hundreds of columns, and still would not solve the collections situation
one could even think of writing both MainObject and contained data in the same table, or use a table for each, two tables could be better than 20 tables, but then maybe the models can get pretty different between db and business logic, and you always have to translate structure back and forth...
i don't know, maybe i'm too worried, maybe there are better dbs than relational ones for this (but tbh a graph db doesn't seem appropriate either) , maybe hard crafting the domains and the queries for a specific case is the only way, but still this would need to have a decent performance and a decent understandabilty, and something is off
you could have a (denormalized) field in MainObject to tell which composition has stuff, maybe it could help in sparse matrix cases, but maybe not really; atm using a single table intuitively would seem to work best, because again the issue are the joins
10 Replies
I have a rule for when to use database solutions, soecufucally, Views and Stored procs.
When I need to force a sub query for entity framework, I use a ViewTable (so I can keep using EF core on the "outer query")
Whenever recursion is involved, I use a Stored Proc.
There's no easy way to do recursion any other way for a db query, tbh
And tree hierarchies 100% go in the recursion challenge bin, so I'd use a stored proc there
i thought about sp, im not sure they would solve every case, in the sense that some would probably be better off moved to better code design, and for the others they still would not avoid joins
even using a sp i would still have to read multiple tables just to understand if there is composited data or not
i know that not everything can be linear, but it would be nice tho -- imagine everything is in one table i can can recover a full object with just
where id = #
but also im thinking what if there are 10^6 elements, how could a query with joins be fast when i have to do stuff in the database
so should i really store a recursive structure? should i have some helper fields?
probably the only way to know is testing it, but that's the weird part to me, i expected people already used thisI typically try to avoid a recursive db schema unless absolutely 100% necessary, when the "depth" of the tree truly has no upper maximum
If it's a tree like structure but it has a known maximum tree depth (IE you'll never go more than 3 layers deep), then I'd not actually make it recursive and instead just do normal joins
It's pretty rare to truly need a genuinely infinitely recursive tree structure.
Most applications from clients won't be mad if you impose a reasonable max depth restriction as a tradeoff for it being waaaaay easier to implement
yeah i would keep the main structure 2 layers or at most 3
in such a case, normal navigation properties with EF Core will very likely be plenty to get the job done
that is covered in the part where i say "having a query with 20 joins for an object will never happen"
Ive had that many joins before, it's not too bad. Joins on cluster indexes are extremely performant
if you return like the first 10 records, yes it works
but with the engine at work a query like that on a million records with where and sorting goes in timeout with even half the joins
That's usually just indicative that your indexes aren't setup right
I've done more than 20 joins on 100s of millions of records with filtering and it would run in sub 1s
Gotta use the query planner to assess where the issue is, usually it's cuz of a missing index
most of the times db in use is mysql 5.7, so not the most recent things
also these queries are for a cms, and including paging, authorization - there were two recursions in two tables - they can get pretty wild
since now i'm the one who's managing that and we have a new version of the cms i'm trying to do the possible to avoid this mess