Discriminated union with union type discriminator gives wrong error messages
Let's say I have this type, right here:
This will produce the error message
a must be a string (was missing), b must be a string (was missing) or c must be a string (was missing)
, however what one would expect is a must be a string (was missing)
Now interestingly this only occurs because when more then ONE of the discriminators is a union. This type here will produce the correct error message a must be a string (was missing)
The why becomes apparent when we take a look at the precompilation, in case 2 it can just switch case over the options
where in case 1 it uses the intersectionAllows
which cannot tell, that when option = "A"
the only remaining field to validate will be a
because it checks the WHOLE data instead of just looking at the option and then "locking" in.
Is this a bug? Is there some part of set theory that can justify this behaviour that i do not yet understand? 😅30 Replies
I especially don't understand why it is not possible to always use a behaviour that replicates that of the switch case where we check for the discriminator and then
"lock in" and only validate the remaining, possible variant.
This should be especially possible because discriminators are per definition disjunct
I think it just doesn't recognize
option
as the discriminator internally if it is a union type in two places. Could that be the case? Is this intentional?This falls into the category of "could be further optimized" but the more you learn about the type system, the more you learn how many cases there are like this.
This would not be a trivial change but definitely would welcome help/contributions in this area if you're looking to get into it!
The key difference for this kind of calculation vs. what you'd see about branches overlapping in terms of pure set theory is we have to compile the types to a flat switch that can be traversed in constant time, which isn't possible for all discriminants e.g. ranges of numbers or other unions like this one that might be inherently be more complex but still have some set of checks that could be made to traverse in constant time with the right extractions
It's certainly not wrong to not discriminate every case that would be theoretically possible- error messages like this aren't guaranteed as part of the API, but would be a welcome improvement!
true, compiling these unions switch statement with constant time is definitely not easy. But e.g. a range of numbers could still be checked fairly efficient by a comparison which shoudl still be O(1)?
What bothers me more is a long union of units e.g. a | b | c | d | f | g | h | i | j | k, this could be compiled to one case per option but i guess you would have to limit the length of the union at some point?
How could we limit the length of the union? Unions of more than two
===
values are always checked as a switchnot syntactically but if the switch performance degrades at a certain
N
and it is faster to leave the union undiscriminated the union could just not be discriminated if the discriminator union has more than N
options
Also i am currently thinking about implementing something, that would allow union discriminators that are strictly just unions of units because you could compile them to a switch without too many changes as far as i am aware. What do you think about that?That already exists haha
wait what 😂
Unions of more than two === values are always checked as a switchWhat I meant by
===
is unit types
i.e. either primatives or references, so checked with ===
Ah i think we might be talking about different cases here. What is currently happening is that
is compiled to
what i am talking about is that something like this:
would be compiled to:
i think this does not exist in this form yet?
Right yeah that is the optimization I was talking about originally that could be built and I would love to see more discrimination logic like that but I can imagine what would have to work for that seemingly innocuous case to work that way and it wouldn't be trivial
hmm ok :/ even if the we limit these "discriminator unions" to be just unions of units strictly? So that they can always be transformed into a case?
But realistically there's no non-pathological scenarios where that ends up resulting in some very slow check this way, especially given most people expect a union to be checked
O(N)
especialy if it doesn't have an explicit discriminant.
It's also fairly easy to construct your types to accomodate this (although implicit optimization is obviously better if possible)
Haha unfortunately once you see how this stuff works under the hood it's not quite as simple as you're hoping. It would be easy to implement heuristic to try and check for shallow literal keys but if we were going to implement this, we'd want it to be consistent with the rest of the discrimination logic- working for literals/domains, at nested levels etc.to be honest my main motivation in more powerful discrimination logic is actually the error messages, because they are much better if the union is discriminated 😅 especially if you want to transform them with onFail to apply custom error handling / messages
Actually I can imagine a better way to potentially do this than what I was thinking originally (adding
union
Disjoint logic that included individual branches).
Potentially you could a pass before discrimination where some prediscriminatedBranches
is built that would transform objects like that one into two individual branches.
Ehh maybe not, it would be hard to know how to split them up ideally because there could be many possible discriminants
So you probably would have to add the extra union disjoint logic and somehow map that back
It's definitely doable but it's defintely the sort of thing where if you haven't worked on this kind of code before it will be a steep learning curve compared to some other optimizations you might tryyeah without identifying the discriminant splitting them up before discrimination is not really feasible
The core problem is the way we identify discriminants is by looking at
===
or typeof
disjoints between branches because that directly represents switch cases
But an intersection of a union and an incompatible unit type does not create a unit disjoint, it creates a union disjoint
Because you can't say "here are the two incompatible values"
So you would have to encode that data in the union Disjoint in a way you could look for candidate union disjoints to spread back to cases of typeof
or ===
so result.type would be a union here in that case?
*entry.kind
Well result is technically an array of disjoint entries
But assuming there's just the one, then that entry would be a union disjoint
Disjoint entries have different props attached to them depending on their kind
could we check for a disjoint entry that is union kind and that that union only has branches of type "unit" ?
What you essentially would want to do is:
- check if the union disjoint is between a single literal value and a union of literal values
- check if the union disjoint is between a single domain and a union of domains
In those cases, there would be some transformation required, but you could eventually flatten them out and handle as you describe
hmm ok, that brings me to a question i have had for a long time, what exactly is a "domain" ? 😅
as far as my understanding goes it kindof describes the primitive type that can be express using typeof?
basically
typeof
but mirroring ts keywordsah ok
if we only "accept" this case for simplicity this would be relatively simple, no?
- check if the union disjoint is between a single literal value and a union of literal values
It is actually quite a bit more complex to get nested discriminants right I guess because you could also have a union disjoint where every branch has a disjoint unit type at a path but it still wouldn't be a unit branch
Ideally we'd identify those cases as well but as long as it covered unit + domain, the more naive solution would still be complete +consistent enough to include if we create an issue for the followup
when things get too far in this territory of very specific edge cases that TBH aren't all that common I don't feel great about it
There needs to be some level of consistency and cohesion unless it's some crazy hot path we're special casing
ok 👍
Domain shouldn't be much harder than unit as is- you'd essentially be flattening both the same way
does domain also include something like
number < 10
or is this another type?The disjoints is about the specific nodes that didn't overlap so domain is only the
typeof
part
You should look at @ark/schema
's readme if yoiu haven't for an overview of the structure and also the Disjoint definitions and the list of disjoint codesok i think i will just try around a bit and get familiar with the disjoints and schema types, do you have any recommendations on how to best debug / run / explore the code. Just debugging into the unit tests?
especially if i say have something like this type
and just want to explore the behaviour of intersectNodesRoot on the
option
propYou can use
.expression
to easily see what the type look likes and .json
to see something that resembles the internal structure
I would just use watch mode on some file or the test explorer or whatever you prefer