Discriminated union with union type discriminator gives wrong error messages

Let's say I have this type, right here:
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C' | 'D'",
b: "string"
}, {
option: "'E'",
c: "string"
})

const out = Thing({
option: 'A'
})
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C' | 'D'",
b: "string"
}, {
option: "'E'",
c: "string"
})

const out = Thing({
option: 'A'
})
This will produce the error message a must be a string (was missing), b must be a string (was missing) or c must be a string (was missing), however what one would expect is a must be a string (was missing) Now interestingly this only occurs because when more then ONE of the discriminators is a union. This type here will produce the correct error message a must be a string (was missing)
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C'",
b: "string"
}, {
option: "'E'",
c: "string"
})

const out = Thing({
option: 'A'
})
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C'",
b: "string"
}, {
option: "'E'",
c: "string"
})

const out = Thing({
option: 'A'
})
The why becomes apparent when we take a look at the precompilation, in case 2 it can just switch case over the options
union329Allows(data, ctx) { switch(data?.option) {
case "C": return this.intersection2241Allows(data)
case "E": return this.intersection2247Allows(data)
default: return this.intersection1938Allows(data)
}
return false
},
union329Allows(data, ctx) { switch(data?.option) {
case "C": return this.intersection2241Allows(data)
case "E": return this.intersection2247Allows(data)
default: return this.intersection1938Allows(data)
}
return false
},
where in case 1 it uses the intersectionAllows which cannot tell, that when option = "A" the only remaining field to validate will be a because it checks the WHOLE data instead of just looking at the option and then "locking" in.
union298Allows(data, ctx) { if (this.intersection1938Allows(data)) {
return true
}
if (this.intersection1941Allows(data)) {
return true
}
if (this.intersection1944Allows(data)) {
return true
}
return false
},
union298Allows(data, ctx) { if (this.intersection1938Allows(data)) {
return true
}
if (this.intersection1941Allows(data)) {
return true
}
if (this.intersection1944Allows(data)) {
return true
}
return false
},
Is this a bug? Is there some part of set theory that can justify this behaviour that i do not yet understand? 😅
30 Replies
Flosi21
Flosi21OP4d ago
I especially don't understand why it is not possible to always use a behaviour that replicates that of the switch case where we check for the discriminator and then "lock in" and only validate the remaining, possible variant. This should be especially possible because discriminators are per definition disjunct I think it just doesn't recognize option as the discriminator internally if it is a union type in two places. Could that be the case? Is this intentional?
ssalbdivad
ssalbdivad4d ago
This falls into the category of "could be further optimized" but the more you learn about the type system, the more you learn how many cases there are like this. This would not be a trivial change but definitely would welcome help/contributions in this area if you're looking to get into it! The key difference for this kind of calculation vs. what you'd see about branches overlapping in terms of pure set theory is we have to compile the types to a flat switch that can be traversed in constant time, which isn't possible for all discriminants e.g. ranges of numbers or other unions like this one that might be inherently be more complex but still have some set of checks that could be made to traverse in constant time with the right extractions It's certainly not wrong to not discriminate every case that would be theoretically possible- error messages like this aren't guaranteed as part of the API, but would be a welcome improvement!
Flosi21
Flosi21OP4d ago
true, compiling these unions switch statement with constant time is definitely not easy. But e.g. a range of numbers could still be checked fairly efficient by a comparison which shoudl still be O(1)? What bothers me more is a long union of units e.g. a | b | c | d | f | g | h | i | j | k, this could be compiled to one case per option but i guess you would have to limit the length of the union at some point?
ssalbdivad
ssalbdivad4d ago
How could we limit the length of the union? Unions of more than two === values are always checked as a switch
Flosi21
Flosi21OP4d ago
not syntactically but if the switch performance degrades at a certain N and it is faster to leave the union undiscriminated the union could just not be discriminated if the discriminator union has more than N options Also i am currently thinking about implementing something, that would allow union discriminators that are strictly just unions of units because you could compile them to a switch without too many changes as far as i am aware. What do you think about that?
ssalbdivad
ssalbdivad4d ago
That already exists haha
Flosi21
Flosi21OP4d ago
wait what 😂
ssalbdivad
ssalbdivad4d ago
Unions of more than two === values are always checked as a switch
What I meant by === is unit types i.e. either primatives or references, so checked with ===
Flosi21
Flosi21OP4d ago
Ah i think we might be talking about different cases here. What is currently happening is that
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C'",
b: "string"
}, {
option: "'E'",
c: "string"
})

const out = Thing({
option: 'A'
})
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C'",
b: "string"
}, {
option: "'E'",
c: "string"
})

const out = Thing({
option: 'A'
})
is compiled to
switch(option){
case 'C': assertOptionB
case 'E': assertOptionC
default: assertOptionA
}
switch(option){
case 'C': assertOptionB
case 'E': assertOptionC
default: assertOptionA
}
what i am talking about is that something like this:
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C' | 'D'",
b: "string"
}, {
option: "'E'",
c: "string"
})
const Thing = type.or({
option: "'A' | 'B'",
a: "string"
}, {
option: "'C' | 'D'",
b: "string"
}, {
option: "'E'",
c: "string"
})
would be compiled to:
switch(option){
case 'A': optionA
case 'B': optionA
case 'E': optionC
default: optionB
switch(option){
case 'A': optionA
case 'B': optionA
case 'E': optionC
default: optionB
i think this does not exist in this form yet?
ssalbdivad
ssalbdivad4d ago
Right yeah that is the optimization I was talking about originally that could be built and I would love to see more discrimination logic like that but I can imagine what would have to work for that seemingly innocuous case to work that way and it wouldn't be trivial
Flosi21
Flosi21OP4d ago
hmm ok :/ even if the we limit these "discriminator unions" to be just unions of units strictly? So that they can always be transformed into a case?
ssalbdivad
ssalbdivad4d ago
But realistically there's no non-pathological scenarios where that ends up resulting in some very slow check this way, especially given most people expect a union to be checked O(N) especialy if it doesn't have an explicit discriminant. It's also fairly easy to construct your types to accomodate this (although implicit optimization is obviously better if possible) Haha unfortunately once you see how this stuff works under the hood it's not quite as simple as you're hoping. It would be easy to implement heuristic to try and check for shallow literal keys but if we were going to implement this, we'd want it to be consistent with the rest of the discrimination logic- working for literals/domains, at nested levels etc.
Flosi21
Flosi21OP4d ago
to be honest my main motivation in more powerful discrimination logic is actually the error messages, because they are much better if the union is discriminated 😅 especially if you want to transform them with onFail to apply custom error handling / messages
ssalbdivad
ssalbdivad4d ago
Actually I can imagine a better way to potentially do this than what I was thinking originally (adding union Disjoint logic that included individual branches). Potentially you could a pass before discrimination where some prediscriminatedBranchesis built that would transform objects like that one into two individual branches. Ehh maybe not, it would be hard to know how to split them up ideally because there could be many possible discriminants So you probably would have to add the extra union disjoint logic and somehow map that back It's definitely doable but it's defintely the sort of thing where if you haven't worked on this kind of code before it will be a steep learning curve compared to some other optimizations you might try
Flosi21
Flosi21OP4d ago
yeah without identifying the discriminant splitting them up before discrimination is not really feasible
ssalbdivad
ssalbdivad4d ago
The core problem is the way we identify discriminants is by looking at === or typeof disjoints between branches because that directly represents switch cases But an intersection of a union and an incompatible unit type does not create a unit disjoint, it creates a union disjoint Because you can't say "here are the two incompatible values" So you would have to encode that data in the union Disjoint in a way you could look for candidate union disjoints to spread back to cases of typeof or ===
Flosi21
Flosi21OP4d ago
so result.type would be a union here in that case?
const result = intersectNodesRoot(l.in, r.in, l.$)
if (!(result instanceof Disjoint)) continue

for (const entry of result) {
if (!entry.kind || entry.optional) continue

let lSerialized: string
let rSerialized: string

if (entry.kind === "domain") {
const lValue = entry.l as Domain.Node | Domain.Enumerable
const rValue = entry.r as Domain.Node | Domain.Enumerable
lSerialized = `"${typeof lValue === "string" ? lValue : lValue.domain}"`
rSerialized = `"${typeof rValue === "string" ? rValue : rValue.domain}"`
} else if (entry.kind === "unit") {
lSerialized = (entry.l as Unit.Node).serializedValue
rSerialized = (entry.r as Unit.Node).serializedValue
} else continue
const result = intersectNodesRoot(l.in, r.in, l.$)
if (!(result instanceof Disjoint)) continue

for (const entry of result) {
if (!entry.kind || entry.optional) continue

let lSerialized: string
let rSerialized: string

if (entry.kind === "domain") {
const lValue = entry.l as Domain.Node | Domain.Enumerable
const rValue = entry.r as Domain.Node | Domain.Enumerable
lSerialized = `"${typeof lValue === "string" ? lValue : lValue.domain}"`
rSerialized = `"${typeof rValue === "string" ? rValue : rValue.domain}"`
} else if (entry.kind === "unit") {
lSerialized = (entry.l as Unit.Node).serializedValue
rSerialized = (entry.r as Unit.Node).serializedValue
} else continue
*entry.kind
ssalbdivad
ssalbdivad4d ago
Well result is technically an array of disjoint entries But assuming there's just the one, then that entry would be a union disjoint Disjoint entries have different props attached to them depending on their kind
Flosi21
Flosi21OP4d ago
could we check for a disjoint entry that is union kind and that that union only has branches of type "unit" ?
ssalbdivad
ssalbdivad4d ago
What you essentially would want to do is: - check if the union disjoint is between a single literal value and a union of literal values - check if the union disjoint is between a single domain and a union of domains In those cases, there would be some transformation required, but you could eventually flatten them out and handle as you describe
Flosi21
Flosi21OP4d ago
hmm ok, that brings me to a question i have had for a long time, what exactly is a "domain" ? 😅 as far as my understanding goes it kindof describes the primitive type that can be express using typeof?
ssalbdivad
ssalbdivad4d ago
basically typeof but mirroring ts keywords
Flosi21
Flosi21OP4d ago
ah ok if we only "accept" this case for simplicity this would be relatively simple, no? - check if the union disjoint is between a single literal value and a union of literal values
ssalbdivad
ssalbdivad4d ago
It is actually quite a bit more complex to get nested discriminants right I guess because you could also have a union disjoint where every branch has a disjoint unit type at a path but it still wouldn't be a unit branch Ideally we'd identify those cases as well but as long as it covered unit + domain, the more naive solution would still be complete +consistent enough to include if we create an issue for the followup when things get too far in this territory of very specific edge cases that TBH aren't all that common I don't feel great about it There needs to be some level of consistency and cohesion unless it's some crazy hot path we're special casing
Flosi21
Flosi21OP4d ago
ok 👍
ssalbdivad
ssalbdivad4d ago
Domain shouldn't be much harder than unit as is- you'd essentially be flattening both the same way
Flosi21
Flosi21OP4d ago
does domain also include something like number < 10 or is this another type?
ssalbdivad
ssalbdivad4d ago
The disjoints is about the specific nodes that didn't overlap so domain is only the typeof part You should look at @ark/schema's readme if yoiu haven't for an overview of the structure and also the Disjoint definitions and the list of disjoint codes
Flosi21
Flosi21OP4d ago
ok i think i will just try around a bit and get familiar with the disjoints and schema types, do you have any recommendations on how to best debug / run / explore the code. Just debugging into the unit tests? especially if i say have something like this type
const Thing = type.or(
{
option: "number | 'a' | 'b'",
},
{
option: "'c' | 'd'",
}
)
const Thing = type.or(
{
option: "number | 'a' | 'b'",
},
{
option: "'c' | 'd'",
}
)
and just want to explore the behaviour of intersectNodesRoot on the option prop
ssalbdivad
ssalbdivad4d ago
You can use .expression to easily see what the type look likes and .json to see something that resembles the internal structure I would just use watch mode on some file or the test explorer or whatever you prefer

Did you find this page helpful?