Should `barrier` step merge Edge properties with the same key and value?
I am trying to understand if it is expected for
barrier
to merge multiple traversers of different Edge properties together (a.k.a. optimization). Currently, as the result of such merging some Edge properties might be missing from the continuing traversal. For example, the following test will fail as the last line because a single property is still left after all "name" properties removal (graph.traversal().E().properties("name").barrier(5).drop().iterate()
). I.e. I had impression that barrier
step may influence query optimization, but not influence query result. Now I'm trying to understand if that is the intended behavior or not.
Solution:Jump to solution
barrier()
doesn't dedup. it bulks. https://tinkerpop.apache.org/docs/current/reference/#barrier-step i think the problem here is that unique Edge
properties bulk because of how equality works for them, where you can have two key/values that are the same but not refer to the same actual property. note that the same doesn't happen for vertex properties which have quality based on id
:
```gremlin> g.addV().property('name','alice')
==>v[0]
gremlin> g.addV().property('name','alice')
==>v[2]...4 Replies
This is interesting, I also wonder if it's the intended behaviour if it only occurs to Edges. I assume it's skipping duplicated Edge properties?
@spmallette thoughts?
From “dedup” step documentation - Vertex properties are first class citizens. Thus, they are compared differently than “Edge” properties or “meta” properties.
Both “Edge” and “meta” properties are compared using “key” and “value” only. Meaning two properties from different edges which have the same “key” and “value” are considered to be equal.
Looking into the code I see that “dedup” and “barrier” steps are quite similar in the way they are comparing elements / properties.
The thing is that it’s not obvious (at least from user’s perspective) that “barrier” step acts like “dedup” step. As a user I would assume that “barrier” step only triggers computation of previous traversals but doesn’t change the final query result. The test above demonstrates that “barrier” step actually changes the final query result when used with the “drop” step.
I do understand why that happens (because of the merging optimisation happening inside the barrier step). However, it’s not clear to me if this was intended behaviour or not.
In either case I believe we should either improve documentation of the “barrier” step to let users know about such behaviour or we should change the way “barrier” step works.
Solution
barrier()
doesn't dedup. it bulks. https://tinkerpop.apache.org/docs/current/reference/#barrier-step i think the problem here is that unique Edge
properties bulk because of how equality works for them, where you can have two key/values that are the same but not refer to the same actual property. note that the same doesn't happen for vertex properties which have quality based on id
:
when you add in the fact that barrier()
steps are injected by way of LazyBarrierStrategy
it's even less evident to realize what is happening. The workaround would be to remove LazyBarrierStrategy
when executing this type of query.
I think that the best fix would be to change equality rules for Property
(which also affects metaproperties), but we've resisted that because it could break a lot of existing code. It had to be saved for something like 4. x.
Off the top of my head, the best fix for 3.x would be to change LazyBarrierStrategy
to avoid putting up a barrier()
for cases where a Property
is involved. That may come with a performance impact, but the workaround be that you'd have to look at your traversal to determine if adding barrier()
makes sense and if so do it manually. Then you'd be back to where you where before the upgrade.Thank you for the detailed description!
I believe LazyBarrierStrategy is not used for any traversal which has at least one “drop” step, but I might be wrong (have to check).
Internally it looks like bulking just increments the counter of equal traversers while storing only the first property.
The problem is that we actually use barrier steps as an optimisation for batch queries and this time we want to optimise “drop” step itself.
I believe in this case our best bet would be to make additional implementation of “NoOpBarrier” step which skips edge properties and meta properties and use that step instead of the standard “NoOpBarrierStep” for our batch optimisation strategy.
The related PR: https://github.com/apache/tinkerpop/pull/2612
GitHub
Open NoOpBarrierStep for extensibility by porunov · Pull Request #2...
This will make it possible to extend and overwrite NoOpBarrierStep.
Reason for this change: JanusGraph/janusgraph#4456 (comment)
Related discussion where the conclusion was made to relax individual...