Should `barrier` step merge Edge properties with the same key and value?

I am trying to understand if it is expected for barrier to merge multiple traversers of different Edge properties together (a.k.a. optimization). Currently, as the result of such merging some Edge properties might be missing from the continuing traversal. For example, the following test will fail as the last line because a single property is still left after all "name" properties removal (graph.traversal().E().properties("name").barrier(5).drop().iterate()). I.e. I had impression that barrier step may influence query optimization, but not influence query result. Now I'm trying to understand if that is the intended behavior or not.
@Test
public void testDropsEdgePropertiesTinkerGraph() {
Graph graph = TinkerGraph.open();
Vertex vertex1 = graph.addVertex();
Vertex vertex2 = graph.addVertex();
vertex1.addEdge("relate", vertex2).property("name", "test");
vertex1.addEdge("relate", vertex2).property("name", "test");

assertEquals(1, graph.traversal().E().properties("name").dedup().count().next());
graph.traversal().E().properties("name").barrier(5).drop().iterate(); // should this line remove 2 properties or a single property?
assertEquals(0, graph.traversal().E().properties("name").count().next()); // fails because 1 name property is still there.
}
@Test
public void testDropsEdgePropertiesTinkerGraph() {
Graph graph = TinkerGraph.open();
Vertex vertex1 = graph.addVertex();
Vertex vertex2 = graph.addVertex();
vertex1.addEdge("relate", vertex2).property("name", "test");
vertex1.addEdge("relate", vertex2).property("name", "test");

assertEquals(1, graph.traversal().E().properties("name").dedup().count().next());
graph.traversal().E().properties("name").barrier(5).drop().iterate(); // should this line remove 2 properties or a single property?
assertEquals(0, graph.traversal().E().properties("name").count().next()); // fails because 1 name property is still there.
}
Solution:
barrier() doesn't dedup. it bulks. https://tinkerpop.apache.org/docs/current/reference/#barrier-step i think the problem here is that unique Edge properties bulk because of how equality works for them, where you can have two key/values that are the same but not refer to the same actual property. note that the same doesn't happen for vertex properties which have quality based on id: ```gremlin> g.addV().property('name','alice') ==>v[0] gremlin> g.addV().property('name','alice') ==>v[2]...
Jump to solution
4 Replies
Yang Xia
Yang Xia9mo ago
This is interesting, I also wonder if it's the intended behaviour if it only occurs to Edges. I assume it's skipping duplicated Edge properties? @spmallette thoughts?
porunov
porunovOP9mo ago
From “dedup” step documentation - Vertex properties are first class citizens. Thus, they are compared differently than “Edge” properties or “meta” properties. Both “Edge” and “meta” properties are compared using “key” and “value” only. Meaning two properties from different edges which have the same “key” and “value” are considered to be equal. Looking into the code I see that “dedup” and “barrier” steps are quite similar in the way they are comparing elements / properties. The thing is that it’s not obvious (at least from user’s perspective) that “barrier” step acts like “dedup” step. As a user I would assume that “barrier” step only triggers computation of previous traversals but doesn’t change the final query result. The test above demonstrates that “barrier” step actually changes the final query result when used with the “drop” step. I do understand why that happens (because of the merging optimisation happening inside the barrier step). However, it’s not clear to me if this was intended behaviour or not. In either case I believe we should either improve documentation of the “barrier” step to let users know about such behaviour or we should change the way “barrier” step works.
Solution
spmallette
spmallette9mo ago
barrier() doesn't dedup. it bulks. https://tinkerpop.apache.org/docs/current/reference/#barrier-step i think the problem here is that unique Edge properties bulk because of how equality works for them, where you can have two key/values that are the same but not refer to the same actual property. note that the same doesn't happen for vertex properties which have quality based on id:
gremlin> g.addV().property('name','alice')
==>v[0]
gremlin> g.addV().property('name','alice')
==>v[2]
gremlin> g.V().properties("name").barrier(5).drop()
gremlin> g.V().properties('name').count()
==>0
gremlin> g.addV().property('name','alice')
==>v[0]
gremlin> g.addV().property('name','alice')
==>v[2]
gremlin> g.V().properties("name").barrier(5).drop()
gremlin> g.V().properties('name').count()
==>0
when you add in the fact that barrier() steps are injected by way of LazyBarrierStrategy it's even less evident to realize what is happening. The workaround would be to remove LazyBarrierStrategy when executing this type of query. I think that the best fix would be to change equality rules for Property (which also affects metaproperties), but we've resisted that because it could break a lot of existing code. It had to be saved for something like 4. x. Off the top of my head, the best fix for 3.x would be to change LazyBarrierStrategy to avoid putting up a barrier() for cases where a Property is involved. That may come with a performance impact, but the workaround be that you'd have to look at your traversal to determine if adding barrier() makes sense and if so do it manually. Then you'd be back to where you where before the upgrade.
porunov
porunovOP9mo ago
Thank you for the detailed description! I believe LazyBarrierStrategy is not used for any traversal which has at least one “drop” step, but I might be wrong (have to check). Internally it looks like bulking just increments the counter of equal traversers while storing only the first property. The problem is that we actually use barrier steps as an optimisation for batch queries and this time we want to optimise “drop” step itself. I believe in this case our best bet would be to make additional implementation of “NoOpBarrier” step which skips edge properties and meta properties and use that step instead of the standard “NoOpBarrierStep” for our batch optimisation strategy. The related PR: https://github.com/apache/tinkerpop/pull/2612
GitHub
Open NoOpBarrierStep for extensibility by porunov · Pull Request #2...
This will make it possible to extend and overwrite NoOpBarrierStep. Reason for this change: JanusGraph/janusgraph#4456 (comment) Related discussion where the conclusion was made to relax individual...

Did you find this page helpful?