Question on match step

I am wondering why the following queries return different results:
gremlin> graph = TinkerFactory.createModern()
gremlin> g = traversal().withEmbedded(graph)
gremlin> g.V().match(__.as("a").both().as("b").both().as("c")).count()
==>30
gremlin> g.V().match(__.as("a").both().as("b").both().as("c"),
......1> __.as("a").both().as("b").both().as("c")).count()
==>48
I must admit that I have not fully understood the match step, so I am not sure whether this is expected behavior or an issue. It would be highly appreciated if somebody could help me investigate.
4 Replies
kelvinl2816•17mo ago
In cases like this the match step is not buying you much. For example, these two are equivalent:
gremlin> g.V().match(__.as("a").both().as("b").both().as("c")).count()
==>30

gremlin> g.V().both().both().count()
==>30
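As a cross-check on the 30, here is a minimal Python sketch that models the TinkerPop "modern" toy graph as an undirected adjacency map and counts two-hop walks, which is what both().both() enumerates. The names EDGES, neighbours and two_hop_walks are made up for illustration, and the vertex ids are assumed to be the usual ones (1=marko, 2=vadas, 3=lop, 4=josh, 5=ripple, 6=peter). It is only a counting model, not how TinkerPop evaluates the traversal.

```python
from collections import defaultdict

# Edges of the "modern" toy graph as (out vertex id, in vertex id).
# Assumed ids: 1=marko, 2=vadas, 3=lop, 4=josh, 5=ripple, 6=peter.
EDGES = [(1, 2), (1, 4), (1, 3), (4, 5), (4, 3), (6, 3)]


def neighbours(edges):
    """Build an undirected adjacency map, mirroring what both() follows."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        adj[dst].append(src)
    return dict(adj)


def two_hop_walks(adj):
    """Enumerate (a, b, c) with a-b and b-c adjacent, like V().both().both()."""
    return [(a, b, c) for a in adj for b in adj[a] for c in adj[b]]


ADJ = neighbours(EDGES)
WALKS = two_hop_walks(ADJ)
print(len(WALKS))  # 30
```

Each vertex contributes the sum of its neighbours' degrees, so the total is the number of walks of length two in the undirected graph.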
To better see what is happening, perhaps replace the count step with a path step or add a .profile() to the end of the query. If you change the second query to not reuse all the same labels, it will perhaps make more sense.
gremlin> g.V().match(__.as("a").both().as("b").both().as("c"),
......1> __.as("a").both().as("b").both().as("e")).count()
==>174

gremlin> g.V().both().both().both().both().count()
==>174
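One way to read the 174: both patterns start from the same "a", but "c" and "e" are bound independently, so every walk found by the first pattern pairs with every walk found by the second. The rough Python sketch below checks that counting argument against a hand-written adjacency map for the modern graph. ADJ and the helper names are assumptions for illustration, and this is not TinkerPop's algorithm.

```python
# Undirected adjacency of the "modern" toy graph (assumed vertex ids 1..6).
ADJ = {1: [2, 4, 3], 2: [1], 3: [1, 4, 6], 4: [1, 5, 3], 5: [4], 6: [3]}


def two_hop_ends(adj, a):
    """End vertices reached by a.both().both(), one entry per walk."""
    return [c for b in adj[a] for c in adj[b]]


def independent_pairs(adj):
    """Count (c-walk, e-walk) pairs per start vertex a, summed over all a."""
    return sum(len(two_hop_ends(adj, a)) ** 2 for a in adj)


def four_hop_walks(adj):
    """Count walks of length four, i.e. V().both().both().both().both()."""
    frontier = list(adj)
    for _ in range(4):
        frontier = [n for v in frontier for n in adj[v]]
    return len(frontier)


PAIRS = independent_pairs(ADJ)
FOUR_HOPS = four_hop_walks(ADJ)
print(PAIRS, FOUR_HOPS)  # 174 174
```

The two numbers agree here because the graph is treated as undirected: a pair of two-hop walks sharing their start vertex is the same thing as one four-hop walk passing through that vertex in the middle.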
Because (in your original second query) you are reusing the labels from the first line of the match in the second line, Gremlin is trying to honor that pattern. Note that only the start and end labels on each line really affect the results. For example:
gremlin> g.V().match(__.as('a').both().both().as('c'),
......1> __.as('a').both().both().as('c')).count()
==>48
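The 48 is consistent with the following reading, which is an interpretation rather than something taken from the TinkerPop source: once the first pattern has bound 'c', the second pattern only keeps walks that end at that same 'c', so each (a, c) pair contributes the square of its two-hop walk count. A small Python sketch of that arithmetic, with an assumed adjacency map and hypothetical helper names:

```python
from collections import Counter

# Undirected adjacency of the "modern" toy graph (assumed vertex ids 1..6).
ADJ = {1: [2, 4, 3], 2: [1], 3: [1, 4, 6], 4: [1, 5, 3], 5: [4], 6: [3]}


def walk_counts(adj):
    """Map (a, c) -> number of two-hop walks from a to c."""
    return Counter((a, c) for a in adj for b in adj[a] for c in adj[b])


def shared_end_solutions(counts):
    """Both patterns must agree on a and c: n walks give n * n combinations."""
    return sum(n * n for n in counts.values())


COUNTS = walk_counts(ADJ)
SHARED = shared_end_solutions(COUNTS)
print(sum(COUNTS.values()), SHARED)  # 30 48
```

Most (a, c) pairs are connected by a single two-hop walk and contribute 1. The extra solutions come from walks that return to their start vertex, for example marko back to marko through any of his three neighbours, which contributes 3 * 3 = 9.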
By profiling the query that yields 48 as the count, we can see what is happening.
gremlin> g.V().match(__.as("a").both().as("b").both().as("c"),
......1> __.as("a").both().as("b").both().as("c")).count().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
TinkerGraphStep(vertex,[])                                             6           6           0.135     6.60
MatchStep(null,AND,[[MatchStartStep(a), VertexS...                    48          48           1.826    89.10
  MatchStartStep(a)                                                   16          16           0.093
  VertexStep(BOTH,vertex)@[b]                                         34          34           0.072
  VertexStep(BOTH,vertex)                                             84          84           0.087
  MatchEndStep(c)                                                     36          36           0.109
  MatchStartStep(a)                                                   20          20           0.029
  VertexStep(BOTH,vertex)@[b]                                         50          50           0.076
  VertexStep(BOTH,vertex)                                            120         120           0.107
  MatchEndStep(c)                                                     42          42           0.105
CountGlobalStep                                                        1           1           0.088     4.30
                                            >TOTAL                     -           -           2.049        -
Joyemang33•17mo ago
Many thanks for your help! It looks like there is a small difference between Gremlin's pattern matching and Cypher's. So only the labels at the start and end of each pattern actually take effect, right? 🥰
kelvinl2816•17mo ago
That's right. In general I don't use the match step in its current form. Earlier in Gremlin's evolution it supported queries that were harder to express with other Gremlin steps. Over time, the where step has become very flexible, and things that once needed a match step can now be written other ways. For example, these two queries are equivalent:
gremlin> g.V().has('airport','code','JFK').
......1> match(__.as('a').out().as('b'),
......2> __.not(__.as('b').out().as('a'))).
......3> select('b').
......4> by('code')

==>LCY
and
gremlin> g.V().has('airport','code','JFK').as('a').
......1> out().
......2> where(__.not(out().as('a'))).
......3> values('code')

==>LCY
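Both forms express the same idea: keep each destination 'b' of 'a' that has no route back to 'a'. The Python sketch below spells that out on a tiny, entirely hypothetical route table. ROUTES and one_way_destinations are invented for illustration and are not the real air-routes data, so the fact that it also prints LCY is by construction only.

```python
# Hypothetical directed routes (origin -> destinations); NOT the air-routes data set.
ROUTES = {
    "JFK": {"LHR", "LCY", "SFO"},
    "LHR": {"JFK"},
    "SFO": {"JFK", "LHR"},
    "LCY": {"DUB"},
    "DUB": set(),
}


def one_way_destinations(routes, origin):
    """Destinations b of origin a such that b has no route back to a."""
    return sorted(
        b for b in routes[origin] if origin not in routes.get(b, set())
    )


RESULT = one_way_destinations(ROUTES, "JFK")
print(RESULT)  # ['LCY']
```

The filter inside the generator plays the role of where(__.not(out().as('a'))): it tests each candidate against the already-bound start vertex instead of declaring a second pattern.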
I find the where form much simpler to work with.
Joyemang33•17mo ago
I get it! Many thanks for your help and for the easy-to-understand examples! 🥰 I am actually trying to test graph database systems (GDBMS) that adopt Gremlin as their query language, especially their pattern matching features. Your explanation of the match step, and of the other ways to express pattern matching, helps me a lot!