Query optimisation

Hey, I'm optimising some queries and found that these 2 seemingly identical queries behave very differently in terms of performance:
g.V().
  union(
    has("account", "id", "my_account"),
    has("account", "id", "my_account").
      out("owns")).
  union(
    out("completed").values("points"),
    inE("rewarded").has("claimed", true).values("points")).
  sum().
  next()
vs
g.V().
  has("account", "id", "my_account").
  union(identity(), out("owns")).
  union(
    out("completed").values("points"),
    inE("rewarded").has("claimed", true).values("points")).
  sum().
  next()
The 2nd query ran about 10 times faster than the first. Can anyone with experience explain what the difference between the two is, and what I should watch out for to avoid poorly performing queries like the first one? Thank you.
Solution
Dave, 9mo ago
I would suggest looking at the profile of the query and posting it here, along with which database you are using (e.g. Gremlin Server, JanusGraph, Neptune, etc.), to see where the time is being spent. Without more information it is difficult to say specifically why the query is slow.
Without any additional context, my guess is that most of the difference in time is spent on has("account", "id", "my_account"), since the first version applies that filter twice.
Painguin (OP), 9mo ago
Oh yeah, good point. Here's a snippet of the profile (the entire thing is too large):
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep(vertex,[])                                           3043        3043         263.877    71.17
  constructGraphCentricQuery                                                                   0.006
  constructGraphCentricQuery                                                                   0.001
  GraphCentricQuery                                                                          350.768
    \_condition=()
    \_orders=[]
    \_isFitted=false
    \_isOrdered=true
    \_query=[]
    scan                                                                                     350.644
      \_query=[]
      \_fullscan=true
      \_condition=VERTEX
JanusGraphMultiQueryStep                                            3043        3043           2.992     0.81
NoOpBarrierStep(2500)                                               3043        3043           3.158     0.85
UnionStep([[JanusGraphHasStep([~label.eq(accoun...                     2           2          99.037    26.71
  JanusGraphHasStep([~label.eq(account), addres...                     1           1          57.522
It looks like the first query always traverses the entire graph. I'm using JanusGraph 1.0.0. Is this a bug, or is it expected? That is, with g.V().union(...), the V() step seems to always visit every vertex.
Bo, 9mo ago
I don't think it's a bug. It's just an optimization missing in JanusGraph. Look at JanusGraphStepStrategy if you'd like to improve this.
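To make the difference concrete, here is a rough toy model in Python (not JanusGraph code) of the two query shapes discussed in this thread. Everything in it is hypothetical and for illustration only: the ToyGraph class, its index, the vertex ids, and the "character" and "other" labels. It also assumes an index exists on the account id property, which the thread does not confirm. The point is only that a filter placed directly after V() can be answered by a lookup, while a filter buried inside union() is applied after V() has already emitted every vertex, which matches the fullscan=true line in the profile above.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory graph with a (label, key, value) index."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> {"label": ..., **props}
        self.out_edges = defaultdict(list)   # (vertex id, edge label) -> [vertex ids]
        self.index = defaultdict(list)       # (label, key, value) -> [vertex ids]
        self.touched = 0                     # vertices read by the start step

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = {"label": label, **props}
        for key, value in props.items():
            self.index[(label, key, value)].append(vid)

    def add_edge(self, src, edge_label, dst):
        self.out_edges[(src, edge_label)].append(dst)

    def scan_all(self):
        """Model of V() with no folded filter: emit every vertex."""
        for vid in self.vertices:
            self.touched += 1
            yield vid

    def lookup(self, label, key, value):
        """Model of V().has(...) answered from an index (assumed to exist)."""
        for vid in self.index.get((label, key, value), []):
            self.touched += 1
            yield vid

    def matches(self, vid, label, key, value):
        vertex = self.vertices[vid]
        return vertex["label"] == label and vertex.get(key) == value


def filter_inside_union(graph):
    """Shape of the first query: V().union(has(...), has(...).out("owns"))."""
    result = []
    for vid in graph.scan_all():
        # Branch 1: has("account", "id", "my_account")
        if graph.matches(vid, "account", "id", "my_account"):
            result.append(vid)
        # Branch 2: the same has(...) evaluated again, then out("owns")
        if graph.matches(vid, "account", "id", "my_account"):
            result.extend(graph.out_edges.get((vid, "owns"), []))
    return result


def filter_before_union(graph):
    """Shape of the second query: V().has(...).union(identity(), out("owns"))."""
    result = []
    for vid in graph.lookup("account", "id", "my_account"):
        result.append(vid)                                      # identity()
        result.extend(graph.out_edges.get((vid, "owns"), []))   # out("owns")
    return result


def build_graph(extra=1000):
    """One account owning two vertices, plus `extra` unrelated vertices."""
    graph = ToyGraph()
    graph.add_vertex("a1", "account", id="my_account")
    graph.add_vertex("c1", "character")
    graph.add_vertex("c2", "character")
    graph.add_edge("a1", "owns", "c1")
    graph.add_edge("a1", "owns", "c2")
    for i in range(extra):
        graph.add_vertex(f"x{i}", "other")
    return graph


if __name__ == "__main__":
    for query in (filter_inside_union, filter_before_union):
        graph = build_graph()
        rows = query(graph)
        print(query.__name__, "results:", rows, "vertices touched:", graph.touched)
```

Both functions return the same three vertices, but with the default of 1000 extra vertices the first reads 1003 vertices and the second reads 1. In a real deployment the gap depends on the graph size and on which indexes are defined, so the profile output remains the thing to check.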