Does Gremlin do DFS or BFS?
Does Gremlin perform a Depth First Search (DFS) or Breadth First Search (BFS) when traversing?
3 Replies
While it might seem like a simple question, this question requires a bit of explanation. The starting answer is that Gremlin is a mixed BFS/DFS system given
barrier()
injection by LazyBarrierStrategy
to achieve bulking optimizations. There is some reading to do in the links I've supplied below. That said, seach behavior is also graph system dependent. For example, if you are doing Gremlin in gremlin-spark
and OLAP then it is BFS and OLTP on a system like Neptune which lets you specify the search behavior explicitly with a query hint. Here is some additional reading on the subject:
[1] https://tinkerpop.apache.org/docs/current/reference/#barrier-step
[2] https://tinkerpop.apache.org/docs/current/reference/#a-note-on-barrier-steps
You will want to understand how to use profile()
and explain()
to properly see what your traversal is doing on a particular system to ensure it is behaving as you wish it to.Thanks for the detailed answer @spmallette
I went through the links you provided and seems like a bit more research is needed from my side to understand the behavior of the query in my case. 🙂
@spmallette Thanks for explanation! I’m trying to optimise repeat step in JanusGraph.
I was struggling with understanding in what order it’s traversed. I will experiment a bit more next week to understand it, but currently my testing (which could be wrong) showed that even so it’s BFS, the ordering of traversing children steps is strange.
I would expect returned elements on the same level should be traversed in the same order, but sometimes it looks like it’s isn’t the case and the newer returned children could be traversed earlier than the older returned children.
Again, that could be a bug in my testing, but so far I can’t say in what order will the next level be traversed.
After checking it again I found flows in the tests I did. Fixing them I can now confirm that BFS works in correct ordering for repeat step (i.e. oldest accessed children are traversed first on the next loop). Sorry if I confused anyone with the previous question.