RepeatStep does not appear to respect barriers

I was digging into some traversal performance and had something similar to the following:
g.V(<ids>).repeat(out()).until(out().count().is(0)).toList()
g.V(<ids>).repeat(out()).until(out().count().is(0)).toList()
For the graph implementation in question, out() is implemented on top of a CollectingBarrierStep. I noticed that that fact is not respected by the repeat and it only gets 1 item at a time, i.e no aggregation. I removed my strategy and changed the query to:
g.V(<ids>).repeat(barrier(10).out()).times(2).toList()
g.V(<ids>).repeat(barrier(10).out()).times(2).toList()
and then put breakpoints in the FlatMapStep and it seemed that the barrier was still not respected and items came in 1 at a time. Fully verifying this is an issue is difficult and I'd be surprised that this was not noticed before. Has anyone noticed anything similar or have reason to believe this is not the case?
7 Replies
spmallette
spmallette16mo ago
I'm not sure I can explain what you're seeing, but it's worth noting that adding barrier(10) introduces a NoOpBarrierStep which doesn't extend from CollectingBarrierStep so you might not be testing what you think you're testing. As far as NoOpBarrierStep goes it does seem to collect items inside the repeat() according to the size given to it. Is it possible that in extending CollectingBarrierStep you've overridden the barrier behavior in some way that isn't allowing it to work the way you expect?
Lyndon
LyndonOP16mo ago
It works correctly in any place besides the repeat. The barrier i have is a simple consumer, it does not have anything fancy and with that it only gets 1 item at a time. I was looking at the repeat step and it seems like it has no additional barrier friendly logic to allow you to pull more from the left before pushing right, always grabbing one checkign emit/until conditions, executing step. I will try to dig into this on the Repeat step side sometiem soon and see if I can prove this out there
Lyndon
LyndonOP15mo ago
Hey @spmallette so I got some time to actually test this finally.... I made this test: (in a gist since discord character limits me...) https://gist.github.com/lyndonbauto/0cb964eaa1d4ed4ad67c41aeb97ca630 And what I found was that the input for the until step was 1 item at a time, while the repeat step was properly pulled with the barrier. So if you were using a barrier style strategy in your in/out step to allow batch execution like many likely are, the performance of the until step which runs first would be very poor. I am working on a fix for this
Gist
Barrier not working in repeat/until
Barrier not working in repeat/until. GitHub Gist: instantly share code, notes, and snippets.
Kennh
Kennh15mo ago
This might be expected behavior for the UntilStep. Those nested traversals are generally passed just one traverser at a time from the parent traversal. If this is something you wanted to change, we would probably have to do that for all TraversalParent's to keep it consistent.
Lyndon
LyndonOP15mo ago
You might be right, i talked to Marko a bit about it and he seemed to think it was a bug but I will mention that point and see what he thinks. IMO TraversalParents should respect barriers that are in their traversals because it has a very large performance implications. Forcing it to only behave one way seems to have no benefit, but maybe I am missing something. I will look into this a bit more though, cause now you have me thinking and recalling that project() seems to do the same thing and maybe others... Yeah seems like
g.V(?).out().project("a").by(__.barrier().out()).toList()
g.V(?).out().project("a").by(__.barrier().out()).toList()
and
g.V(?).out().where(__.barrier().out()).toList()
g.V(?).out().where(__.barrier().out()).toList()
both have items from the out() come in 1 at a time. You can do:
g.V(?).out().fold().project("a").by(__.unfold().barrier().out()).toList()
g.V(?).out().fold().project("a").by(__.unfold().barrier().out()).toList()
and
g.V(?).out().fold().where(__.unfold().barrier().out()).toList()
g.V(?).out().fold().where(__.unfold().barrier().out()).toList()
However then you have different behavior than intended, because now your project doesn't come out as a bunch of "a" : <vertex list>, instead you have "a" : <list of vertex lists> which isn't exactly desired. IMO it makes sense to do this under the hood and allow barrier() steps in the traversalparent's innards to barrier and batch execute under the hood and then reassign the output as is approriate so you can have the performance benefits of a barrier inside the project's by step but still have the output assigned as "a" : <vertex list> Would like to hear others perspectives if they think this doesnt make sense
Kennh
Kennh15mo ago
At first glance, this would probably be ok, but I get the feeling there will be a small subset of cases that will have their semantics changed. In which case, you might want to make a devlist post.
Lyndon
LyndonOP15mo ago
yeah I think there's more tinking i need to do here before proceeding. Will come back to this when I have thought about it more
Want results from more Discord servers?
Add your server