Does the bulking optimization provided by LazyBarrierStrategy improve query performance?
I'm having a hard time understanding the usefulness of LazyBarrierStrategy, which supposedly adds a bulking optimization. In a nutshell, LazyBarrierStrategy simply adds barrier(2500) after a FlatMapStep. As I understand it, this means the preceding FlatMapStep is executed up to 2500 times before the traversal moves on to the next step.
I can hardly see the benefit of inserting such a barrier step after a FlatMapStep. Do you know of any use case where LazyBarrierStrategy improves query performance or brings any other benefit?
Here is a simple test which shows that without LazyBarrierStrategy we touch fewer vertices during traversals. Moreover, I usually see better query performance without LazyBarrierStrategy and slightly worse performance with it. It would be great if anyone could shed a little light on this topic.
I don't know much about the lazy barrier; however, Sqlg removes all TinkerPop barrier steps. TinkerPop has no notion of managing memory, nor should it. Loading a full query result into memory risks crashing the JVM. It is better to let the underlying database manage the results.
Thanks for the insight. This strategy is enabled in JanusGraph by default, and it doesn't seem to provide any benefit (unless I'm missing a use case where LazyBarrierStrategy helps). Thus, we are thinking of disabling this strategy by default.
In relational databases, these types of aggregation operators are quite common as well. The principle is most commonly referred to as "vectorization", and it helps utilize CPU caches efficiently. Vectorization is most helpful for CPU- or memory-intensive workloads. In JanusGraph, however, most of the query evaluation time is probably spent waiting for network traffic from the storage backend. I suppose that's why these barrier steps are not helpful in most queries.
Not sure if you're already aware of it, but I think the original reasoning behind LazyBarrierStrategy was described by Marko in the blog post "Tales from the TinkerPop", under Section 3: Traversal Optimization via Bulking: https://www.datastax.com/blog/tales-tinkerpop
Thanks for sharing this. I hadn't seen this blog post before, and it was interesting. So, basically, the main advantage of using a barrier step after a FlatMapStep is that it reduces the number of operations needed for the same traversal. That is, if out() returns 10 duplicate vertices, then the subsequent operation is executed once for all of them instead of 10 times. This is an interesting use case. Thanks for sharing! I'm not sure how often this use case comes up in practice, but I see that it may be beneficial in some cases. I guess users should explicitly disable LazyBarrierStrategy in such cases or use barrier(1) instead.
It's a pity that the formatting of that post has fallen into such disrepair.
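For readers who want to see the bulking idea from the thread in concrete terms, here is a minimal plain-Java sketch. It does not use TinkerPop at all: the names BulkingSketch, Traverser, lazyBarrier, and stepInvocations are hypothetical, and the batching rule (read at most maxBarrierSize traversers, merge those at the same vertex, release them) is a simplified assumption about how a lazy barrier behaves, so the real implementation may differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of bulking; not TinkerPop code. */
public class BulkingSketch {

    /** A traverser: a vertex id plus the number of identical traversers it stands for. */
    static final class Traverser {
        final String vertex;
        final long bulk;

        Traverser(String vertex, long bulk) {
            this.vertex = vertex;
            this.bulk = bulk;
        }
    }

    /**
     * Models a lazy barrier: consumes the input in batches of at most
     * maxBarrierSize traversers and, within each batch, merges traversers
     * located at the same vertex into one traverser with a summed bulk.
     */
    static List<Traverser> lazyBarrier(List<Traverser> input, int maxBarrierSize) {
        List<Traverser> output = new ArrayList<>();
        for (int start = 0; start < input.size(); start += maxBarrierSize) {
            int end = Math.min(start + maxBarrierSize, input.size());
            Map<String, Long> merged = new LinkedHashMap<>();
            for (Traverser t : input.subList(start, end)) {
                merged.merge(t.vertex, t.bulk, Long::sum);
            }
            merged.forEach((vertex, bulk) -> output.add(new Traverser(vertex, bulk)));
        }
        return output;
    }

    /** The next step runs once per traverser object, regardless of its bulk. */
    static int stepInvocations(List<Traverser> traversers) {
        return traversers.size();
    }

    /** Total number of logical results the traversers represent. */
    static long totalBulk(List<Traverser> traversers) {
        long sum = 0;
        for (Traverser t : traversers) {
            sum += t.bulk;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical out() result: vertex "a" reached 10 times, vertex "b" once.
        List<Traverser> afterOut = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            afterOut.add(new Traverser("a", 1));
        }
        afterOut.add(new Traverser("b", 1));

        List<Traverser> bulked = lazyBarrier(afterOut, 2500);
        System.out.println("without barrier: " + stepInvocations(afterOut)); // 11
        System.out.println("with barrier:    " + stepInvocations(bulked));   // 2
        System.out.println("results kept:    " + totalBulk(bulked));         // 11
    }
}
```

In this model, the benefit depends entirely on how many duplicates a batch contains: with no duplicates the barrier merges nothing, and with a barrier size of 1 every batch holds a single traverser, so no merging can happen either.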