cdegroc
cdegroc
Explore posts from servers
JJanusGraph
Created by cdegroc on 6/11/2024 in #questions
Potentially useless allocations when checking a field cardinality
No description
2 replies
JJanusGraph
Created by cdegroc on 2/20/2024 in #questions
Concurrent updates during a REINDEX
👋🏻 Hello. I was reading JanusGraph documentation on reindexing (https://docs.janusgraph.org/schema/index-management/index-reindexing/) which says
JanusGraph can begin writing incremental index updates right after an index is defined. However, before the index is complete and usable, JanusGraph must also take a one-time read pass over all existing graph elements associated with the newly indexed schema type(s). Once this reindexing job has completed, the index is fully populated and ready to be used. The index must then be enabled to be used during query processing.
which made me wonder how JanusGraph handles incremental updates happening concurrently to a REINDEX. For instance, if we consider a slow reindexing process (e.g. done through the ManagementSystem interface) that can take several hours, how are concurrent additions/updates/deletions of vertices/edges/properties handled? Won't they just be overwritten by the reindexing process that could be based on old data? Any thought / pointer to code would be greatly appreciated. Thanks!
16 replies
ATApache TinkerPop
Created by cdegroc on 1/17/2024 in #questions
LazyBarrierStrategy/NoOpBarrierStep incompatible with path-tracking
👋🏻 Hi all! In this JanusGraph post (https://discord.com/channels/981533699378135051/1195313165278388334/1195313165278388334), we were investigating if TreeStep could be used jointly with bulked traversers so as to improve traversal time. Based on answers there, it looks like TinkerPop's LazyBarrierStrategy explicitly excludes "path-tracking" traversals (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.java#L85) and won't insert NoOpBarrierSteps in those cases, preventing us from bulking traversers. At a high level, I imagine this could be needed if traversers history was lost on bulking. Though, I'm not familiar enough with the code to be able to tell if this could concretely be a reason. I have observed some "path-tracking" traversals (with TreeStep) to run properly with bulking on JanusGraph (https://discord.com/channels/981533699378135051/1195313165278388334/1196486987474022400). In addition, commenting out the check on TraverserRequirement.PATH (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.java#L85) does not break gremlin-core unit tests locally. So my question is: would anyone have context on why "path-tracking" traversals can't use bulking? Is there any chance this exclusion could be scoped down to a subset of "path-tracking" traversals maybe? Any pointer appreciated! 🙇🏻‍♂️
12 replies
JJanusGraph
Created by cdegroc on 1/12/2024 in #questions
TreeStep and MultiQuery support
On JanusGraph 1.0, a traversal like g.V().has(...).out(...).has(...).out(...).has(...) nicely leverages the MultiQuery optimisation and returns results in acceptable time. However, as soon as we add a tree() step, as in g.V().has(...).out(...).has(...).out(...).has(...).tree(), all MultiQuery optimisations are disabled and the traversal time increases drastically. Based on the following code, I think this applies to all Steps with PATH requirement (e.g. PathStep, TreeStep): https://github.com/JanusGraph/janusgraph/blob/v1.0/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/JanusGraphTraversalUtil.java#L393 Could a knowledgeable person chime in and explain if disabling MultiQuery is a hard requirement by design (e.g. the traverser's history needs to be kept and MultiQuery does not allow that), if it's just that the optimisation was not implemented for this step or if this can be changed easily (as in just removing that condition), or if there could be other approaches to get a subgraph/tree that wouldn't have such limitation? Thanks!
10 replies
ATApache TinkerPop
Created by cdegroc on 10/24/2023 in #questions
Gremlin Driver and frequently changing servers
In a containerised environment, hosts are frequently replaced and their IP address can change several times a day. As far as I can tell, Gremlin Driver was designed for long-lived hosts given that: (1) Contact Points are resolved on startup and a connection pool is assigned to them at that time - this makes varying contact points over time not possible I think? (2) Unavailable hosts are retried but the list of hosts is not refreshed - wouldn't it make more sense to give up on hosts after a few retries and refresh the list of contact points? I was wondering if the community had any guidance for containerised environments, if handling such cases within Gremlin Driver made sense, and if it'd be worth filing a JIRA ticket. Thanks!
8 replies