Traversal Inspection for properties used

Is there any way to inspect a traversal to figure out what properties are used throughout it? I am looking at the traversal API / steps and can't see anything that looks like it would fulfill the purpose. I want something that would tell me which properties are used and which are returned. This would be useful if you had a case where you had several hundred properties on a vertex but maybe only check the label and return the id, meaning you don't need to know all the properties to execute the traversal, or maybe you only check one property or just return one.
16 Replies
Yang Xia, 7mo ago
Would the EventStrategy help? Something like the MutationListener? https://tinkerpop.apache.org/docs/current/reference/#_eventstrategy
Lyndon (OP), 7mo ago
I think that would occur too late. As an example, let's say we do:
g.V(<id>).as("a").out().has("name", "Lyndon").select("a").properties("age")
I want to know, when I read the vertex with <id>, that I should only pull the property "age" from it, and when I read the out vertices of <id>, that I should only read "name". Now I could start inspecting the entire traversal for steps and labels and selects and tree steps and paths and has containers, etc., but this gets kind of ridiculous to manage fast, so I was wondering if there's anything built in there... Doesn't seem like it. Maybe it would be a good addition.
Valentyn Kahamlyk
Another way is to add a wrapper for elements and a Strategy to get results
Lyndon (OP), 7mo ago
What do you mean by a strategy to get results? Right now I have a strategy that inspects the traversal and is like "okay, you are going to read vertices and only use these properties" and pushes that down to the backend vertex read so it doesn't grab all the data, but it's based on inspection of the traversal, so it's not very generic and is kind of complicated as it stands
Valentyn Kahamlyk
like run the traversal for a single element, collect all the properties it uses, and use that when you need to get all the other results
Lyndon (OP), 7mo ago
ah I see
Valentyn Kahamlyk
or the Graph returns a vertex/edge that keeps track of which properties were used, then reads only the used ones
spmallette, 7mo ago
like run the traversal for a single element, collect all the properties it uses, and use that when you need to get all the other results
i'm not sure that would work. doesn't that break down in the face of a variety of normal circumstances? even g.V().hasLabel('person', 'airport') - a "person" vertex won't have the same properties as an "airport" vertex, so a single vertex couldn't be a model for the entire traversal
Lonnie VanZandt, 7mo ago
Lightly reading: does simply parsing the query statement itself help? Rather than examining all the content flowing through the consideration pipeline, why not study the query itself? Perhaps you seek statistics about, for actual content, which of the predicates proved to be the most discriminatory?
spmallette, 7mo ago
i don't think there is any easy way to do this today. you would have to inspect the traversal to find steps that reference property keys. i guess you could do that with a strategy. in your example:
g.V(<id>).as("a").out().has("name", "Lyndon").select("a").properties("age")
the strategy could find properties('age') and write "age" back to the initial V() and "name" back to the out() so that when the traversal executes you'd have reference to it. of course, this approach grows in complexity quickly. you'd have to detect that "a" corresponds back to the initial start V(). easy enough for this isolated case, but it only gets trickier from there. if you're trying to just reduce hundreds of properties to a few possible ones, maybe you could just propagate all found property keys back to all steps that preceded them as a hint. that may not be good enough though.
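A rough sketch of the two ideas above, for illustration only. It models the traversal as a plain list of steps rather than using any TinkerPop API; `Step`, `assign_keys`, and `hint_all_preceding` are hypothetical names, and only the handful of step kinds from the example are handled.

```python
from dataclasses import dataclass

# Toy model, NOT the TinkerPop API: every name here is hypothetical.
EMITTERS = {"V", "out"}            # steps that read vertices from the backend
READERS = {"has", "properties"}    # steps that reference property keys

@dataclass
class Step:
    kind: str          # "V", "out", "has", "select", "properties"
    keys: tuple = ()   # property keys referenced by has()/properties()
    label: str = ""    # as() label on an emitter, or the label select() targets

def assign_keys(steps):
    """Label-aware version: attribute each key to the step that emitted the vertex it is read from."""
    needed = {i: set() for i, s in enumerate(steps) if s.kind in EMITTERS}
    labels = {}      # as() label -> index of the emitting step
    current = None   # index of the step whose vertex is currently flowing
    for i, s in enumerate(steps):
        if s.kind in EMITTERS:
            current = i
            if s.label:
                labels[s.label] = i
        elif s.kind == "select":
            current = labels[s.label]   # jump back to the labelled step
        elif s.kind in READERS:
            needed[current].update(s.keys)
    return needed

def hint_all_preceding(steps):
    """Coarse version: push every key found back to all emitters that precede it."""
    needed = {i: set() for i, s in enumerate(steps) if s.kind in EMITTERS}
    for i, s in enumerate(steps):
        if s.kind in READERS:
            for j in needed:
                if j < i:
                    needed[j].update(s.keys)
    return needed

# g.V(<id>).as("a").out().has("name", "Lyndon").select("a").properties("age")
example = [
    Step("V", label="a"),
    Step("out"),
    Step("has", keys=("name",)),
    Step("select", label="a"),
    Step("properties", keys=("age",)),
]
print(assign_keys(example))
print(hint_all_preceding(example))
```

For the example, the label-aware pass attributes "age" to the initial V() and "name" to out(), while the coarse pass hints both keys to both steps. Anything beyond these step kinds (path, tree, repeat, nested traversals) is left out, which is exactly where the complexity mentioned above comes in.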
Lyndon (OP), 7mo ago
Yeah, it does in those kinds of circumstances. The ideal way to have it would be that any time you collect vertices, it would know if it can optimize and pull down less than all the properties, like
g.V(<id>).out().out().toList()
Being able to tell that the first read doesn't need any properties, and that all the vertices read with the first out() don't need any properties either, would be the goal; for the 2nd out(), however, we need all the properties. What you mentioned there is what I have right now: a strategy that inspects the traversal for HasContainers and property steps, but also looks for labels that might mess with things, or Path/Tree steps that sometimes use properties too. The growing complexity is the problem. It's already hard to add more to it. I am trying to find a clean way to do it, but maybe there just isn't one... What Valentyn mentioned about using a wrapped element and checking which properties are used is an interesting idea. So I could potentially just execute a mock equivalent traversal (if there are no repeats or if/else logic) with fake data, and each time a step returns a vertex I attach the wrapped vertex to that step; that wrapped vertex then accumulates a list of the property keys that are invoked (or notes that all were requested). Still might be kind of hard to do, but seems possible.
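The goal stated for g.V(<id>).out().out().toList() can be written down as a tiny rule, sketched here under a big assumption: only the last vertex-emitting step hands whole elements back to the caller, and no other step references properties. `property_demand` and the `ALL` marker are hypothetical names, and the step list is a toy model rather than a real traversal object.

```python
ALL = "*"  # hypothetical marker meaning "every property is needed"

# Toy set of step kinds that read vertices from the backend.
VERTEX_STEPS = {"V", "out", "in", "both"}

def property_demand(step_kinds):
    """Return {step index: set of needed keys} for the vertex-reading steps.

    Assumption: whole elements are only returned to the caller when the
    traversal ends on a vertex-reading step; intermediate reads need nothing.
    """
    emitters = [i for i, kind in enumerate(step_kinds) if kind in VERTEX_STEPS]
    demand = {i: set() for i in emitters}
    if emitters and emitters[-1] == len(step_kinds) - 1:
        demand[emitters[-1]] = {ALL}   # results are full vertices
    return demand

# g.V(<id>).out().out().toList(): only the second out() needs properties
print(property_demand(["V", "out", "out"]))

# g.V(<id>).out().id(): nothing needs properties, only ids are returned
print(property_demand(["V", "out", "id"]))
```

A real strategy would have to merge this with keys found in has(), properties(), path(), tree() and so on, so this only captures the simplest case.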
spmallette, 7mo ago
i'm not sure i follow that. it's a mock traversal, but what does it execute against?
Lyndon (OP), 7mo ago
g.V(<id>) would just pop out a mock vertex, out() would pop out another mock vertex, etc. I'd expect it to run inside the provider database, inside a strategy. It would keep track of which step emits which vertex, and each vertex would keep track of which property keys on it were invoked, or if all were. Then the strategy would give that info to the rest of the strategies/step replacements so they could appropriately push down property grabs on the vertices. Definitely a workaround
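A minimal sketch of the recording mock vertex described above, with the example traversal replayed by hand. `RecordingVertex` and `mock_run` are hypothetical names, nothing here is a TinkerPop class, and it assumes every filter passes on fake data, so branching or repeat logic is not covered.

```python
class RecordingVertex:
    """Fake vertex that remembers which property keys were asked for."""

    def __init__(self, step_name):
        self.step_name = step_name   # the step that emitted this mock vertex
        self.keys_read = set()       # individual keys that were requested
        self.all_read = False        # True once every property was requested

    def value(self, key):
        self.keys_read.add(key)
        return None                  # fake data, the value itself is irrelevant

    def properties(self, *keys):
        if keys:
            self.keys_read.update(keys)
        else:
            self.all_read = True     # properties() with no keys means "all"
        return []

def mock_run():
    """Hand-replay of g.V(<id>).as("a").out().has("name", "Lyndon").select("a").properties("age")."""
    a = RecordingVertex("V")     # g.V(<id>).as("a")
    b = RecordingVertex("out")   # .out()
    b.value("name")              # .has("name", "Lyndon"), assumed to pass
    a.properties("age")          # .select("a").properties("age")
    return {v.step_name: (v.keys_read, v.all_read) for v in (a, b)}

print(mock_run())
```

After the mock run, the vertex attached to V() has recorded "age" and the one attached to out() has recorded "name", which is the information the pushdown would need.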
Yang Xia, 7mo ago
Do you run the mock traversal every time? How do you keep the accumulated mock updated if/when properties change in the graph?
Lyndon (OP), 7mo ago
I would add a way to disable it, but basically would just run it when the query comes in before I execute the actual query, so it would be accurate for the duration of that query
Yang Xia, 7mo ago
Gotcha