Extracting the ProjectStep of a GraphTraversal instance during unit testing
Tl;dr
Given an instance of GraphTraversal<?, Map<String, Object>>, is it possible to extract the ProjectStep<?> that produces the Map<String, Object> inferred by the type system, and which would be returned once a terminal step is applied?
We only want to access the .projectKeys() of the ProjectStep, so we don't need to actually execute the traversal. It can be assumed that we are always dealing with an instance of GraphTraversal<?, Map<String, Object>>, and we do not have access to the actual graph in this environment.
Context
We have a system written in Java that defines Gremlin traversals to be run against a TinkerPop graph (Neptune, but not relevant for this question). These queries should be written so that they return Map<String, Object> (i.e. using project()) with a specific set of keys, which are defined alongside the query. We aren't yet using a custom DSL, and building queries programmatically to ensure the project step is present causes other problems.
Why are we doing this?
We'd like to give immediate (build-time) feedback to developers that the query they've written is missing an important key, which would otherwise take a deployment and some waiting time to discover. This key must be present, as the query will be executed later by an automated system which will try to extract the value for that same key from the Map.
What have we tried so far?
We've actually managed to do this using the mock example here to capture project steps, and recursively capture any which might include a ProjectStep: https://github.com/apache/tinkerpop/blob/0c382bb7ec345f2758bee207d62d66f95c475a78/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalSourceTest.java#L107-L153
We've accomplished this by matching against the invoked method of the mocked GraphTraversal to detect either calls to the project method directly, or calls to TraversalParent steps which may contain child steps (like local), and from there recursively checking all steps in search of a project. At that point, we can gather the projectKeys of the ProjectStep. Happy to share some snippets of this if you're curious.
However, our implementation identifies any usage of project anywhere in the query, not necessarily the final one that will be returned by the server once a terminating step is applied. It's technically 'good enough' as it will always catch cases where the required key is never mentioned in the traversal, but it's possible that the query uses multiple projects, and that the required key is misplaced and won't be part of the final map returned when executed.
Thoughts
In theory, I'm thinking this must be possible, since as soon as you use project in a traversal, Java is smart enough to understand that you are now working with a GraphTraversal<?, Map<String, Object>>. This might sometimes happen inside a TraversalParent like local(), but the type system can still infer it will receive a Map when terminated. Is there a method (or collection of methods) which would let us grab the 'last effective step' which returns that Map? The solution can be hacky, this is test-case code so there's room for some jank here.
Thanks, folks!
Nice question. If I understand correctly, at execution time you want to reject traversals whose project() does not define specific keys. As a warning, there is an "unfortunately" after what starts out as a very positive answer. Typically, you would do this with a TraversalStrategy. Specifically, you would write a VerificationStrategy which would read through the traversal, find the project() step, examine its keys and throw a VerificationException if you didn't find what you wanted. An easy example to follow for this is ReadOnlyStrategy, which finds steps like addV() that mutate the graph and throws the exception if one is found. https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/verification/ReadOnlyStrategy.java
Unfortunately, with Neptune (and other remote graphs) you don't have the ability to make your strategy available to the server where it is executed. Strategies don't execute on the client - this is even more true in non-JVM languages where strategies exist only as proxies. So, I think that's the general official answer. As for workarounds....
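For reference, the strategy approach described above could be sketched roughly as follows. This is only an illustration modelled loosely on how ReadOnlyStrategy is structured, assuming gremlin-core 3.x is on the classpath; the class name RequiredProjectKeyStrategy and its constructor are hypothetical, and it only looks at the end step of the root traversal.

```java
import org.apache.tinkerpop.gremlin.process.traversal.Step;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProjectStep;
import org.apache.tinkerpop.gremlin.process.traversal.step.util.EmptyStep;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.verification.VerificationException;

// Hypothetical strategy: rejects a traversal whose end step is not a
// project() that contains the required key.
final class RequiredProjectKeyStrategy
        extends AbstractTraversalStrategy<TraversalStrategy.VerificationStrategy>
        implements TraversalStrategy.VerificationStrategy {

    private final String requiredKey;

    RequiredProjectKeyStrategy(final String requiredKey) {
        this.requiredKey = requiredKey;
    }

    @Override
    public void apply(final Traversal.Admin<?, ?> traversal) {
        // Only check the root traversal, not child traversals such as by() modulators.
        if (!(traversal.getParent() instanceof EmptyStep)) {
            return;
        }
        final Step<?, ?> end = traversal.getEndStep();
        final boolean ok = end instanceof ProjectStep
                && ((ProjectStep<?, ?>) end).getProjectKeys().contains(requiredKey);
        if (!ok) {
            throw new VerificationException(
                    "The traversal must end with a project() containing key: " + requiredKey,
                    traversal);
        }
    }

    public static void main(final String[] args) {
        final RequiredProjectKeyStrategy strategy = new RequiredProjectKeyStrategy("id");

        // Calling apply() directly, so no graph or server is involved.
        strategy.apply(__.project("id", "name").by("id").by("name").asAdmin());

        try {
            strategy.apply(__.project("name").by("name").asAdmin());
        } catch (VerificationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In normal use a strategy like this would be registered on the traversal source rather than called by hand, which is exactly the part that is not available against a remote graph.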
I'm not sure what's best for workarounds. I suppose you could just pass the Traversal object through a validation function (semi-pseudocode):
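A minimal sketch of such a validation function, built around getEndStep and assuming gremlin-core 3.x is on the classpath. The names ProjectKeyValidator and requireProjectKey are hypothetical, and this is an illustration rather than the snippet originally posted:

```java
import java.util.List;

import org.apache.tinkerpop.gremlin.process.traversal.Step;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProjectStep;
import org.apache.tinkerpop.gremlin.structure.T;

final class ProjectKeyValidator {

    // Hypothetical helper: inspects the traversal without executing it and
    // fails if the last step is not a project() containing requiredKey.
    static void requireProjectKey(final Traversal<?, ?> traversal, final String requiredKey) {
        final Step<?, ?> end = traversal.asAdmin().getEndStep();
        if (!(end instanceof ProjectStep)) {
            throw new IllegalStateException("Traversal does not end with project(): " + end);
        }
        final List<String> keys = ((ProjectStep<?, ?>) end).getProjectKeys();
        if (!keys.contains(requiredKey)) {
            throw new IllegalStateException(
                    "project() is missing required key '" + requiredKey + "', found " + keys);
        }
    }

    public static void main(final String[] args) {
        // Anonymous traversals need no graph, so this runs entirely offline.
        requireProjectKey(__.V().project("id", "name").by(T.id).by("name"), "id");

        try {
            requireProjectKey(__.V().project("name").by("name"), "id");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```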
that won't execute automatically on iteration so i suppose that's your downside there
I thought I had some mechanism to allow this to work automatically on the call of the termination step, but as i look deeper i'm not sure that's possible.
I think it would be neat if RemoteStrategy could somehow have a client-side counterpart or take some verification function that let it check a traversal before submission. would that be helpful?
"at execution time"
Ideally even earlier, before we even have a graph to work against (since this requires developers to deploy the latest build to their developer accounts). Right now, I have this running as a unit test that pulls in all queries and their required keys, and analyses the GraphTraversal instances entirely offline.
VerificationStrategy is probably a good fit for this otherwise (graph-permitting)
In your mocking example, you override a pre-defined termination step to do some tidy up (returning a mocked response in your case). I've been using this same block to return project keys gathered during the traversal: https://gist.github.com/JeeZeh/1a3030ce71595c5ec6c0cac9190abb9c
Here's what I have so far, but I think this could be greatly simplified with the snippet you provided with getEndStep
Well, if this is more of a static analysis tool that can take traversals to be executed in the future somewhere, then yeah, i'd say you just use the methods on Traversal.Admin to pick apart the steps in the traversal, find project() and extract the keys. that seems quite straightforward.
Gotcha, yeah this is more or less how we're doing it. Will try to simplify this and post an update when I get something concise working
This is indeed much simpler! Since we have access to plain GraphTraversal instances, we don't even need to mock the GraphTraversalSource. Thanks for the tips!
For now, we're not going to handle cases where the final step is a TraversalParent with more than one child traversal, since I'm not sure we can easily cover all cases. All traversals are proxied through extractProjectKeysFromMapTraversal, which guarantees the traversal actually returns a Map, so hopefully this should reduce the chance of running into one of those cases.
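The approach described above might look something like the following sketch. It is not the code from the gist: only the method name extractProjectKeysFromMapTraversal comes from this thread, while the class name ProjectKeysSketch, the helper keysOfEndStep, and the decision to throw on unsupported shapes are assumptions. It assumes gremlin-core 3.x on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.Step;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProjectStep;
import org.apache.tinkerpop.gremlin.structure.T;

final class ProjectKeysSketch {

    // The parameter type makes the compiler reject traversals that do not return a Map.
    static List<String> extractProjectKeysFromMapTraversal(
            final GraphTraversal<?, Map<String, Object>> traversal) {
        return keysOfEndStep(traversal.asAdmin());
    }

    // Hypothetical helper: follows the end step down through single-child parents.
    private static List<String> keysOfEndStep(final Traversal.Admin<?, ?> admin) {
        final Step<?, ?> end = admin.getEndStep();

        // Check ProjectStep first: it is itself a TraversalParent (its by() modulators).
        if (end instanceof ProjectStep) {
            return ((ProjectStep<?, ?>) end).getProjectKeys();
        }
        if (end instanceof TraversalParent) {
            final TraversalParent parent = (TraversalParent) end;
            final List<Traversal.Admin<?, ?>> children = new ArrayList<>();
            children.addAll(parent.getLocalChildren());
            children.addAll(parent.getGlobalChildren());
            if (children.size() == 1) {
                return keysOfEndStep(children.get(0)); // e.g. local(project(...))
            }
        }
        // Parents with several children (union, coalesce, ...) are deliberately unsupported.
        throw new IllegalArgumentException("Cannot determine the final project() for: " + end);
    }

    public static void main(final String[] args) {
        System.out.println(extractProjectKeysFromMapTraversal(
                __.V().local(__.project("id", "name").by(T.id).by("name"))));
    }
}
```

Only the end step is followed, so an earlier project() elsewhere in the query is ignored, which is the difference from searching the whole traversal for any ProjectStep.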