Apache TinkerPop•3y ago

Extracting the ProjectStep of a GraphTraversal instance during unit testing

Tl;dr
Given an instance of GraphTraversal<?, Map<String, Object>>, is it possible to extract the ProjectStep<?> which is producing the Map<String, Object> inferred by the type system, and which would be returned once a terminal step is applied? We only want to access the getProjectKeys() of the ProjectStep, so we don't need to actually execute the traversal. It can be assumed that we are always dealing with an instance of GraphTraversal<?, Map<String, Object>>, and we do not have access to the actual graph in this environment.

Context
We have a system written in Java that defines Gremlin traversals to be run against a TinkerPop graph (Neptune, but not relevant for this question). These queries should return Map<String, Object> (i.e. using project()) with a specific set of keys which are defined alongside the query. We aren't yet using a custom DSL, and building queries programmatically to ensure the project step is present causes other problems.

Why are we doing this?
We'd like to give immediate (build-time) feedback to developers that the query they've written is missing an important key, which would otherwise take a deployment and some waiting time to discover. This key must be present, as the query will be executed by an automated system later which will try to extract a value from the Map containing that same key.

What have we tried so far?
We've actually managed to do this using the mock example here to capture project steps, and recursively capture any which might include a ProjectStep: https://github.com/apache/tinkerpop/blob/0c382bb7ec345f2758bee207d62d66f95c475a78/gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/process/traversal/dsl/graph/GraphTraversalSourceTest.java#L107-L153

We've accomplished this by matching against the invoked method of the mocked GraphTraversal to detect either calls to the project method directly, or to TraversalParent steps which may contain child steps (like local), and from there recursively checking all steps in search of a project. At that point, we can gather the project keys of the ProjectStep. Happy to share some snippets of this if you're curious.

However, our implementation identifies any usage of project anywhere in the query, not necessarily the final one that will be returned by the server once a terminating step is applied. It's technically 'good enough' as it will always catch cases where the required key is never mentioned in the traversal, but it's possible the query uses multiple projects, and the required key is misplaced and won't be part of the final map returned when executed.

Thoughts
In theory, I'm thinking this must be possible, since as soon as you use project in a traversal, Java is smart enough to understand that you are now working with a GraphTraversal<?, Map<String, Object>>. This might sometimes happen inside a TraversalParent like local(), but the type system can still infer it will receive a Map when terminated. Is there a method (or collection of methods) which would let us grab the 'last effective step' which returns that Map? The solution can be hacky, this is test-case code so there's room for some jank here 🙂

Thanks, folks!

6 Replies

spmallette•3y ago

Nice question. If I understand correctly, at execution time you want to reject traversals that use a project() that does not have specific keys defined. As a warning, there is an "unfortunately" after what starts out as a very positive answer.

Typically, you would do this with a TraversalStrategy. Specifically, you would write a VerificationStrategy which would read through the traversal, find the project() step, examine its keys and throw a VerificationException if you didn't find what you wanted. An easy example to follow for this is ReadOnlyStrategy, which finds steps like addV() that mutate the graph and throws the exception if one is found. https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/verification/ReadOnlyStrategy.java

Unfortunately, with Neptune (and other remote graphs) you don't have the ability to make your strategy available to the server where it is executed. Strategies don't execute on the client - this is even more true in non-JVM languages where strategies exist only as proxies. So, I think that's the general official answer. As for workarounds....

spmallette•3y ago

I'm not sure what's best for workarounds. I suppose you could pass the Traversal object through a validation function (semi-pseudocode):

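import org.apache.tinkerpop.gremlin.process.traversal.Step;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.ProjectStep;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.verification.VerificationException;

public void validate(Traversal.Admin<?, ?> t) {
    Step<?, ?> s = t.getEndStep();
    // Reject a traversal whose final project() step lacks the required key "x".
    if (s instanceof ProjectStep && !((ProjectStep<?, ?>) s).getProjectKeys().contains("x"))
        throw new VerificationException("Must have key of 'x'", t);
}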

That won't execute automatically on iteration, so I suppose that's your downside there. I thought I had some mechanism to allow this to work automatically on the call of the termination step, but as I look deeper I'm not sure that's possible. I think it would be neat if RemoteStrategy could somehow have a client-side counterpart, or take some verification function that let it check a traversal before submission. Would that be helpful?

jesseaOP•3y ago

at execution time

Ideally even earlier, before we even have a graph to work against (since this requires developers to deploy the latest build to their developer accounts). Right now, I have this running as a unit test that pulls in all queries and their required keys, and analyses the GraphTraversal instances entirely offline. VerificationStrategy is probably a good fit for this otherwise (graph-permitting).

In your mocking example, you override a pre-defined termination step to do some tidying up (returning a mocked response in your case). I've been using this same block to return project keys gathered during the traversal.
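Once the project keys of a query have been extracted, the unit test only has to compare them with the keys declared alongside that query. A minimal, library-free sketch of that comparison follows; the class name RequiredKeyCheck, its method names and the example query name are hypothetical, and the extraction of the keys from the traversal is assumed to happen elsewhere.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: compares the keys a query projects with the keys it must return. */
final class RequiredKeyCheck {

  private RequiredKeyCheck() {}

  /** Returns the required keys that are absent from projectKeys, in requiredKeys' iteration order. */
  static Set<String> missingKeys(List<String> projectKeys, Set<String> requiredKeys) {
    Set<String> missing = new LinkedHashSet<>(requiredKeys);
    missing.removeAll(projectKeys);
    return missing;
  }

  /** Fails with a message naming the query and every required key it does not project. */
  static void assertHasRequiredKeys(
      String queryName, List<String> projectKeys, Set<String> requiredKeys) {
    Set<String> missing = missingKeys(projectKeys, requiredKeys);
    if (!missing.isEmpty()) {
      throw new AssertionError(
          "Query '" + queryName + "' is missing required project keys " + missing
              + "; found " + projectKeys);
    }
  }

  public static void main(String[] args) {
    // The keys would normally come from the final ProjectStep of the traversal under test.
    List<String> projectKeys = List.of("id", "name");
    Set<String> requiredKeys = new LinkedHashSet<>(List.of("id", "status"));
    System.out.println("Missing keys: " + missingKeys(projectKeys, requiredKeys));
  }
}
```

Reporting every missing key at once, together with the keys that were found, keeps the build-time feedback actionable without a second test run.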

jesseaOP•3y ago

https://gist.github.com/JeeZeh/1a3030ce71595c5ec6c0cac9190abb9c Here's what I have so far, but I think this could be greatly simplified with the snippet you provided with getEndStep

spmallette•3y ago

Well, if this is more of a static analysis tool that can take traversals to be executed in the future somewhere, then yeah, I'd say you just use the methods on Traversal.Admin to pick apart the steps in the traversal, find project() and extract the keys. That seems quite straightforward.

jesseaOP•3y ago

Gotcha, yeah this is more or less how we're doing it. Will try to simplify this and post an update when I get something concise working

/**
 * A wrapper around {@link #extractProjectKeys(Traversal)} that guarantees the passed-in traversal
 * at least ends with a step that produces a {@code Map<String, Object>} when terminated.
 */
private List<String> extractProjectKeysFromMapTraversal(
    GraphTraversal<?, Map<String, Object>> traversalEndingWithMap) {
  return this.extractProjectKeys(traversalEndingWithMap);
}


/**
 * Extracts the keys of the final {@link ProjectStep} in a given {@link Traversal}.
 * 
 * <p>
 * If the traversal ends with a {@link TraversalParent}, the single local child of that step is
 * then recursively processed again. This continues until either a {@link ProjectStep} is found,
 * or an exception is raised as the traversal does not end on a {@link ProjectStep}.
 * 
 * <p>
 * NB: This method assumes the traversal given ends with a {@code Map<String, Object>}, so this
 * should be validated prior to calling this method.
 * 
 * @param traversal the {@link Traversal} from which to extract the keys of the final
 *        {@link ProjectStep}
 * @return the keys of the final {@link ProjectStep} in the provided {@link Traversal}
 * @throws IllegalStateException if the final step is not a {@link ProjectStep} or a
 *         {@link TraversalParent} which can be further processed (must have only one child)
 */
private List<String> extractProjectKeys(Traversal<?, ?> traversal) throws IllegalStateException {
  Step<?, ?> end = traversal.asAdmin().clone().getEndStep();
  // checkState (rather than checkArgument) throws the IllegalStateException documented above.
  Preconditions.checkState(
      end instanceof ProjectStep || end instanceof TraversalParent,
      "Traversal must end with project step or a parent step containing a project step.");

  // If the last step is a ProjectStep, we're done.
  if (end instanceof ProjectStep) {
    return ((ProjectStep<?, ?>) end).getProjectKeys();
  }

  // Otherwise recursively find the project step if the final step is a parent step (e.g. local())
  List<Traversal.Admin<Object, Object>> childSteps = ((TraversalParent) end).getLocalChildren();
  // TODO: Should we validate all children of a parent step? If a parent takes multiple children
  // it might be a condition, so we could validate that all branches end with project()
  Preconditions.checkState(
      childSteps.size() == 1,
      "Encountered a final step which is a TraversalParent with more than one child. This is not yet supported.");
  return extractProjectKeys(childSteps.get(0));
}

This is indeed much simpler! Since we have access to plain GraphTraversal instances, we don't even need to mock the GraphTraversalSource. Thanks for the tips! For now, we're not going to handle cases where the final step is a TraversalParent with more than one child step, since I'm not sure we can easily cover all cases. All traversals are proxied through extractProjectKeysFromMapTraversal which will guarantee the traversal actually returns a Map, so hopefully this should reduce the chance of running into one of those cases.
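If parents with more than one child ever need to be supported, one option hinted at by the TODO is to require every branch to end in a projection and to trust only the keys that all branches share. The sketch below illustrates that idea on a hypothetical stand-in type, EndStep, which is not a TinkerPop class; mapping it onto real steps (and deciding which children of a TraversalParent count as result-producing branches) is left as an assumption.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class BranchKeys {

  /** Hypothetical stand-in for a traversal's end step: a projection, a parent, or something else. */
  static final class EndStep {
    final List<String> projectKeys; // non-null only for a projection
    final List<EndStep> branches;   // end steps of each child branch; empty for leaves

    private EndStep(List<String> projectKeys, List<EndStep> branches) {
      this.projectKeys = projectKeys;
      this.branches = branches;
    }

    static EndStep project(String... keys) {
      return new EndStep(List.of(keys), List.of());
    }

    static EndStep parent(EndStep... branches) {
      return new EndStep(null, List.of(branches));
    }

    static EndStep other() {
      return new EndStep(null, List.of());
    }
  }

  private BranchKeys() {}

  /**
   * Returns the keys present whichever branch produces the result, i.e. the intersection of the
   * keys of all branches. Throws IllegalStateException if any branch does not end in a projection.
   */
  static Set<String> guaranteedKeys(EndStep end) {
    if (end.projectKeys != null) {
      return new LinkedHashSet<>(end.projectKeys);
    }
    if (end.branches.isEmpty()) {
      throw new IllegalStateException("Branch does not end with a projection.");
    }
    Set<String> common = null;
    for (EndStep branch : end.branches) {
      Set<String> keys = guaranteedKeys(branch);
      if (common == null) {
        common = keys;
      } else {
        common.retainAll(keys);
      }
    }
    return common;
  }
}
```

Taking the intersection is deliberately conservative: a required key that only some branches project would still be reported as missing.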
