Gremlin Server plugin for running an additional function on each vertex/edge
Dear TinkerPop team,
I'm currently trying to wrap my head around a specific problem I've been trying to solve for a few days.
Basic overview:
Vertices and edges can have a code field, e.g.:
Vertex: a = True
Edge: a == True
While "travelling" over vertices, a custom script engine runs and saves the resulting variables with name and value into the current gremlin sack.
When an edge is reached, the script engine evaluates the expression and tells gremlin to continue or break the current path.
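As a rough illustration of this flow (not the actual Rust engine), the sketch below runs vertex code that assigns a variable and edge code that compares it. The class name ScriptSackSketch, its method names, and the plain Map standing in for the Gremlin sack are all assumptions made for the example; it only understands `name = True|False` and `name == True|False`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the script engine; the Map plays the role of the sack. */
final class ScriptSackSketch {

    /** Vertex code: execute an assignment and store the variable in the sack. */
    static void runVertexCode(String code, Map<String, Boolean> sack) {
        String[] parts = code.split("=", 2);   // "a = True" -> ["a ", " True"]
        sack.put(parts[0].trim(), parseLiteral(parts[1].trim()));
    }

    /** Edge code: evaluate a comparison against the sack; true means "continue along this edge". */
    static boolean evalEdgeCode(String code, Map<String, Boolean> sack) {
        String[] parts = code.split("==", 2);  // "a == True" -> ["a ", " True"]
        Boolean actual = sack.get(parts[0].trim());
        // An unknown variable never matches, so the path is broken in that case.
        return actual != null && actual == parseLiteral(parts[1].trim());
    }

    private static boolean parseLiteral(String literal) {
        switch (literal) {
            case "True":  return true;
            case "False": return false;
            default: throw new IllegalArgumentException("unsupported literal: " + literal);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> sack = new HashMap<>();
        runVertexCode("a = True", sack);                      // vertex sets a
        System.out.println(evalEdgeCode("a == True", sack));  // this edge would be followed
        System.out.println(evalEdgeCode("a == False", sack)); // this edge would be skipped
    }
}
```

In the real setup the two methods would delegate to the Rust engine through the Java bindings, and the map would be the traverser's sack.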
The script engine is written in Rust and works; the Java bindings are not a problem either.
What I can't figure out, and would really appreciate your help with, is the Gremlin part of this project. As far as I know, I have to create a Gremlin Server plugin like GremlinServerGremlinPlugin.java in the TinkerPop GitHub repository, but if and how I can inject this custom functionality is beyond me.
Any hint would be highly appreciated.
Thanks
Volker
interesting. i think i need more context though to grasp what you are doing before i can offer any advice
when you say "script engine" are you referring to an actual JSR-223 compliant ScriptEngine? or are you alluding to something else?
it's a script engine which atm handles a small subset of the python syntax
the actual db is implemented with janus graph
but, you've explicitly implemented java ScriptEngine JSR-223 interfaces to build that? (like how we have GremlinGroovyScriptEngine?)
ah, no. sry if i communicated that incorrectly. the idea behind the project is not to extend the current query language ("developer" side), but to provide the "user" side the option to control the graph execution in python/c#/... like syntax
the a = True, a == True, a == False is just a text property stored on the vertex/edge
which can be changed by the user
and during (full) graph traversal, the script engine executes this code, updates the gremlin sack with the variables or tells the traversal instance to continue along the edge or break
ok, so you just refer to your custom processing there as a "script engine". that script engine evaluates a script stored as a property on the vertex or edge to control where Gremlin navigates? is that the rough idea?
yes
ok, so now i understand the second part of what you originally wrote a bit better.
"The script engine is written in Rust and works, the java bindings are not a problem either."
could you explain where you hook in those java bindings to TinkerPop?
With that part, i meant that in my demo java<->rust project the bindings are working. i can't figure out where and how to hook into tinkerpop
my idea looks like this:
so i should probably hook into the processing of each new vertex/edge?
I think you want to do this with a TraversalStrategy: https://tinkerpop.apache.org/docs/current/reference/#traversalstrategy
are you familiar with those at all?
I've stumbled across them while reading the documentation, but thought that they are just to mutate/verify the defined steps before execution thus don't have access to the specific vertex/edge data during execution
am i mistaken?
Solution
you understand them properly, but you perhaps didn't connect their use to your case. you would define a TraversalStrategy that replaces steps that traverse vertex/edge data like out() or inE() with your own implementation for those steps. in that way you will have access to the vertex/edge Traverser as it passes through the step. i suppose you could also consider adding a special step that wraps those steps or follows them depending on your needs. i'm not sure which is best offhand.
I hope TraversalStrategy works for you. You might look at examples from JanusGraph or Neo4jGraph to get further inspiration. If you have further questions, consider posting in the #implementers channel since you're building extensions to TinkerPop (but here is fine too if you prefer this format)
Yes, thanks for the input. I'm currently trying to implement a solution with TraversalStrategy. I've also looked into implementing special steps, but that seems to be rather complicated because I would have to replace quite a few classes on the client and server side. Although I'm still fighting with the actual Java implementation, especially the compilation (since my Java experience is limited), maybe you can tell me if what I have planned is feasible:
An AbstractGremlinPlugin registers my own ScriptDecorationStrategy on the server. The strategy replaces the "out" steps with a VertexStep wrapper, which just packs the VertexStep.flatMap() result into an Iterator wrapper; that wrapper then executes the script on iteration and just jumps over "invalid" edges
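import java.util.Iterator;
import java.util.NoSuchElementException;

import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.Traverser;
import org.apache.tinkerpop.gremlin.process.traversal.step.map.VertexStep;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy;
import org.apache.tinkerpop.gremlin.process.traversal.util.TraversalHelper;
import org.apache.tinkerpop.gremlin.structure.Direction;
import org.apache.tinkerpop.gremlin.structure.Element;
import org.apache.tinkerpop.gremlin.structure.Vertex;

public final class ScriptDecorationStrategy extends AbstractTraversalStrategy<TraversalStrategy.DecorationStrategy>
        implements TraversalStrategy.DecorationStrategy {

    @Override
    public void apply(final Traversal.Admin<?, ?> traversal) {
        // getStepsOfClass returns a new list, so steps can be replaced while iterating over it
        for (final VertexStep step : TraversalHelper.getStepsOfClass(VertexStep.class, traversal)) {
            // returnsVertex is only needed because of the current db implementation, which I can't change :C
            if (step.getDirection() == Direction.OUT && step.returnsVertex()) {
                // out -> change to scriptout; assigning to a local variable would not modify the traversal
                TraversalHelper.replaceStep(step,
                        new ScriptVertexStep<>(traversal, Vertex.class, Direction.OUT, step.getEdgeLabels()),
                        traversal);
            }
        }
    }
}

public class ScriptVertexStep<E extends Element> extends VertexStep<E> {

    public ScriptVertexStep(final Traversal.Admin traversal, final Class<E> returnClass, final Direction direction,
            final String... edgeLabels) {
        super(traversal, returnClass, direction, edgeLabels);
    }

    @Override
    protected Iterator<E> flatMap(final Traverser.Admin<Vertex> traverser) {
        return new ScriptFilterIterator<E>(super.flatMap(traverser));
    }
}

public class ScriptFilterIterator<E extends Element> implements Iterator<E> {
    private final Iterator<E> orig;
    private E next;

    public ScriptFilterIterator(final Iterator<E> orig) {
        this.orig = orig;
    }

    @Override
    public boolean hasNext() {
        return next != null || tryComputeNext();
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        final E ret = next;
        next = null; // the following element is fetched by the next hasNext() call
        return ret;
    }

    private boolean tryComputeNext() {
        while (orig.hasNext()) {
            final E candidate = orig.next();
            // Script execution goes here: if candidate is an Edge and its
            // expression evaluates to false, `continue` to skip it.
            next = candidate;
            return true;
        }
        next = null;
        return false;
    }
}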
If this would work, I don't have to add a special step and the developer still has the option to use out() normally, as long as there isn't a specific "script" parameter defined on the vertex, even if the strategy is enabled
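To show the "jump over invalid edges" part in isolation, here is a minimal, TinkerPop-free sketch of a filtering iterator. The class name PredicateFilterIterator and the Predicate standing in for the script evaluation are assumptions for illustration, not part of the plugin above. It skips rejected elements in a loop rather than by recursion, so a long run of rejected edges cannot overflow the stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Wraps an iterator and only yields elements accepted by the predicate (hypothetical helper). */
final class PredicateFilterIterator<E> implements Iterator<E> {
    private final Iterator<E> orig;
    private final Predicate<? super E> accept; // stand-in for the script evaluation
    private E next;
    private boolean hasBuffered;               // separate flag so null elements are handled too

    PredicateFilterIterator(Iterator<E> orig, Predicate<? super E> accept) {
        this.orig = orig;
        this.accept = accept;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source until one element is accepted or the source is exhausted.
        while (!hasBuffered && orig.hasNext()) {
            E candidate = orig.next();
            if (accept.test(candidate)) {
                next = candidate;
                hasBuffered = true;
            }
        }
        return hasBuffered;
    }

    @Override
    public E next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasBuffered = false;
        E ret = next;
        next = null;
        return ret;
    }

    public static void main(String[] args) {
        Iterator<String> it = new PredicateFilterIterator<>(
                Arrays.asList("valid-1", "invalid", "valid-2").iterator(),
                label -> label.startsWith("valid"));
        List<String> kept = new ArrayList<>();
        it.forEachRemaining(kept::add);
        System.out.println(kept); // prints [valid-1, valid-2]
    }
}
```

In the plugin, the predicate would be the call into the script engine for the current element and sack.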
you seem to be on the right track. i think a decoration strategy makes sense. i wonder if you should actually extend VertexStep though. i'm not sure of the implications offhand.
in any case i'd say just follow the path you're taking for now and get things all working before making that choice
as an aside, i'm curious. is this a personal project you are working on? will it be something made publicly available?
Ok, thanks for the confirmation. I'll continue to keep you updated on the progress. Sadly, this is not a private project. I'm developing this as a working student, but I'm trying to convince my boss that it would be nice to release it as open source to improve the company's outreach and maybe get some free maintainers
And thank you again for your help. I know how time consuming managing a community, answering everybody and developing the actual product can be
well, i will confirm again that this is a feature that folks ask about fairly often. in the past, i've experimented with it using Groovy scripts stored on vertices/edges and it worked reasonably well. the problem with Groovy is the security issues of running arbitrary scripts.
Yes, this was the number one risk for my boss as well. My solution is a custom parser, compiler and (sandboxed) executor which works on a limited, developer-configured subset of the preferred scripting language, thus giving the developer 100% control over what is possible to do
What about this idea? Rather than have your graph engine be your function execution environment, store a call to a serverless function in a lambda step which the query only "executes" via making an in-flight network request? Then you have the liberty to tailor your function execution environment separately from whatever is hosting and running Gremlin TinkerPop. Whatever language or class libraries you need you provide in the serverless function environment. The cost, of course, is the network latency for that call to the lambda that calls the serverless function. If optimal query performance is your goal, this would be horrible. But if graph traversal triggered execution of user-supplied code is your goal, it might work conveniently.
note that there is a call() step for this sort of functionality: https://tinkerpop.apache.org/docs/current/reference/#call-step
Thanks for your input, sadly the solution has to work for a huge number of script executions in a short time
This looks promising, I'll look into it once the first version is running
I'm currently stuck installing the custom Gremlin Server plugin into a Docker installation of JanusGraph.
Adding it under scriptEngines:gremlin-groovy:plugins in janusgraph-server.yaml gets the plugin loaded, but the custom strategy is not executed for queries
Init logs of the plugin and strategy:
INFO org.apache.tinkerpop.gremlin.server.jsr223.ScriptingGremlinServerPlugin - Plugin loaded
INFO org.apache.tinkerpop.gremlin.server.jsr223.ScriptDecorationStrategy - Strategy loaded
but this is the output for a simple query:
Traversal Explanation
===============================================================================================
Original Traversal [GraphStep(vertex,[4224]), VertexStep(OUT,vertex), VertexStep(OUT,vertex)]
RemoteStrategy [D] [RemoteStep(DriverServerConnection-localhost/127.0.0.1:8182 [graph=g])]
what query are you sending to the server to get that explain?
g.V(4224).out().out().explain()
where are you automatically installing that strategy? is that in the config for Janus Server or something?
in the GremlinPlugin.getCustomizers function I build the ImportCustomizer and before returning it, register the strategy:
TraversalStrategies.GlobalCache.registerStrategies(Graph.class, TraversalStrategies.GlobalCache.getStrategies(Graph.class).addStrategies(ScriptDecorationStrategy.instance()));
is this the wrong way to do this?
does it work if you explicitly use it in your query? like g.withStrategies(ScriptDecorationStrategy.instance()).V(.... ?
that seems to work
well, at least it's finding the strategy
not sure offhand why it doesn't work if you do it the way you did it
personally i would have configured it in the server startup script in the construction of "g".
but it's a bit curious it won't work through the GlobalCache
actually ...come to think of it, that explain() you have is a remote. i think there is a weird thing with remote explains over bytecode. send a script to the server and see if you get a better explain output
ok, enabled remote console
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode
without strategy explicitly enabled:
gremlin> g.V(4224).out().explain()
==>Traversal Explanation
===================================================================================================================
Original Traversal [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
ConnectiveStrategy [D] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
IdentityRemovalStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
MatchPredicateStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
FilterRankingStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
InlineFilterStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
IncidentToAdjacentStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
RepeatUnrollStrategy [O] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
...
the strategy is not applied, but if I enable it again explicitly:
gremlin> g.withStrategies(ScriptDecorationStrategy.instance()).V(4224).out().explain()
==>Traversal Explanation
===================================================================================================================
Original Traversal [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
ScriptDecorationStrategy [D] [GraphStep(vertex,[4224]), VertexStep(OUT,vertex)]
it's applied
so it seems like the strategy is not automatically applied for the traversal source on the server, right?
strange
are you truncating the output at all? why does the addition of ScriptDecorationStrategy remove all the other strategies?
ah sorry, it does not. i just truncated the rest for clarity
it executes the same strategies as without the ScriptDecorationStrategy
so after some debugging, I found the problem: JanusGraph registers its own StandardJanusGraph Graph class, which clones the standard Graph strategies, but it is loaded before my plugin, so a strategy registered for Graph afterwards never reaches the cloned set. To be safe, I just get the private GRAPH_CACHE from GlobalCache, iterate over all entries and add my strategy to each of them. This seems to work
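The ordering problem can be reproduced with a toy model. The sketch below is not TinkerPop's GlobalCache code: StrategyCacheModel, its methods, and the use of plain strings for graph classes and strategies are all hypothetical. It only mimics the behaviour described above, where a provider graph copies the base strategies once and later registrations for Graph no longer reach that copy.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a strategy cache keyed by graph class; not TinkerPop's real GlobalCache. */
final class StrategyCacheModel {
    private final Map<String, List<String>> cache = new HashMap<>();

    StrategyCacheModel() {
        List<String> base = new ArrayList<>();
        base.add("ConnectiveStrategy");
        cache.put("Graph", base);
    }

    /** A provider graph copies the base entry once, when its class is loaded. */
    void registerProviderGraph(String graphClass) {
        cache.put(graphClass, new ArrayList<>(cache.get("Graph")));
    }

    /** Registers a strategy for a single graph class only. */
    void addStrategy(String graphClass, String strategy) {
        List<String> list = cache.get(graphClass);
        if (!list.contains(strategy)) {
            list.add(strategy);
        }
    }

    /** The workaround: add the strategy to every entry already in the cache. */
    void addStrategyEverywhere(String strategy) {
        for (String graphClass : cache.keySet()) {
            addStrategy(graphClass, strategy);
        }
    }

    List<String> strategiesFor(String graphClass) {
        return cache.get(graphClass);
    }

    public static void main(String[] args) {
        StrategyCacheModel model = new StrategyCacheModel();
        model.registerProviderGraph("StandardJanusGraph");             // loaded first, copies the base list
        model.addStrategy("Graph", "ScriptDecorationStrategy");        // plugin registers afterwards
        System.out.println(model.strategiesFor("StandardJanusGraph")); // the copy is unaffected
        model.addStrategyEverywhere("ScriptDecorationStrategy");
        System.out.println(model.strategiesFor("StandardJanusGraph")); // now includes the strategy
    }
}
```

Reaching into a private field by reflection depends on TinkerPop internals, so configuring the strategy when "g" is constructed in the server startup script, as suggested earlier in the thread, may be the less fragile option.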