@GremlinDsl support in the GremlinLangScriptEngine
Hi,
I recently submitted a pull request to the ArcadeDB GitHub repository that adds support for binding custom TraversalSources to the embedded graph bound in the script engine. ArcadeDB has modes that support both the GremlinLangScriptEngine and the GremlinGroovyScriptEngine.
https://github.com/ArcadeData/arcadedb/pull/1239
I realized today that the GremlinLangScriptEngine went untested, and after trying to add a test I've come to question whether there is any support at all for DSLs in the GremlinLangScriptEngine:
AFAICT from the grammar, the traversalMethod rule is a fixed list of static tokens (https://github.com/apache/tinkerpop/blob/c66cd566941ef7bd34d430828883f9cf79d7442f/gremlin-language/src/main/antlr4/Gremlin.g4#L178-L286), and the traversal root binding name is hardcoded to g (https://github.com/apache/tinkerpop/blob/c66cd566941ef7bd34d430828883f9cf79d7442f/gremlin-language/src/main/antlr4/Gremlin.g4#L1876) and enforced in GremlinLangScriptEngine#eval.
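To make the two restrictions concrete, here is a toy Java model of them. It is not TinkerPop code and not how the ANTLR-generated parser actually works: the class name, the heavily abbreviated KNOWN_METHODS set and the naive top-level-dot splitter are all assumptions made for illustration. It accepts a script only if the root is g and every step name is a known token, which is why g.persons() fails.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Toy model (NOT TinkerPop code) of the two restrictions described above:
 * the traversal source must be named "g", and every step name must come
 * from a fixed set of tokens baked into the grammar.
 */
public final class StaticGrammarCheck {

    // Hypothetical, heavily abbreviated stand-in for the fixed traversalMethod token list.
    private static final Set<String> KNOWN_METHODS =
            Set.of("V", "E", "hasLabel", "has", "out", "in", "values");

    // Stand-in for the hardcoded traversal source name.
    private static final String ROOT = "g";

    /**
     * Splits "g.V().hasLabel('person')" into [g, V(), hasLabel('person')] at top-level dots.
     * Naive: it ignores quotes, so parentheses inside string literals would confuse it.
     */
    static List<String> splitCalls(String script) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == '.' && depth == 0) {
                parts.add(script.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(script.substring(start));
        return parts;
    }

    /** Returns true only if the root is "g" and every step is a known token. */
    public static boolean accepts(String script) {
        List<String> parts = splitCalls(script);
        if (parts.size() < 2 || !parts.get(0).equals(ROOT)) {
            return false; // any other binding name is rejected
        }
        for (String call : parts.subList(1, parts.size())) {
            int paren = call.indexOf('(');
            if (paren < 0 || !KNOWN_METHODS.contains(call.substring(0, paren))) {
                return false; // e.g. a DSL step such as persons() is not a known token
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts("g.V().hasLabel('person')")); // true
        System.out.println(accepts("g.persons()"));              // false
        System.out.println(accepts("social.V()"));               // false
    }
}
```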
This was a bit surprising to find, as the DSL documentation (https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-dsl) doesn't mention these limitations, and the Groovy engine allows the TraversalSource to be bound to any name.
After reading through the code, I can understand why these limitations might exist given the current implementation, but I'm curious:
1. Have I missed something?
2. If not, is work planned to support DSLs in the GremlinLangScriptEngine?
3. If not, can I help?
Solution
The two ScriptEngine implementations are not meant to have complete feature parity. GremlinLangScriptEngine does not process arbitrary code. It only processes Gremlin, which I tend to think is a good thing compared to GremlinGroovyScriptEngine, which will run any arbitrary code and is therefore a bit of a security risk. That said, there is some untangling to do in Gremlin Server, the ScriptEngines and the grammar, and that's the main reason TinkerPop has not yet promoted GremlinLangScriptEngine over its Groovy counterpart, despite it being more secure and generally more performant than both Groovy and bytecode. That is also why we don't have much documentation on it.
I believe you should be able to process Gremlin that originated from a DSL in the GremlinLangScriptEngine, but you couldn't do it in the same fashion you can with Groovy. To understand how, it's worth noting that any DSL step is really just a composition of standard Gremlin steps. In other words, a DSL step like:
g.persons()
might really just compose as:
g.V().hasLabel('person')
The former won't process in the grammar of GremlinLangScriptEngine, but the latter will. So your application can write the former and get the benefits of the DSL, but you have to be sure that what is submitted to ArcadeDB is the Gremlin produced by that DSL. You can do that in one of several ways, but since you're asking about GremlinLangScriptEngine, I'll assume this is about sending scripts. In that case you'd use your DSL to produce bytecode, then pass that traversal to the translator (https://github.com/apache/tinkerpop/blob/master/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/translator.js), which would produce a pure Gremlin script to submit with the client, send over HTTP, etc.
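A minimal sketch of that round trip, assuming a toy model in place of TinkerPop's real classes: a hypothetical SocialSource DSL whose persons() step records only standard steps, and a hypothetical translate() function that renders those recorded steps as a plain script of the kind GremlinLangScriptEngine can parse. None of these names are TinkerPop API; the real path would go through the DSL's own traversal classes and the translator linked above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Toy model (NOT the TinkerPop API): a DSL step only appends standard steps,
 * and a translator renders the recorded steps as a plain Gremlin script.
 */
public final class DslToScript {

    /** One recorded step instruction, e.g. hasLabel('person'). */
    record Step(String name, List<Object> args) {}

    /** Minimal stand-in for a traversal that records the standard steps called on it. */
    static class Traversal {
        final List<Step> steps = new ArrayList<>();

        Traversal V() { steps.add(new Step("V", List.of())); return this; }

        Traversal hasLabel(String label) { steps.add(new Step("hasLabel", List.of(label))); return this; }
    }

    /** Hypothetical DSL source: persons() is sugar for V().hasLabel('person'). */
    static class SocialSource {
        Traversal V() { return new Traversal().V(); }

        Traversal persons() { return V().hasLabel("person"); }
    }

    /** Renders recorded steps as "<source>.step(args)...", using standard step names only. */
    static String translate(String source, List<Step> steps) {
        StringBuilder script = new StringBuilder(source);
        for (Step step : steps) {
            String rendered = step.args().stream()
                    .map(DslToScript::literal)
                    .collect(Collectors.joining(", "));
            script.append('.').append(step.name()).append('(').append(rendered).append(')');
        }
        return script.toString();
    }

    /** Only string arguments are handled in this sketch; quotes and backslashes are escaped. */
    static String literal(Object arg) {
        if (arg instanceof String s) {
            return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
        }
        throw new IllegalArgumentException("unsupported argument type: " + arg.getClass());
    }

    public static void main(String[] args) {
        SocialSource g = new SocialSource();
        // The DSL call and the hand-written Gremlin record identical steps...
        System.out.println(g.persons().steps.equals(g.V().hasLabel("person").steps)); // true
        // ...so the script sent to the server only needs the standard form.
        System.out.println(translate("g", g.persons().steps)); // g.V().hasLabel('person')
    }
}
```

Because persons() only appends V() and hasLabel('person'), the server never needs to know the DSL exists; it only ever sees the second printed line.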
> if not, is work planned to support DSLs in the gremlin lang script engine?
Is there a way to natively support DSLs in the grammar? I don't know the answer to that. I am interested in knowing how to extend ANTLR grammars, as it's possible that providers may wish to do that, and it would be good to know what their options are. Perhaps those two things are related. At the same time, I'm not sure it would be good for DSLs to be implemented that way, as it requires knowledge of ANTLR, parsing and other complex things. As we look for ways to move wholly to grammar-based processing for Gremlin, we'll need to know more about how this can all work.
> if not, can I help?
Help is always welcome. If you have thoughts on how we can improve this area and strengthen the grammar and our approach to using it, especially in this DSL use case, they would be appreciated.
Thanks for the quick reply, Stephen. Just some follow-up comments/questions below.
> It only processes Gremlin, which I tend to think is a good thing compared to GremlinGroovyScriptEngine
Precisely why I'd like to support use of the GLSE (GremlinLangScriptEngine) 🙂
> I'll assume this is about sending scripts in which case you'd use your DSL to produce bytecode, then pass that traversal to the translator (https://github.com/apache/tinkerpop/blob/master/gremlin-javascript/src/main/javascript/gremlin-javascript/lib/process/translator.js) which would then produce a pure Gremlin script to submit with the client, send over HTTP, etc.
This integration is at the engine level inside ArcadeDB, in order to support DSLs without requiring the HTTP server and without requiring a client compilation step, so it is a pure Java integration. Is there a way to produce Traversal bytecode from strings via a pure Java interface? I looked into this a few years ago and believe I found otherwise.
> i'm not sure it would be good for DSLs to be implemented that way as it requires knowledge of antlr,
Completely agree with you. My current thought here is something like this...
What if the @GremlinDsl annotation processor also produced a g4 grammar for the DSL, along with new TraversalMethodVisitor (and TraversalSourceSpawnMethodVisitor, etc.) implementations that perform the DSL => traversal translation the way those classes do it now, but for the new DSL? Then extend the g4 traversalMethod rule (and the traversalSourceSpawn rule, et al.) with something like:
This would let the method-matching rule bottom out at something that could match a DslRule. The DSL annotation processor would then also be responsible for creating DslTraversalMethodVisitor.java, DslTraversalSourceMethodVisitor.java, etc., which could then be provided to GremlinAntlrToJava.java.
Then the tricky bit is how to provide that. IMO the cleanest way would be via the TraversalSource itself (which would mean parameterizing GremlinAntlrToJava and all visitor classes to <T extends TraversalSource>), so that the TraversalSource is fully responsible for describing itself and its parsing. However, I can certainly appreciate that I am completely naive to the wider architectural design concerns there.
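The proposal above can be approximated, very roughly, in plain Java. This sketch swaps the generated grammar and visitor classes for a simple lookup: a hypothetical DslDescriptor hook that a DSL TraversalSource could expose, and a resolver that keeps standard calls, splices in DSL expansions, and rejects anything else. All names are hypothetical, nothing here is TinkerPop API, and it ignores real concerns such as DSL step arguments and where start steps may appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/**
 * Toy sketch (hypothetical, NOT TinkerPop code): the traversal source describes
 * its own DSL steps, and the step resolver falls back to those descriptions
 * when a method name is not one of the standard tokens.
 */
public final class DslAwareResolver {

    /** Hypothetical hook a DSL TraversalSource could expose to describe its own steps. */
    interface DslDescriptor {
        /** Returns the standard-step expansion of a DSL method, or empty if it is not one. */
        Optional<List<String>> expand(String method);
    }

    // Abbreviated stand-in for the grammar's fixed list of traversalMethod tokens.
    static final Set<String> STANDARD = Set.of("V", "hasLabel", "out", "values");

    // Hypothetical social DSL: persons() => V().hasLabel('person').
    static final DslDescriptor SOCIAL = method -> {
        if (method.equals("persons")) {
            return Optional.of(List.of("V()", "hasLabel('person')"));
        }
        return Optional.empty();
    };

    /** Keeps standard calls, splices in DSL expansions, and rejects anything else. */
    static List<String> resolve(List<String> calls, DslDescriptor dsl) {
        List<String> resolved = new ArrayList<>();
        for (String call : calls) {
            String name = call.substring(0, call.indexOf('(')); // assumes every call has "(...)"
            if (STANDARD.contains(name)) {
                resolved.add(call);
            } else {
                resolved.addAll(dsl.expand(name).orElseThrow(
                        () -> new IllegalArgumentException("unknown step: " + name)));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<String> calls = List.of("persons()", "out('knows')", "values('name')");
        // Prints: V().hasLabel('person').out('knows').values('name')
        System.out.println(String.join(".", resolve(calls, SOCIAL)));
    }
}
```

The design point it mirrors is that the parser itself stays generic; only the source-supplied descriptor knows what persons() means.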
... certainly there are lots of details to work out, even if this were in the ballpark.
> Is there a way to produce Traversal bytecode from strings via a pure Java interface?
I think you ask this question in the context of my g.persons() example, i.e. you have "g.persons()" as a string and want to execute that in ArcadeDB. I sense your best approach is to use GremlinGroovyScriptEngine in that case. There is no other approach that I can think of.
> What if the @GremlinDsl annotation processor also produced a g4 grammar for the DSL...
It's an interesting idea that could be explored further. I do worry that it complicates the Java DSL process considerably. The gain we get for that additional complexity is the ability to natively process DSLs as scripts. That use case seems relatively narrow; I think this is the first time I've heard someone ask about even trying to do this. It's important to keep in mind, though, that any solution considered can't be isolated to Java. All languages would need to ensure that their DSL story still makes sense in the face of any changes and/or new capabilities.
> it's important to keep in mind though that any solutions considered can't be isolated to java.
Ahh, OK, yes, I certainly forgot that. Thanks for the context.
@spmallette one last follow-up on this. From your answer, it sounds as though there is no expectation of feature parity between the script engine implementations. Beyond lambda "support" (i.e. arbitrary code execution) in the Groovy implementation, and this issue here, is there a list of these feature differences documented anywhere?
I think this issue is one and the same with arbitrary code execution, in the sense that you're just executing Groovy to access DSL code written in Java. There isn't any caching for gremlin-language as there is for gremlin-groovy, but I'm not sure that's really a feature so much as a necessity for the Groovy form to perform with any reasonable speed. I don't think there is anything documented at this point.
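For readers wondering why caching matters for the Groovy engine in particular: an engine that compiles each script benefits from keeping compiled forms keyed by script text, so repeated queries skip compilation. The sketch below is a generic, hypothetical cache (ScriptCache and its compiler function are assumptions), not GremlinGroovyScriptEngine's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Toy sketch of a compiled-script cache keyed by script text.
 * The "compiler" is a hypothetical stand-in for an expensive compilation step.
 */
public final class ScriptCache<C> {

    private final Map<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;
    private final AtomicInteger compilations = new AtomicInteger();

    public ScriptCache(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached compiled form, compiling only on the first request for a script. */
    public C get(String script) {
        return cache.computeIfAbsent(script, s -> {
            compilations.incrementAndGet();
            return compiler.apply(s);
        });
    }

    /** Number of times the compiler actually ran. */
    public int compilations() {
        return compilations.get();
    }

    public static void main(String[] args) {
        // Hypothetical "compiler": pretend the compiled form is just the script's length.
        ScriptCache<Integer> cache = new ScriptCache<>(String::length);
        cache.get("g.V().count()");
        cache.get("g.V().count()");
        cache.get("g.E().count()");
        System.out.println(cache.compilations()); // 2
    }
}
```

A grammar-based engine that parses quickly has less need for this layer, which fits the point that the cache is a necessity of the Groovy path rather than a feature.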
OK, thanks for your time here.
Do you know what you plan to do next for ArcadeDB and your use case?
I can't speak for ArcadeDB. I had submitted a pull request, which we merged, but then I rolled it back once I realized only the Groovy interpreter worked. ArcadeDB's choice of script engine is an orthogonal configuration. I'm speaking more with the lead of the ArcadeDB project on the PR thread, but I think in the end the underlying Gremlin script engine is expected to have feature parity and be mostly transparent to the user.
As for my own use case, the goal is to simplify the client endpoint, so that read-only queries can be written in the DSL and executed directly on the server against the embedded graph, where the DSL's TraversalSource is bound to g. This eliminates quite a bit of complexity IMO, since the DSL and the traversal definitions stay only on the server. In my own use case there is no HTTP server, but a streaming gRPC protocol in between.
Since, as you said earlier, the DSL just maps to TinkerPop traversal methods, my expectation was that this would just work, but I had only ever tested it on the Groovy script engine. If it were instead possible to compile the DSL query string to bytecode directly in Java somehow, that would also suffice, although it's still not as clean as just sending the DSL query into the graph.