Parameterized edges creation in existing graph
Hi, I'm currently experimenting with JanusGraph. My graph is a directed hierarchical graph coming straight from parsing an XML file. After this first bulk load, I want to add multiple new edges between vertices to create shortcuts or remove property duplication.
This was easily done using Cypher and a double MATCH, but I struggle to do the same thing in Gremlin.
I created a small dataset in Gremlify https://gremlify.com/jf036ue70jj/4
I tried project(), local(), combinations of select() and map() but I think I'm missing something really basic.
Thanks
7 Replies
Are you trying to create a fully connected graph from all vertices with a label of E?
Nvm... I think I see it now. You're trying to connect vertices based on common properties. Does direction matter?
Yes indeed, I want to connect vertices based on common properties. As I said, my graph is just a big XML document (so there are only parent/child edges) and I want to use Gremlin as my 'index' engine. Direction does not really matter, but I don't think it should be a blocking point?
The issue here is seeing duplicates when trying to find the matching pairs to create the edges. I have a solution that may be close, but it creates duplicate edges (one in each direction):
Note that this will not work in Gremlify, as this uses the merge() step that was introduced in 3.7.x. Though I tested this on Neptune and it works fine. It takes a bit of Gremlin "hackery" to create the map that you pass into the mergeE() step at the end.
Basically, this does a cartesian join of all vertices with label "E" to all other vertices of label "E" and then filters on pairs that have different IDs but the same property of "bName" or "cName". At that point, you end up with a list of maps of paired vertices. That then needs to be converted into the map format supported by mergeE(), which all of the merge() steps accomplish.
Thanks! I will try this on JanusGraph. From the complexity of the script I understand it is not something people usually do with Gremlin?
Using a client API, I guess it would be simpler to create multiple queries in a loop, but I don't find that efficient.
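To make the duplicate-direction issue concrete, here is a minimal sketch in plain Python rather than Gremlin. It is not the query from this thread: the sample vertices, their ids and property values, and the helper names `shares_property`, `directed_pairs`, and `unique_pairs` are all made up for illustration. It models the cartesian join filtered on different IDs and a shared "bName" or "cName", then shows that visiting each unordered pair only once yields half as many pairs.

```python
from itertools import combinations

# Hypothetical stand-ins for vertices with label "E"; ids and values are made up.
vertices = [
    {"id": 1, "bName": "x", "cName": "p"},
    {"id": 2, "bName": "x", "cName": "q"},
    {"id": 3, "bName": "y", "cName": "q"},
    {"id": 4, "bName": "z", "cName": "r"},
]

KEYS = ("bName", "cName")


def shares_property(a, b, keys=KEYS):
    """True when both vertices carry the same value for at least one key."""
    return any(k in a and k in b and a[k] == b[k] for k in keys)


def directed_pairs(vs):
    """Cartesian join filtered on different ids: yields (a, b) and also (b, a)."""
    return [(a["id"], b["id"]) for a in vs for b in vs
            if a["id"] != b["id"] and shares_property(a, b)]


def unique_pairs(vs):
    """Each unordered pair once: combinations() never revisits (b, a)."""
    ordered = sorted(vs, key=lambda v: v["id"])
    return [(a["id"], b["id"]) for a, b in combinations(ordered, 2)
            if shares_property(a, b)]
```

With this sample data, `directed_pairs(vertices)` returns four pairs (each match in both directions) and `unique_pairs(vertices)` returns two. Requiring one id to be smaller than the other, instead of merely different, is what removes the mirrored pairs.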
The merge steps are fairly new. So hard to say this isn't something people "usually do" as we're still deriving patterns on how best to use those steps.
merge steps were released in 3.7.x
I tried to work with the previous script, but somehow it doesn't work. I assume it has to do with the fact that the properties do not have the same name across vertex labels.
I created a simpler Gremlify with data that looks more like my project data https://gremlify.com/mgvxldbmmz (I need to connect all 'Connection' to corresponding 'Interface' in this example)
In the second query tab, I exposed something I don't understand about select() step scope in a projection by() step.
Maybe it's about query optimization but it seems strange it does not use the same 'con' vertex when used inside has() step.
You've stumbled upon a common gap in Gremlin... has() steps cannot currently take a traversal as an argument. It's listed as a roadmap item for a future TinkerPop 4.x release: https://github.com/apache/tinkerpop/blob/087b3070914123055d3e4ededc2550f12715a0b4/docs/src/dev/future/index.asciidoc#has-traversal
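As a rough illustration of what the 'Connection' to 'Interface' matching needs, here is a plain-Python model (not Gremlin, and not a workaround taken from this thread). The labels come from the example above; the property names `interfaceRef` and `name`, the sample ids, and the function `link_connections` are assumptions made up for the sketch. The point it shows is that the value to compare against has to be read from each individual connection, which a filter with a fixed literal cannot express.

```python
from collections import defaultdict

# Hypothetical data: labels come from the thread, property names are invented.
connections = [
    {"id": "c1", "label": "Connection", "interfaceRef": "eth0"},
    {"id": "c2", "label": "Connection", "interfaceRef": "eth1"},
    {"id": "c3", "label": "Connection", "interfaceRef": "missing"},
]
interfaces = [
    {"id": "i1", "label": "Interface", "name": "eth0"},
    {"id": "i2", "label": "Interface", "name": "eth1"},
]


def link_connections(cons, itfs, con_key="interfaceRef", itf_key="name"):
    """Return (connection id, interface id) pairs whose key values match."""
    # Index interfaces by the value of their key so each lookup is O(1).
    by_name = defaultdict(list)
    for itf in itfs:
        by_name[itf[itf_key]].append(itf["id"])

    edges = []
    for con in cons:
        # The lookup value comes from *this* connection, so it changes per vertex.
        for itf_id in by_name.get(con.get(con_key), []):
            edges.append((con["id"], itf_id))
    return edges
```

Here `link_connections(connections, interfaces)` pairs `c1` with `i1` and `c2` with `i2`, and leaves `c3` unlinked because no interface carries its value. Note that the two sides use differently named properties, which is the situation described for the second dataset.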