Parameterized edges creation in existing graph

Hi :gremlin_smile:, I'm currently experimenting with JanusGraph. My graph is a directed hierarchical graph that comes straight from parsing an XML file. After this first bulk load, I want to add multiple new edges between vertices to create shortcuts or remove property duplication. This was easy to do in Cypher with a double MATCH, but I struggle to do the same thing in Gremlin. I created a small dataset in Gremlify: https://gremlify.com/jf036ue70jj/4
g.V()
.hasLabel('E')
.as('e')
.local(
// this first block alone returns only 6 pairs i,j
select('e').values('bName').as('i').
select('e').values('cName').as('j').
select('i', 'j')

// matching corresponding B->C vertices does not work (48 results)
.V()
.hasLabel('B').has('name', select('i')).as('b')
.out('has').has('name', select('j')).as('c')
//addE 'connected' from 'e' to 'c'
.select('e', 'b', 'c')
)
I tried project(), local(), combinations of select() and map() but I think I'm missing something really basic. Thanks :furnace:
7 Replies
triggan (2mo ago)
Are you trying to create a fully connected graph from all vertices with a label of E? Nvm... I think I see it now. You're trying to connect vertices based on common properties. Does direction matter?
Coldfire (OP, 2mo ago)
Yes, indeed, I want to connect vertices based on common properties. As I said, my graph is just a big XML document (so there are only parent/child edges) and I want to use Gremlin as my 'index' engine. Direction does not really matter, but I don't think it should be a blocking point?
triggan (2mo ago)
The issue here is seeing duplicates when trying to find the matching pairs to create the edges. I have a solution that may be close, but it creates duplicate edges (one in each direction):
g.V().hasLabel('E').as('v1').
V().hasLabel('E').as('v2').
select('v1','v2').
where('v1',neq('v2')).by(id).
or(
where('v1',eq('v2')).by('bName'),
where('v1',eq('v2')).by('cName')
).
constant([:]).
merge([(T.label):'newEdge']).
merge(select('v1').by(id).group().by(constant(from)).by(unfold())).
merge(select('v2').by(id).group().by(constant(to)).by(unfold())).
mergeE()
Note that this will not work in Gremlify, as this uses the merge() step that was introduced in 3.7.x. Though I tested this on Neptune and it works fine. It takes a bit of Gremlin "hackery" to create the map that you pass into the mergeE() step at the end.
Basically, this does a cartesian join of all vertices with label "E" to all other vertices of label "E" and then filters on pairs that have different IDs but the same property of "bName" or "cName". At that point, you end up with a list of maps of paired vertices. That then needs to be converted into the map format supported by mergeE(), which all of the merge() steps accomplish.
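A minimal sketch of the same join-and-filter idea that emits each pair only once and does not rely on merge(), so it may also run on pre-3.7 servers such as Gremlify. It assumes vertex IDs are mutually comparable (for example, all numeric or all strings) so that lt() can stand in for neq(), and it uses plain addE() rather than mergeE(), which means re-running it would add the edges again:

```groovy
// Pair every 'E' vertex with every other 'E' vertex (cartesian join).
g.V().hasLabel('E').as('v1').
  V().hasLabel('E').as('v2').
  // Assumption: IDs are comparable. lt() keeps only one ordering of each
  // pair, so a single edge is created instead of one per direction.
  where('v1', lt('v2')).by(id).
  // Keep pairs that share the same 'bName' or the same 'cName'.
  or(
    where('v1', eq('v2')).by('bName'),
    where('v1', eq('v2')).by('cName')
  ).
  // Not idempotent: unlike mergeE(), addE() always creates a new edge.
  addE('newEdge').from('v1').to('v2')
```

The trade-off is that the direction of each edge is decided by ID order rather than by anything meaningful in the data, which should be acceptable only when direction does not matter.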
Coldfire (OP, 2mo ago)
Thanks! I will try this on JanusGraph. From the complexity of the script, I understand this is not something people usually do with Gremlin? With a client API I guess it would be simpler to run multiple queries in a loop, but I don't find that efficient.
triggan (2mo ago)
The merge steps are fairly new, so it's hard to say this isn't something people "usually do", as we're still deriving patterns on how best to use those steps. The merge steps were released in 3.7.x.
Coldfire (OP, 2mo ago)
I tried to work with the previous script but somehow it doesn't work. I assume it has to do with the fact that the properties do not have the same name across vertex labels. I created a simpler Gremlify with data that looks more like my project data: https://gremlify.com/mgvxldbmmz (in this example I need to connect every 'Connection' to its corresponding 'Interface'). In the second query tab, I show something I don't understand about the scope of the select() step inside a projection's by() step.
g
.V().hasLabel('connection').as('con')
.project('dev', 'int', 'connection', 'device')
.by('devName')
.by('intName')
.by(select('con'))
.by(
// if I select 'devName', results are correct
//select('con').values('devName')

// using a nested select always give the same 'device'
__.V().hasLabel('device').has('name', select('con').values('devName'))
)
Maybe it's about query optimization, but it seems strange that it does not use the same 'con' vertex when select() is used inside the has() step.
triggan (2mo ago)
You've stumbled upon a common gap in Gremlin... has() steps cannot currently take a traversal as an argument. It's listed as a roadmap item for a future TinkerPop 4.x release: https://github.com/apache/tinkerpop/blob/087b3070914123055d3e4ededc2550f12715a0b4/docs/src/dev/future/index.asciidoc#has-traversal
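Until has() accepts a traversal, one commonly used workaround is to compare two labelled steps with where() and a pair of by() modulators. The sketch below is untested against the Gremlify dataset: it assumes the 'connection' and 'device' labels and the 'devName' and 'name' properties from the query above, and the 'connectedTo' edge label is a hypothetical name chosen for illustration:

```groovy
// Label each connection, then scan devices with a mid-traversal V().
g.V().hasLabel('connection').as('con').
  V().hasLabel('device').as('dev').
  // The first by() applies to 'dev' and the second to 'con', so this keeps
  // only the pairs where dev.name == con.devName.
  where('dev', eq('con')).by('name').by('devName').
  // 'connectedTo' is a hypothetical edge label, not taken from the thread.
  addE('connectedTo').from('con').to('dev')
```

Because this pairs every connection with every device before filtering, the cost grows with the product of the two sets, so it is likely better suited to a one-off enrichment pass after the bulk load than to a frequently run query.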