Efficient degree computation for traversals of big graphs

Hello, We're trying to use Neptune with gremlin for a fairly big (XX m nodes) graph and our queries usually have to filter out low degree vertices at some point in the query for both efficiency and product reasons. According to the profiler, this operation takes the big brunt of our computation time. At the same time, the computation of the degree is something that could be pre-computed on the database (we dont even need it to be 100% accurate, a "recent" snapshot computation would be good enough) which would significantly speed up the query. Is there anyone here who has some trick up their sleeve for degree computation that would work well, either as an ad-hoc snippet for a query or as a nice way to precompute it on the graph? The snippet we currently use: .filter_(inE("reported_by").count().is_(gt(REPORTS_MINIMUM)))
A sample profiler output:
{'dur': 2746614501393,
'metrics': [{'id': '0.0.0()',
'name': 'DFEStep(VERTEX)',
'dur': 82175236840,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 2.9918736975401243},
'metrics': []},
{'id': '2.0.0()',
'name': 'NeptuneTraverserConverterDFEStep',
'dur': 29740083715,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 1.0827906027553822},
'metrics': []},
{'id': '4.0.0()',
'name': 'TraversalFilterStep([VertexStep(IN,edge), RangeGlobalStep(0,16), CountGlobalStep, IsStep(gt(15))])',
'dur': 2590977157580,
'counts': {'traverserCount': 13119538, 'elementCount': 13119538},
'annotations': {'percentDur': 94.33348423180372},
'metrics': [{'id': '0.1.0(4.0.0())',
'name': 'VertexStep(IN,edge)',
'dur': 1667910212515,
'counts': {'traverserCount': 234003142, 'elementCount': 234003142},
'annotations': {},
'metrics': []},
{'id': '2.1.0(4.0.0())',
'name': 'RangeGlobalStep(0,16)',
'dur': 192763622373,
'counts': {'traverserCount': 220936072, 'elementCount': 220936072},
'annotations': {},
'metrics': []},
{'id': '4.1.0(4.0.0())',
'name': 'CountGlobalStep',
'dur': 123685904012,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {},
'metrics': []},
{'id': '6.1.0(4.0.0())',
'name': 'IsStep(gt(15))',
'dur': 30358835733,
'counts': {},
'annotations': {},
'metrics': []}]},
{'id': '6.0.0()',
'name': 'NeptuneGroupCountStep',
'dur': 43722023258,
'counts': {'traverserCount': 1, 'elementCount': 1},
'annotations': {'percentDur': 1.5918514679007743},
'metrics': []}]}
A sample profiler output:
{'dur': 2746614501393,
'metrics': [{'id': '0.0.0()',
'name': 'DFEStep(VERTEX)',
'dur': 82175236840,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 2.9918736975401243},
'metrics': []},
{'id': '2.0.0()',
'name': 'NeptuneTraverserConverterDFEStep',
'dur': 29740083715,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 1.0827906027553822},
'metrics': []},
{'id': '4.0.0()',
'name': 'TraversalFilterStep([VertexStep(IN,edge), RangeGlobalStep(0,16), CountGlobalStep, IsStep(gt(15))])',
'dur': 2590977157580,
'counts': {'traverserCount': 13119538, 'elementCount': 13119538},
'annotations': {'percentDur': 94.33348423180372},
'metrics': [{'id': '0.1.0(4.0.0())',
'name': 'VertexStep(IN,edge)',
'dur': 1667910212515,
'counts': {'traverserCount': 234003142, 'elementCount': 234003142},
'annotations': {},
'metrics': []},
{'id': '2.1.0(4.0.0())',
'name': 'RangeGlobalStep(0,16)',
'dur': 192763622373,
'counts': {'traverserCount': 220936072, 'elementCount': 220936072},
'annotations': {},
'metrics': []},
{'id': '4.1.0(4.0.0())',
'name': 'CountGlobalStep',
'dur': 123685904012,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {},
'metrics': []},
{'id': '6.1.0(4.0.0())',
'name': 'IsStep(gt(15))',
'dur': 30358835733,
'counts': {},
'annotations': {},
'metrics': []}]},
{'id': '6.0.0()',
'name': 'NeptuneGroupCountStep',
'dur': 43722023258,
'counts': {'traverserCount': 1, 'elementCount': 1},
'annotations': {'percentDur': 1.5918514679007743},
'metrics': []}]}
Solution:
I would first mention that the profile() step in Gremlin is different than the Neptune Profile API. The latter is going to provide a great deal more info, including whether or not the entire query is being optimized by Neptune: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html If you have 10s of millions of nodes, you could use Neptune Analytics to do the degree calculations. Then extract the degree properties from NA, delete the NA graph, and bulk load those values back into NDB. We're working to make this round-trip process more seamless. But it isn't too hard to automate in the current form....
Gremlin profile API in Neptune - Amazon Neptune
The Gremlin profile feature in Neptune runs a specified traversal, collects metrics about the run, and produces a profile report.
Jump to solution
3 Replies
Solution
triggan
triggan3mo ago
I would first mention that the profile() step in Gremlin is different than the Neptune Profile API. The latter is going to provide a great deal more info, including whether or not the entire query is being optimized by Neptune: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html If you have 10s of millions of nodes, you could use Neptune Analytics to do the degree calculations. Then extract the degree properties from NA, delete the NA graph, and bulk load those values back into NDB. We're working to make this round-trip process more seamless. But it isn't too hard to automate in the current form.
Gremlin profile API in Neptune - Amazon Neptune
The Gremlin profile feature in Neptune runs a specified traversal, collects metrics about the run, and produces a profile report.
triggan
triggan3mo ago
Degree mutate centrality algorithm - Neptune Analytics
The .degree.mutate centrality algorithm counts the number of incident edges of every node in the graph. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network. The .degree.mutate
emi
emiOP3mo ago
>Neptune Analytics oufie ouchie ouwie my aws bill On a serious note: Thank you, I guess we'll go with precomputing it outside of the graph and then updating it in in our bulk loading process.
Want results from more Discord servers?
Add your server