Hello,
We're trying to use Neptune with gremlin for a fairly big (XX m nodes) graph and our queries usually have to filter out low degree vertices at some point in the query for both efficiency and product reasons. According to the profiler, this operation takes the big brunt of our computation time. At the same time, the computation of the degree is something that could be pre-computed on the database (we dont even need it to be 100% accurate, a "recent" snapshot computation would be good enough) which would significantly speed up the query.
Is there anyone here who has some trick up their sleeve for degree computation that would work well, either as an ad-hoc snippet for a query or as a nice way to precompute it on the graph?
The snippet we currently use:
.filter_(inE("reported_by").count().is_(gt(REPORTS_MINIMUM)))
A sample profiler output:
{'dur': 2746614501393,
'metrics': [{'id': '0.0.0()',
'name': 'DFEStep(VERTEX)',
'dur': 82175236840,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 2.9918736975401243},
'metrics': []},
{'id': '2.0.0()',
'name': 'NeptuneTraverserConverterDFEStep',
'dur': 29740083715,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 1.0827906027553822},
'metrics': []},
{'id': '4.0.0()',
'name': 'TraversalFilterStep([VertexStep(IN,edge), RangeGlobalStep(0,16), CountGlobalStep, IsStep(gt(15))])',
'dur': 2590977157580,
'counts': {'traverserCount': 13119538, 'elementCount': 13119538},
'annotations': {'percentDur': 94.33348423180372},
'metrics': [{'id': '0.1.0(4.0.0())',
'name': 'VertexStep(IN,edge)',
'dur': 1667910212515,
'counts': {'traverserCount': 234003142, 'elementCount': 234003142},
'annotations': {},
'metrics': []},
{'id': '2.1.0(4.0.0())',
'name': 'RangeGlobalStep(0,16)',
'dur': 192763622373,
'counts': {'traverserCount': 220936072, 'elementCount': 220936072},
'annotations': {},
'metrics': []},
{'id': '4.1.0(4.0.0())',
'name': 'CountGlobalStep',
'dur': 123685904012,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {},
'metrics': []},
{'id': '6.1.0(4.0.0())',
'name': 'IsStep(gt(15))',
'dur': 30358835733,
'counts': {},
'annotations': {},
'metrics': []}]},
{'id': '6.0.0()',
'name': 'NeptuneGroupCountStep',
'dur': 43722023258,
'counts': {'traverserCount': 1, 'elementCount': 1},
'annotations': {'percentDur': 1.5918514679007743},
'metrics': []}]}
A sample profiler output:
{'dur': 2746614501393,
'metrics': [{'id': '0.0.0()',
'name': 'DFEStep(VERTEX)',
'dur': 82175236840,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 2.9918736975401243},
'metrics': []},
{'id': '2.0.0()',
'name': 'NeptuneTraverserConverterDFEStep',
'dur': 29740083715,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {'percentDur': 1.0827906027553822},
'metrics': []},
{'id': '4.0.0()',
'name': 'TraversalFilterStep([VertexStep(IN,edge), RangeGlobalStep(0,16), CountGlobalStep, IsStep(gt(15))])',
'dur': 2590977157580,
'counts': {'traverserCount': 13119538, 'elementCount': 13119538},
'annotations': {'percentDur': 94.33348423180372},
'metrics': [{'id': '0.1.0(4.0.0())',
'name': 'VertexStep(IN,edge)',
'dur': 1667910212515,
'counts': {'traverserCount': 234003142, 'elementCount': 234003142},
'annotations': {},
'metrics': []},
{'id': '2.1.0(4.0.0())',
'name': 'RangeGlobalStep(0,16)',
'dur': 192763622373,
'counts': {'traverserCount': 220936072, 'elementCount': 220936072},
'annotations': {},
'metrics': []},
{'id': '4.1.0(4.0.0())',
'name': 'CountGlobalStep',
'dur': 123685904012,
'counts': {'traverserCount': 16066616, 'elementCount': 16066616},
'annotations': {},
'metrics': []},
{'id': '6.1.0(4.0.0())',
'name': 'IsStep(gt(15))',
'dur': 30358835733,
'counts': {},
'annotations': {},
'metrics': []}]},
{'id': '6.0.0()',
'name': 'NeptuneGroupCountStep',
'dur': 43722023258,
'counts': {'traverserCount': 1, 'elementCount': 1},
'annotations': {'percentDur': 1.5918514679007743},
'metrics': []}]}