Efficient degree computation for traversals of big graphs
Hello,
We're trying to use Neptune with gremlin for a fairly big (XX m nodes) graph and our queries usually have to filter out low degree vertices at some point in the query for both efficiency and product reasons. According to the profiler, this operation takes the big brunt of our computation time. At the same time, the computation of the degree is something that could be pre-computed on the database (we dont even need it to be 100% accurate, a "recent" snapshot computation would be good enough) which would significantly speed up the query.
Is there anyone here who has some trick up their sleeve for degree computation that would work well, either as an ad-hoc snippet for a query or as a nice way to precompute it on the graph?
The snippet we currently use:
.filter_(inE("reported_by").count().is_(gt(REPORTS_MINIMUM)))
Solution:Jump to solution
I would first mention that the
profile()
step in Gremlin is different than the Neptune Profile API. The latter is going to provide a great deal more info, including whether or not the entire query is being optimized by Neptune:
https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html
If you have 10s of millions of nodes, you could use Neptune Analytics to do the degree calculations. Then extract the degree properties from NA, delete the NA graph, and bulk load those values back into NDB. We're working to make this round-trip process more seamless. But it isn't too hard to automate in the current form....Gremlin profile API in Neptune - Amazon Neptune
The Gremlin profile feature in Neptune runs a specified traversal, collects metrics about the run, and produces a profile report.
3 Replies
Solution
I would first mention that the
profile()
step in Gremlin is different than the Neptune Profile API. The latter is going to provide a great deal more info, including whether or not the entire query is being optimized by Neptune:
https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html
If you have 10s of millions of nodes, you could use Neptune Analytics to do the degree calculations. Then extract the degree properties from NA, delete the NA graph, and bulk load those values back into NDB. We're working to make this round-trip process more seamless. But it isn't too hard to automate in the current form.Gremlin profile API in Neptune - Amazon Neptune
The Gremlin profile feature in Neptune runs a specified traversal, collects metrics about the run, and produces a profile report.
Degree mutate centrality algorithm - Neptune Analytics
The .degree.mutate centrality algorithm counts the number of incident edges of every node in the graph. This measure of how connected the node is can in turn indicate the node's importance and level of influence in the network. The .degree.mutate
>Neptune Analytics
oufie ouchie ouwie my aws bill
On a serious note: Thank you, I guess we'll go with precomputing it outside of the graph and then updating it in in our bulk loading process.