Is there a way to store a TinkerPop graph in DynamoDB?
AWS provides the Neptune graph database, but the problem with it is that it is not distributed and can't be horizontally scaled like Dgraph etc. Since DynamoDB is a distributed database, I was wondering whether there is a way to store a TinkerPop graph in DynamoDB directly.
5 Replies
Can you say a bit more about the use case you have, the types of graph operations you need to perform, and the amount of data you expect to be working with?
To add to Kelvin's comment: Neptune does scale horizontally for reads through read replicas. Writes can be scaled vertically, and dynamically, using Neptune Serverless with a serverless writer instance. Working backwards from your requirements, I'm curious what write throughput you need that leads you to believe you require some form of horizontal write scaling.
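A client-side view of that read/write split might look like the minimal sketch below. It is only an illustration: Neptune already exposes a cluster (writer) endpoint and a reader endpoint that balances connections across replicas, and the `ReadWriteRouter` class and the endpoint names are hypothetical.

```python
import itertools
from typing import Iterator, Sequence


class ReadWriteRouter:
    """Hypothetical client-side router for a Neptune-style cluster:
    every write goes to the single writer, reads rotate across replicas."""

    def __init__(self, writer: str, replicas: Sequence[str]) -> None:
        self.writer = writer
        # With no replicas configured, reads fall back to the writer.
        self.replicas = list(replicas) or [writer]
        self._reads: Iterator[str] = itertools.cycle(self.replicas)

    def endpoint_for(self, is_write: bool) -> str:
        """Return the endpoint a request of the given kind should use."""
        return self.writer if is_write else next(self._reads)


if __name__ == "__main__":
    router = ReadWriteRouter("writer.example", ["replica-1.example", "replica-2.example"])
    print(router.endpoint_for(is_write=True))   # writer.example
    print(router.endpoint_for(is_write=False))  # replica-1.example
    print(router.endpoint_for(is_write=False))  # replica-2.example
```

Adding replicas grows read capacity, but every write still lands on one instance, which is why the write-throughput question matters here.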
Solution
TinkerPop, in general, can sit on top of nearly any back end; you just need a storage plugin for it. Making it performant, however, would also require overriding many of the underlying query execution operators so that they fetch data from the DynamoDB table(s) efficiently. TinkerGraph is a reference implementation of this, where the storage medium is in-memory hash maps for both vertices and edges. In practice, most people start by reviewing the TinkerGraph code when creating support for other storage media.
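To make the plugin idea concrete, here is a minimal Python sketch, assuming a hypothetical three-method `GraphStorage` interface (TinkerPop's real provider API is Java and far larger). `InMemoryStorage` mimics TinkerGraph's hash-map approach, and `out_out` stands in for a query operator written purely against the interface.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Any, Optional


class GraphStorage(ABC):
    """Hypothetical storage-plugin interface; NOT TinkerPop's real Java API."""

    @abstractmethod
    def add_vertex(self, vid: str, props: dict[str, Any]) -> None: ...

    @abstractmethod
    def add_edge(self, eid: str, out_v: str, in_v: str, label: str) -> None: ...

    @abstractmethod
    def out_neighbors(self, vid: str, label: Optional[str] = None) -> list[str]: ...


class InMemoryStorage(GraphStorage):
    """TinkerGraph-style back end: hash maps for vertices and edges, plus an
    out-edge index so a traversal step does not scan every edge."""

    def __init__(self) -> None:
        self.vertices: dict[str, dict[str, Any]] = {}
        self.edges: dict[str, tuple[str, str, str]] = {}  # eid -> (out_v, in_v, label)
        self._out_index: dict[str, list[str]] = defaultdict(list)  # vid -> eids

    def add_vertex(self, vid: str, props: dict[str, Any]) -> None:
        self.vertices[vid] = dict(props)

    def add_edge(self, eid: str, out_v: str, in_v: str, label: str) -> None:
        if out_v not in self.vertices or in_v not in self.vertices:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[eid] = (out_v, in_v, label)
        self._out_index[out_v].append(eid)

    def out_neighbors(self, vid: str, label: Optional[str] = None) -> list[str]:
        neighbors = []
        for eid in self._out_index.get(vid, []):
            _, in_v, edge_label = self.edges[eid]
            if label is None or edge_label == label:
                neighbors.append(in_v)
        return neighbors


def out_out(storage: GraphStorage, start: str, label: str) -> list[str]:
    """Rough analogue of g.V(start).out(label).out(label). It only uses the
    interface, so it runs unchanged on any back end, but it issues one lookup
    per intermediate vertex; a remote store would want to batch these."""
    return [w for v in storage.out_neighbors(start, label)
            for w in storage.out_neighbors(v, label)]
```

A DynamoDB-backed class would implement the same three methods. The per-vertex lookups in `out_out` are exactly the kind of operator a performant provider would override, for example to batch reads against the table.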
Once upon a time there was a TinkerPop implementation called Titan (which later became the basis of DSE Graph) that had a storage plugin for DynamoDB. Titan was later forked as JanusGraph (another TinkerPop implementation), and the DynamoDB plugin was ported to it. The plugin is still available, but it is no longer supported or maintained: https://github.com/amazon-archives/dynamodb-janusgraph-storage-backend
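JanusGraph's storage layer works against a key-column-value (BigTable-style) model, which maps fairly naturally onto a DynamoDB table with a partition key and a sort key. The sketch below is a hypothetical in-memory stand-in, not the archived plugin's actual schema: `KCVTable`, `put`, and `query` are illustrative names, the partition key holds a vertex id, and sort keys such as `e:knows:v2` are an assumed encoding that keeps a vertex's edges contiguous so a single range read fetches them.

```python
import bisect
from collections import defaultdict


class KCVTable:
    """In-memory stand-in for a table keyed by (partition key, sort key).
    Hypothetical layout for illustration, not the archived plugin's schema."""

    def __init__(self) -> None:
        # partition key -> list of (sort_key, value), kept sorted by sort_key
        self._partitions: dict[str, list[tuple[str, bytes]]] = defaultdict(list)

    def put(self, pk: str, sk: str, value: bytes) -> None:
        items = self._partitions[pk]
        i = bisect.bisect_left(items, (sk,))  # first item with sort_key >= sk
        if i < len(items) and items[i][0] == sk:
            items[i] = (sk, value)  # overwrite an existing column
        else:
            items.insert(i, (sk, value))

    def query(self, pk: str, start: str, end: str) -> list[tuple[str, bytes]]:
        """Columns with start <= sort_key < end, in order: roughly a query on
        one partition with a range condition on the sort key."""
        items = self._partitions.get(pk, [])
        lo = bisect.bisect_left(items, (start,))
        hi = bisect.bisect_left(items, (end,))
        return items[lo:hi]


if __name__ == "__main__":
    t = KCVTable()
    t.put("v1", "p:name", b"alice")  # a vertex property
    t.put("v1", "e:knows:v2", b"")   # out-edges share the "e:<label>:" prefix
    t.put("v1", "e:knows:v3", b"")
    t.put("v1", "e:likes:v4", b"")
    # ';' sorts right after ':', so this range covers every "e:knows:" column.
    print([sk for sk, _ in t.query("v1", "e:knows:", "e:knows;")])
```

The design choice being illustrated is co-location: putting a vertex and its adjacency in one partition turns a traversal step into one range read instead of many point lookups, although very high-degree vertices would then concentrate load on a single partition.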
JanusGraph itself supports a Cassandra back end, and we have seen a few folks attempt to use JanusGraph with Amazon Keyspaces (for Apache Cassandra).
In either case, building scale-out support for all of this becomes cumbersome, because you end up having to front the back-end datastore (DynamoDB or Cassandra) with multiple Gremlin Server containers/instances. You would then likely want to front those with a load balancer to handle scaling and redundancy.
It all boils down to tradeoffs (as most things do).
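The Gremlin Server tier described in that answer can be pictured with the minimal sketch below: a round-robin balancer that skips unhealthy instances. It assumes sessionless requests, so any instance can serve any request because the graph state lives in the shared back end. `RoundRobinBalancer`, the `is_healthy` callback, and the server names are hypothetical; a real deployment would use an actual load balancer rather than client code.

```python
from typing import Callable, Sequence


class RoundRobinBalancer:
    """Hypothetical balancer in front of stateless Gremlin Server instances
    that all share one storage back end (DynamoDB or Cassandra)."""

    def __init__(self, servers: Sequence[str], is_healthy: Callable[[str], bool]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._is_healthy = is_healthy
        self._next = 0

    def pick(self) -> str:
        # Try each server at most once, starting after the last one handed out.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if self._is_healthy(server):
                return server
        raise RuntimeError("no healthy Gremlin Server instances")


if __name__ == "__main__":
    down = {"gremlin-2"}
    lb = RoundRobinBalancer(["gremlin-1", "gremlin-2", "gremlin-3"], lambda s: s not in down)
    print([lb.pick() for _ in range(4)])  # gremlin-2 is skipped while it is down
```

Every moving part here (the server fleet, health checks, the balancer itself) is something you operate yourself, which is the cumbersome part of the tradeoff.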
Basically, Neptune has a hard storage limit of 128 TiB (https://docs.aws.amazon.com/neptune/latest/userguide/limits.html#limits-cluster-volume-size).
I want to store a graph much larger than this, and all queries should still run efficiently. Do you think this would be possible using Neptune?
It ultimately depends on the shape of the graph and the data types of the properties stored. But in our testing, a full-capacity 128 TiB volume could store a graph of between 200 and 400 billion atomic components (vertices, edges, or properties).
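Working that figure in reverse gives a rough sense of the implied average storage per component. This is plain division under the assumption that the 200 to 400 billion range refers to a completely full 128 TiB volume, so the result folds in all indexing and storage overhead; `avg_bytes_per_component` is a hypothetical helper name.

```python
def avg_bytes_per_component(volume_tib: float, components: float) -> float:
    """Implied average bytes per vertex, edge, or property on a full volume."""
    return volume_tib * 2**40 / components  # 1 TiB = 2**40 bytes


if __name__ == "__main__":
    for n in (200e9, 400e9):
        # 128 TiB spread over 200 or 400 billion components
        print(f"{n / 1e9:.0f}B components: ~{avg_bytes_per_component(128, n):.0f} bytes each")
```

This prints roughly 704 bytes per component at 200 billion and 352 bytes at 400 billion, so graphs with many small properties sit toward the high end of the component count and graphs with large string properties toward the low end.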