GraphSON mapper

Hi, I'm trying to ingest some data into AWS Neptune, and because of its size I'm forced to use a bulk data importer (unless there's bulk-insert functionality straight from Gremlin; I couldn't find any). Looking at the GraphSON schema/docs (https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0), I see there are some IDs on the edges that I'm not sure how, or whether, I need to generate. I'm doing this mapping in Python (but it can be done in other languages if there's better support). Any recommendations/tips for this?
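On the edge-ID part of the question: as far as I can tell from the GraphSON 3.0 docs, an edge ID just has to be unique, so one option is to derive it deterministically from the (outV, label, inV) triple. Re-running the mapping then yields the same IDs. Below is a minimal sketch under several assumptions: string vertex IDs, string-only property values (other types would need GraphSON `@type` wrappers such as `g:Int32`), and at most one edge per triple. `edge_id` and `graphson_edge` are hypothetical helpers, not part of any TinkerPop library.

```python
import uuid


def edge_id(out_id: str, label: str, in_id: str) -> str:
    """Deterministic edge ID: the same (out, label, in) triple always gives the same ID."""
    # uuid5 hashes the name into a stable UUID; NAMESPACE_URL is just a fixed namespace.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{out_id}|{label}|{in_id}"))


def graphson_edge(out_id, out_label, label, in_id, in_label, props=None):
    """Build a GraphSON 3.0 g:Edge object shaped like the TinkerPop IO docs example."""
    return {
        "@type": "g:Edge",
        "@value": {
            "id": edge_id(out_id, label, in_id),
            "label": label,
            "inVLabel": in_label,
            "outVLabel": out_label,
            "inV": in_id,
            "outV": out_id,
            # Assumes string values, which GraphSON 3.0 writes as plain JSON strings.
            "properties": {
                key: {"@type": "g:Property", "@value": {"key": key, "value": value}}
                for key, value in (props or {}).items()
            },
        },
    }
```

The trade-off is that a truly parallel second edge with the same label between the same two vertices would collide. In that case, a random `uuid.uuid4()` or a row number from the source data would be the safer ID.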
spmallette · 14mo ago
It appears we've missed this question on bulk loading with @neptune. Does anyone have any tips?
triggan · 14mo ago (accepted solution)
Do you already have data in GraphSON format, or do you just need a bulk importer? If the latter, Neptune has its own bulk load feature: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html
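For reference, a load job is started with an HTTP POST to the cluster's `/loader` endpoint. Below is a minimal sketch that only *builds* that request with the Python standard library. It assumes the Gremlin CSV files are already in S3 and that there is an IAM role Neptune can assume to read them. The endpoint, bucket and role ARN are hypothetical placeholders. If IAM database authentication is enabled on the cluster, the request would also need SigV4 signing, which this sketch does not do.

```python
import json
import urllib.request


def build_load_request(endpoint: str, s3_source: str, role_arn: str, region: str) -> urllib.request.Request:
    """Build (but do not send) a Neptune bulk loader request for Gremlin CSV files."""
    body = {
        "source": s3_source,     # S3 prefix or single file holding the CSV files
        "format": "csv",         # Gremlin CSV load format
        "iamRoleArn": role_arn,  # role Neptune assumes to read from S3
        "region": region,        # must match the region of the cluster and bucket
        "failOnError": "FALSE",  # keep loading past rows that fail
    }
    return urllib.request.Request(
        f"{endpoint}/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical placeholders: substitute your own endpoint, bucket and role.
req = build_load_request(
    "https://my-cluster.cluster-example.us-east-1.neptune.amazonaws.com:8182",
    "s3://my-bucket/neptune-load/",
    "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",
    "us-east-1",
)
```

Calling `urllib.request.urlopen(req)` from a host that can reach the cluster would submit the job. The response should include a `loadId`, which can then be polled with `GET /loader/<loadId>`.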
Dragos Ciupureanu (OP) · 14mo ago
Thanks for the suggestion. Yes, in the end I used the bulk loader as it's easy to export from Pandas to the Gremlin CSV format. 👍
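For anyone taking the same route: a Gremlin CSV file is a plain CSV whose header combines system columns with `name:Type` property columns. The system columns are `~id` and `~label` for vertices, and `~id`, `~from`, `~to` and `~label` for edges. Below is a minimal sketch using only the standard library. With pandas, `DataFrame.to_csv(path, index=False)` on a frame with the same column names should produce equivalent files. The sample data and the `to_gremlin_csv` helper are hypothetical, and vertices and edges are written as separate files.

```python
import csv
import io


def to_gremlin_csv(rows, columns):
    """Serialise dict rows as CSV text under the given Gremlin CSV header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data.
people = [
    {"~id": "p1", "~label": "person", "name:String": "Alice"},
    {"~id": "p2", "~label": "person", "name:String": "Bob"},
]
knows = [("p1", "p2", 2020)]

# Each edge row gets its own ~id; here it is derived from the endpoints and the label.
edges = [
    {"~id": f"{src}-knows-{dst}", "~from": src, "~to": dst, "~label": "knows", "since:Int": year}
    for src, dst, year in knows
]

vertex_csv = to_gremlin_csv(people, ["~id", "~label", "name:String"])
edge_csv = to_gremlin_csv(edges, ["~id", "~from", "~to", "~label", "since:Int"])
```

In practice, each string would be written to a file (for example `vertices.csv` and `edges.csv`) under the S3 prefix that the loader's `source` points at.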
ManabuBeach · 14mo ago
So we need to split the data into separate edge and vertex files, right?
triggan · 14mo ago
@Dragos Ciupureanu - Neptune also has Pandas support through the AWS SDK for Pandas: https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/033%20-%20Amazon%20Neptune.ipynb
Dragos Ciupureanu (OP) · 14mo ago
Nice, I didn't know about this. Thanks @triggan. While we're on the Neptune topic, do you happen to know whether I can get graph embeddings out of the graph? I see they use RotatE for link prediction, but I just want the embeddings, and I couldn't find anything about that in their examples. Similarly, can a Gremlin response be hooked up to something that returns embeddings for a subgraph?
triggan · 14mo ago
Neptune doesn't provide embeddings directly. The best I can say for now is "watch this space", as there is a lot happening in this area at the moment. At present, you would use Gremlin to fetch the subgraphs and then feed them into a separate library or model to generate the embeddings. Two popular options for this in the graph space are GraphSAGE and GraphStorm (https://graphstorm.readthedocs.io/en/latest/tutorials/quick-start.html#generating-embedding).
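A typical glue step between "fetch the subgraph with Gremlin" and "feed it to an embedding library" is turning string vertex IDs into contiguous integer indices. Libraries in this space usually expect the graph as parallel source/target index lists, i.e. a COO edge list. Below is a minimal sketch. The query string is a hypothetical example and is not executed here. The function assumes each result row is a map with `from`, `to` and `label` keys, as that `project(...)` would return.

```python
# Hypothetical query (not executed here): one map per edge touching vertex 'p1'.
SUBGRAPH_QUERY = (
    "g.V('p1').bothE()"
    ".project('from', 'to', 'label')"
    ".by(outV().id()).by(inV().id()).by(label())"
)


def to_edge_index(edges):
    """Map string vertex IDs to 0..n-1 and return (node_ids, sources, targets)."""
    index = {}
    sources, targets = [], []
    for e in edges:
        # Assign the next free integer the first time a vertex ID is seen.
        for key in ("from", "to"):
            index.setdefault(e[key], len(index))
        sources.append(index[e["from"]])
        targets.append(index[e["to"]])
    # Dicts keep insertion order, so position i in node_ids is vertex index i.
    node_ids = list(index)
    return node_ids, sources, targets
```

The `node_ids` list maps each row of the resulting embedding matrix back to the original Neptune vertex ID.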
Dragos Ciupureanu (OP) · 14mo ago
That's very useful, thanks @triggan