triggan
ATApache TinkerPop
Created by Wolfgang Fahl on 10/16/2024 in #questions
pymogwai
Yes, referring to porting the Java implementation of TinkerGraph to other runtimes. Not totally sure of the issues involved in doing this. More likely an issue of prioritization. But having TinkerGraph native in a runtime would open the doors for a few things that you can only do with TinkerGraph in Java. For example, the use of subgraph() in Gremlin-Java returns a TinkerGraph object that you can then issue queries against. It's a common pattern when you want to return a subgraph locally (as a cache) and run queries against that locally cached subgraph. Today, if you use subgraph() in a non-Java client, you get back different representations of the subgraph via the different serializers, usually GraphSON or some form of map/JSON.
6 replies
ATApache TinkerPop
Created by Wolfgang Fahl on 10/16/2024 in #questions
pymogwai
I would agree that this is interesting. There have been a number of situations where having TinkerGraph available in other runtimes would be useful. I'm curious why you chose to implement something different versus looking to add TinkerGraph to gremlinpython.
6 replies
ATApache TinkerPop
Created by Alex on 10/10/2024 in #questions
Neptune Cluster Balancing Configuration
Yes, so that is using websockets (although, if you're using Neptune, the connection string should start with wss as Neptune is SSL/TLS only).
5 replies
ATApache TinkerPop
Created by Alex on 10/10/2024 in #questions
Neptune Cluster Balancing Configuration
What are you using to send queries to Neptune? Are you using the gremlin-python client and connecting via websockets? If so, each websocket connection is going to act like a "sticky session": it will connect to the same instance for the life of the connection. The reader endpoint is a DNS endpoint that is configured to resolve to a different read replica approximately every 5 seconds. So depending on when you establish your websocket connections, or if you're just sending http requests, those could all go to the same instance if sent in quick succession. Customers have solved this in a number of ways. Some will create load balancers in front of Neptune read replicas that can more precisely "load balance" requests across the instances. We also created a version of the Gremlin Java client that establishes connection pools across multiple reader instances: https://github.com/aws/neptune-gremlin-client Doing this in Python with the Gremlin Python client is not as straightforward as with the Java client. The Java client has the concept of a "cluster" whereas the Python client does not. So you may need to build a routing mechanism that creates connections to each reader instance directly (each instance has its own instance endpoint) and iterate across those connections in a round-robin fashion if you want even distribution. Totally understand this is a pain and we've been discussing ways to address this.
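If it helps, a rough sketch of that round-robin approach with gremlin-python might look like the following (the reader instance endpoints and the run_read_query() helper are placeholders for illustration, not a tested implementation):
import itertools
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Placeholder instance endpoints - each Neptune instance exposes its own endpoint.
reader_endpoints = [
    'wss://reader-instance-1.xxxxxxxx.us-east-1.neptune.amazonaws.com:8182/gremlin',
    'wss://reader-instance-2.xxxxxxxx.us-east-1.neptune.amazonaws.com:8182/gremlin',
]

# One long-lived websocket connection (and traversal source) per reader instance.
sources = [traversal().with_remote(DriverRemoteConnection(ep, 'g')) for ep in reader_endpoints]
round_robin = itertools.cycle(sources)

def run_read_query():
    g = next(round_robin)            # rotate to the next reader instance
    return g.V().limit(10).toList()  # any read traversal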
5 replies
ATApache TinkerPop
Created by Max on 9/28/2024 in #questions
Best practices for local development with Neptune.
Many of the differences there are a little opinionated on the part of the person who wrote that blog post. When I create a Gremlin Server docker container to emulate Neptune (as closely as possible), I typically just use something like the following Dockerfile; the comments explain the changes:
FROM tinkerpop/gremlin-server:latest

# Add support for both websockets and http requests
RUN sed -i "s|^channelizer:.*|channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer|" ./conf/gremlin-server.yaml

# Allow for string based IDs
RUN sed -i "s|^gremlin.tinkergraph.vertexIdManager=.*|gremlin.tinkergraph.vertexIdManager=ANY|" ./conf/tinkergraph-empty.properties

# Remove ReferenceElementStrategy for use with graph-explorer - return all properties
RUN sed -i "s|^globals << \[g.*|globals << [g : traversal().withEmbedded(graph).withStrategies()]|" ./scripts/empty-sample.groovy

# Increase thread stack to 2m
ENV JAVA_OPTIONS="-Xss2m -Xms512m -Xmx4096m"
11 replies
ATApache TinkerPop
Created by Max on 10/2/2024 in #questions
Confusing behavior of `select()`.
No description
11 replies
ATApache TinkerPop
Created by Max on 10/2/2024 in #questions
Confusing behavior of `select()`.
This might be where you need to use a where()-by()-by() pattern.
g.withSideEffect("map", [3: "foo", 4: "bar"]).
inject("a", "b", "c", "d").
aggregate(local, "x").
map(select("x").count(local)).as('cnt').
select('map').unfold().select(values).
where('map',eq('cnt')).by(unfold().select(keys)).by()
g.withSideEffect("map", [3: "foo", 4: "bar"]).
inject("a", "b", "c", "d").
aggregate(local, "x").
map(select("x").count(local)).as('cnt').
select('map').unfold().select(values).
where('map',eq('cnt')).by(unfold().select(keys)).by()
==> foo
==> bar
==> foo
==> bar
I don't recall what the issue is with select(select(somekey))
11 replies
ATApache TinkerPop
Created by Max on 9/28/2024 in #questions
Best practices for local development with Neptune.
There's a blog post here that contains some of the details on what properties you can change in TinkerGraph to get close: https://aws.amazon.com/blogs/database/automated-testing-of-amazon-neptune-data-access-with-apache-tinkerpop-gremlin/ It's unlikely that you'll find anything that emulates things like the result cache, lookup cache, full-text-search, features, etc.
I would be curious to hear what the needs are for local dev.
11 replies
ATApache TinkerPop
Created by Julius Hamilton on 9/24/2024 in #questions
Why is T.label immutable and do we have to create a new node to change a label?
"Ok" is of opinion. It's entirely possible and a tradeoff in supporting multiple frameworks and query languages.
20 replies
ATApache TinkerPop
Created by Julius Hamilton on 9/24/2024 in #questions
Why is T.label immutable and do we have to create a new node to change a label?
Neptune has very few constraints. Primarily only that each vertex and edge must have a unique ID (and those IDs must be Strings), each vertex and edge must have a label (though if one isn't defined at creation, a default label of vertex or edge is used), and every edge must have a vertex at each end. Outside of that, there are no constraints on what properties a vertex/edge must have, data types, etc.
20 replies
ATApache TinkerPop
Created by Julius Hamilton on 9/24/2024 in #questions
Why is T.label immutable and do we have to create a new node to change a label?
Also to Dave's point, while Gremlin doesn't allow you to change a label on a node, openCypher does. There are a lot of interoperability aspects that have driven how Neptune supports labels.
Using multiple labels in Gremlin on Neptune: https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-differences.html#feature-gremlin-differences-labels You can change a label on a vertex/node in openCypher via:
MATCH (n)
WHERE id(n) = '96c90f92-9aef-6f62-14c6-b1882b66188c'
SET n:newuser
REMOVE n:user
RETURN n
(where the old label is user and the new label is newuser) This also brings up the point that openCypher supports the ability to do this, while Gremlin does not. So this is likely a gap that Gremlin needs to address even without addressing it from a schema/constraint standpoint.
20 replies
ATApache TinkerPop
Created by Alex on 9/24/2024 in #questions
How to improve Performance using MergeV and MergeE?
In general, the way to get the best write performance/throughput on Neptune is to batch multiple writes into a single request and then issue multiple batched writes in parallel. Neptune stores each atomic component of the graph (node, edge, and property) as a separate record. For example, if you have a node with 4 properties, that turns into 5 records in Neptune. A batched write query with around 100-200 records is the sweet spot that we've found in testing. So issuing queries with that many records and running those in parallel should provide better throughput. Conditional writes will slow things down, as additional locks are being taken to ensure data consistency. So writes that use straight addV(), addE(), and property() steps will be faster than those using mergeV() or mergeE(). The latter can also incur more deadlocks (exposed in Neptune as ConcurrentModificationExceptions). So it is also good practice to implement exponential backoff and retries whenever doing parallel writes into Neptune.
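As a rough illustration only (not a drop-in implementation), a batched write with exponential backoff and retries in gremlin-python could look something like this; the endpoint, row structure, and batch size are assumptions for the sketch:
import random
import time
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.protocol import GremlinServerError

# Placeholder endpoint
g = traversal().with_remote(
    DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g'))

def write_batch(rows, max_retries=5):
    # Chain the whole batch (aim for ~100-200 records) into a single traversal/request.
    for attempt in range(max_retries):
        try:
            t = g
            for row in rows:
                t = t.addV(row['label']).property('name', row['name'])
            t.iterate()  # one request carries the entire batch
            return
        except GremlinServerError as e:
            # Retry on ConcurrentModificationException with exponential backoff + jitter.
            if 'ConcurrentModificationException' in str(e) and attempt < max_retries - 1:
                time.sleep((2 ** attempt) + random.random())
            else:
                raise

# Running several write_batch() calls in parallel (threads or processes) is what
# provides the throughput gain described above.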
4 replies
ATApache TinkerPop
Created by Julius Hamilton on 9/22/2024 in #questions
Simple question about printing vertex labels
By default, IDs are stored as longs. You likely need to use g.V(0L) in Gremlin Console to return the vertex that you created.
5 replies
ATApache TinkerPop
Created by Julius Hamilton on 9/20/2024 in #questions
Defining Hypergraphs
TBH, this may be easier to do in RDF than in LPG. 🙂 Triples can define relationships between entities (vertices). And you can use named graphs in RDF to wrap a triple into another entity/id. Then those can be entities themselves that can have other relationships. From your example diagram above, I could have:
<S> <P> <O> <G>
<vertex:1> <rdf:type> <type:vertex> <edge:1>
<edge:2> <rdf:type> <type:hyperedge> <edge:1>
<vertex:2> <rdf:type> <type:vertex> <edge:2>
<vertex:3> <rdf:type> <type:vertex> <edge:2>
<vertex:3> <rdf:type> <type:vertex> <edge:3>
<vertex:5> <rdf:type> <type:vertex> <edge:3>
<vertex:6> <rdf:type> <type:vertex> <edge:3>
<vertex:4> <rdf:type> <type:vertex> <edge:4>
<vertex:7> <rdf:type> <type:vertex> <~> #default graph
<edge:1> <rdf:type> <type:hyperedge> <~>
<edge:4> <rdf:type> <type:hyperedge> <~>
<edge:3> <rdf:type> <type:hyperedge> <~>
Good article here that talks about this more, but without using named graphs: https://ontologist.substack.com/p/hypergraphs-and-rdf You can sort of back into LPG from this, but it's just easier (for me, at least) to think of things in terms of triples (vertex-edge-vertex) and a triple being an entity that can also have its own relationships.
6 replies
ATApache TinkerPop
Created by emi on 9/10/2024 in #questions
Efficient degree computation for traversals of big graphs
6 replies
ATApache TinkerPop
Created by emi on 9/10/2024 in #questions
Efficient degree computation for traversals of big graphs
I would first mention that the profile() step in Gremlin is different from the Neptune Profile API. The latter is going to provide a great deal more info, including whether or not the entire query is being optimized by Neptune: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html If you have 10s of millions of nodes, you could use Neptune Analytics to do the degree calculations, then extract the degree properties from NA, delete the NA graph, and bulk load those values back into NDB. We're working to make this round-trip process more seamless, but it isn't too hard to automate in its current form.
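For reference, a minimal sketch of calling the Neptune Profile API through the boto3 neptunedata client (the endpoint and query below are placeholders, and the response shape may vary by engine version):
import boto3

# Placeholder endpoint
client = boto3.client('neptunedata', endpoint_url='https://your-neptune-endpoint:8182')

response = client.execute_gremlin_profile_query(
    gremlinQuery="g.V().hasLabel('airport').out('route').count()"
)
print(response)  # the profile report includes whether the full query was optimized by Neptune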
6 replies
ATApache TinkerPop
Created by Andys1814 on 8/26/2024 in #questions
Very slow regex query (AWS Neptune)
To Dave's point... there are some data modeling strategies that you can use to eliminate the need for OpenSearch and use exact match references in your cluster:
1) If case is an issue, create an all lower- or upper-case version of the property value and use that to match.
2) If searching for a specific term (and when you know you might be looking for that term again in the future), just do a one-time search + add for that term. Properties in Neptune have a default cardinality of set, so each property can contain multiple values.
A good example of this is if I wanted to do a lookup of all movies in a database that contained "Star Wars: Episode" in the title.
g.V().hasLabel('movie').has('title',regex('Star Wars: Episode (I|V)')).values('title')
This query takes ~5.5s to run across all movies in my dataset. I can add Star Wars: Episode as an additional title property value to each of those via:
g.V().hasLabel('movie').has('title',regex('Star Wars: Episode (I|V)')).property('title','Star Wars: Episode')
Then I can just use a query like:
g.V().hasLabel('movie').has('title','Star Wars: Episode').values('title')
which now only takes 1.3ms to run to find the results. You could also store the exact-match term as a separate property value; I'm just storing it back into title for simplicity's sake here. This pattern is only beneficial when you know you're going to need to find things multiple times. For ad hoc searches, using OpenSearch is really the only solution. Neptune has no means, at present, to build an internal full-text-search index.
7 replies
ATApache TinkerPop
Created by Balan on 8/23/2024 in #questions
How can we extract values only
What client are you using? Are you using one of the Gremlin Language Variants (i.e. gremlin-python, Gremlin-Javascript, etc.)? If so, which one? Or are you using the AWS SDK and the NeptuneData execute_gremlin_query API? Each of these has its own method to specify the serialization format. You can choose a serialization format for GraphSON that does not return data types: https://tinkerpop.apache.org/docs/current/reference/#_graphson, as @Kennh mentioned. For example, with the gremlin-python client you would use something like:
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
import gremlin_python.driver.serializer

g = traversal().with_remote(DriverRemoteConnection('ws://localhost:8182/gremlin', 'g',
        message_serializer=gremlin_python.driver.serializer.GraphSONUntypedMessageSerializerV1()))
With the NeptuneData API call in boto3:
import boto3

client = boto3.client('neptunedata', endpoint_url='https://your-neptune-endpoint:8182')  # placeholder endpoint

response = client.execute_gremlin_query(
    gremlinQuery='g.V().valueMap()',
    serializer='GraphSONUntypedMessageSerializerV1'
)
9 replies
ATApache TinkerPop
Created by Vitor Martins on 8/8/2024 in #questions
Optimizing connection between Python API (FastAPI) and Neptune
Sounds good. We're more than happy to help. We can keep the thread open. Also happy to jump on a call and talk things through, as needed.
11 replies
ATApache TinkerPop
Created by Vitor Martins on 8/8/2024 in #questions
Optimizing connection between Python API (FastAPI) and Neptune
3. Websockets are great if you have a workload that is constantly sending requests or if you're taking advantage of streaming data back to a client. However, they do come with the overhead of needing to maintain these connections and handle reconnects when they die. Generally these connections are long-lived in Neptune, except when they are idle (>20-25 minutes) or in the event that you're using IAM Authentication (we terminate any connection older than 10 days, idle or not). If your workload isn't taking advantage of websockets, you may want to consider moving to basic http requests instead. Neptune now has a NeptuneData API as part of the boto3 SDK that you can use to send requests to a Neptune endpoint as an http request without the need of a live websocket connection. https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/neptunedata.html

 4. Count queries in Neptune are full scans. So any time you need to do a groupCount().by(label) for all vertices, this is going to scan the entire database. If you just want a count of nodes or nodes with a certain property, there’s also the Summary API that you can use for this: https://docs.aws.amazon.com/neptune/latest/userguide/neptune-graph-summary.html
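For example, a minimal sketch using the boto3 neptunedata client (the endpoint is a placeholder and the response fields may differ by engine version):
import boto3

# Placeholder endpoint
client = boto3.client('neptunedata', endpoint_url='https://your-neptune-endpoint:8182')

# Instead of g.V().groupCount().by(label) (a full scan), read counts from the graph summary.
summary = client.get_propertygraph_summary(mode='detailed')
print(summary['payload']['graphSummary'])  # node/edge counts, labels, etc., without a scan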
11 replies