Apache TinkerPop

Join the community to ask questions about Apache TinkerPop and get answers from other members.

8/8/2024

Optimizing connection between Python API (FastAPI) and Neptune

Hi guys. I've been working with gremlin-python at my company for the past 4 years, using Neptune as the database. We run a FastAPI server, and Neptune has been its main database from the beginning. We have always struggled to get good performance from the API, but recently it has become a more pressing pain, with endpoints taking more than 10s to respond. We took some steps to improve performance, such as updating the cluster to the latest engine version and upgrading the FastAPI and gremlin-python dependencies. ...

Solution:

There's a lot to unpack here....

1. We state in our docs that t4g.medium instances are really not great for production workloads. We support them for initial development so users can keep cost down, but the amount of resources available, and the fact that they are burstable instances, really constrains their usability. Once you've used up CPU credits, you're going to get throttled.
2. Neptune's concurrency model is based on instance size and the number of vCPUs per instance. For each vCPU, there are two query execution threads. A t4g.medium or an r6g.large instance has 2 vCPUs, which means that instance can only be computing 4 concurrent requests at a time. If you need more concurrency, then you should look to scale to a larger instance with more vCPUs. If your workload varies over time, you may want to investigate using Neptune Serverless, which can automatically scale vertically to meet the needs of the application. There's a good presentation from last year's re:Invent that discusses when Serverless works best and when not to use it: https://youtu.be/xAdWa0Ahiok?si=OeSe-_L3ErcYH-XU...
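The thread arithmetic in point 2 can be sketched in a few lines. This is only an illustration of the rule stated above (two query execution threads per vCPU); the function names are made up for this example, and the idea that excess requests simply wait is a simplification.

```python
THREADS_PER_VCPU = 2  # rule quoted in the answer above


def query_threads(vcpus: int) -> int:
    """Number of queries an instance can execute at the same time."""
    if vcpus < 1:
        raise ValueError("vcpus must be positive")
    return vcpus * THREADS_PER_VCPU


def waiting_requests(concurrent_requests: int, vcpus: int) -> int:
    """Requests that cannot start immediately because all threads are busy."""
    return max(0, concurrent_requests - query_threads(vcpus))


print(query_threads(2))          # t4g.medium / r6g.large (2 vCPUs): 4
print(waiting_requests(10, 2))   # 6 of 10 simultaneous requests have to wait
```

Seen this way, a FastAPI endpoint that fans out many simultaneous queries against a 2-vCPU instance would spend much of its response time waiting for a free execution thread.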

ManabuBeach

8/6/2024

Breadth-First Traversal Fact Check

Using Neptune query profiling, I have found that Gremlin queries seem to use a depth-first strategy to search, and as a result they tend to be both time and resource intensive, especially when what I am looking for is a node just 1 or 2 levels below. The following approach has been suggested for doing a breadth-first traversal, but I am not sure it really does the trick. If my goal is to find the nearest nodes quickly, what would be efficient approaches?...

Solution:

Neptune uses BFS as the default traversal strategy. You can change how a repeat() is executed with the repeatMode query hint, as noted here: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-query-hints-repeatMode.html
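To see why breadth-first order suits a "nearest nodes first" search, here is a plain-Python sketch over a hypothetical adjacency dictionary. It illustrates the visiting order only and does not describe how Neptune executes repeat() internally; the graph, node names, and function name are invented for the example.

```python
from collections import deque


def nearest_matches(graph, start, predicate, max_depth):
    """Breadth-first search: return (node, depth) for matches, nearest first."""
    seen = {start}
    queue = deque([(start, 0)])
    matches = []
    while queue:
        node, depth = queue.popleft()
        if depth > 0 and predicate(node):
            matches.append((node, depth))
        if depth < max_depth:  # stop expanding once the depth limit is reached
            for neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
    return matches


# Hypothetical graph: one target two hops away, another three hops away.
graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["target-1"],
    "d": ["target-2"],
}
print(nearest_matches(graph, "a", lambda n: n.startswith("target"), max_depth=3))
# [('target-1', 2), ('target-2', 3)]
```

Because every node at depth n is visited before any node at depth n + 1, a small max_depth bounds the work when the wanted node is only 1 or 2 levels away.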

b4lls4ck

8/1/2024

How to find the edges of a node that have a weight of x or greater?

I have a graph with nodes that are connected to one another, and I have weights on the edges that connect these nodes. This query, g.V().has("person", "name", "A").out("is friends with").values("name").to_list(), returns the names of the people with whom person "A" has the relation "is friends with". But I would like to filter by the weight value. Person A has a weight of 0.9 with Person B, and a weight of 0.7 with Person C. I would like to get back only the people with whom person A has an edge of weight greater than 0.8. How can I do that? I am using gremlin-python. I have tried g.V().has("person", "name", "A").out("is friends with").has("weight", gte(0.8)) but I get an error saying NameError: name "gte" is not defined. Did you mean: 'g'?...

Solution:

g.V().has("person", "name", "A").out(... returns vertices. To get edges you need to use outE(), something like a_friends = g.V().has("person", "name", "A").outE("is friends with").has("weight", P.gt(0.75)).inV().to_list()...
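The shape of that traversal can be mirrored in plain Python to show what each step contributes. The edge list below is a hypothetical stand-in for the graph, not gremlin-python output. The comment at the top shows the gremlin-python form; note that P has to be imported from gremlin_python.process.traversal, which is what the NameError in the question points at.

```python
# gremlin-python form of the query (requires the import on the first line):
#   from gremlin_python.process.traversal import P
#   g.V().has("person", "name", "A").outE("is friends with") \
#        .has("weight", P.gt(0.8)).inV().values("name").to_list()

# Hypothetical in-memory edge list standing in for the graph.
EDGES = [
    {"out": "A", "label": "is friends with", "weight": 0.9, "in": "B"},
    {"out": "A", "label": "is friends with", "weight": 0.7, "in": "C"},
    {"out": "B", "label": "is friends with", "weight": 0.95, "in": "C"},
]


def friends_above(edges, name, label, min_weight):
    """Mirror outE(label).has('weight', gt(min_weight)).inV().values('name')."""
    return [
        edge["in"]                      # inV(): step to the edge's target vertex
        for edge in edges
        if edge["out"] == name          # start at the named person
        and edge["label"] == label      # outE(label): follow only this edge label
        and edge["weight"] > min_weight # has('weight', gt(...)): filter on the EDGE
    ]


print(friends_above(EDGES, "A", "is friends with", 0.8))  # ['B']
```

The filter has to run while the traverser is still on the edge, because the weight is an edge property; after out() the traverser is already on the friend vertex and the weight is no longer in scope.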

criminosis

7/27/2024

op_traversal P98 Spikes

Hi TinkerPop team! I'm observing these abrupt spikes in my Gremlin Server P98 metrics in my JanusGraph environment. I've been looking at the TraversalOpProcessor (https://github.com/apache/tinkerpop/blob/master/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/traversal/TraversalOpProcessor.java) code over the last couple of days for some ideas of what could be causing it, but I'm not seeing an obvious smoking gun, so I figured I'd ask around. The traversal in question is being submitted from Rust via Bytecode using gremlin-rs, specifically with some additions I've made that added mergeV & mergeE among other things. It's in a PR awaiting the maintainer's review. That's not really relevant to my question, but it's over here if you want to see it: https://github.com/wolf4ood/gremlin-rs/pull/214. The GraphSONV3 serializer is what's being used by the library....

Solution:

For anyone else who finds this thread, the things I ended up finding to be issues:
- Cassandra's disk I/O throughput. EBS gp3 is 125MB/s by default, and at least for my use case I was periodically maxing that out; increasing it to 250MB/s resolved that apparent bottleneck. So when long sustained writing occurred, the 125MB/s was not sufficient.
- Optimizing traversals to use mergeE/mergeV in place of either older Groovy-script-based evaluations I was submitting or older fold().coalesce(unfold(),...) style "get or create" vertex mutations....

spmallette

7/12/2024

neo4j-gremlin not working with JDK17

Neo4jGraph fails to start with top-level error like:

Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory

with causes like:

Suppressed: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.scheduler.CentralJobScheduler@5906ebfb' failed to transition from stopped to shutting_down. Please see the attached cause exception "Exception java.lang.LinkageError: Could not get Throwable message field [in thread "main"]"

and:...

Solution:

This issue relates back to: https://tinkerpop.apache.org/docs/current/upgrade/#_building_and_running_with_jdk_17 and the need to add this JVM option:

--add-opens=java.base/sun.nio.ch=ALL-UNNAMED...

b4lls4ck

7/9/2024

I am unsure on how to use Python to add graphs to JanusGraph

I am having a difficult time using python-gremlin. I am really unsure how I can create a graph, create vertices and edges, and then upload it to the database. Could someone provide a guide on how to do these things? I followed the JanusGraph tutorial as well as the TinkerPop tutorials on how to use gremlin-python, but nothing seems to be working for me.

Solution:

The key bit to understand with gremlin-python is that you can only use it to query/mutate a graph with the Gremlin language, and that graph must be hosted in Gremlin Server (or be compliant with its protocol, like Amazon Neptune). Other functions, like "create a graph" or access to provider-specific functions like creating indices, are not possible with the Gremlin language and therefore not possible with gremlin-python (or other non-JVM programming languages that support Gremlin). If you are wholly new to TinkerPop and JanusGraph (and maybe graph databases themselves), my recommendation would be to not start with JanusGraph. It's natural to want to dive right in to the graph you want to use, but I think it's better to take a slower approach. I think the learning process is like:...

Limosin18

7/3/2024

Optimising python-gremlin for fastApi

PS: Please give me some rope here as I am new to gremlin-python! I have a FastAPI app serving some REST API calls based on data that I fetch from a Neptune instance. I am using the gremlin-python library to query the graph instance. Some points to note:...

Lyndon

6/25/2024

C# Profiling Duration

Does anyone have an example of how to get the millisecond/microsecond execution time of a query in C#? I can't find any examples of this online.

Solution:

I don't have a really good alternative. One option is to run the same traversal from Gremlin Console or the Java GLV. If you have GremlinGroovyScriptEngine (it's the default for TinkerPop 3.6-3.7), then a workaround is to convert the response to a String or Map before sending it back from the server, something like await client.SubmitWithSingleResultAsync<Object>("g.V().profile().toList().toString()");...

Aiman

6/19/2024

Expecting java.lang.ArrayList/java.lang.List instead of java.lang.String. Where am I going wrong?

```
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
Jun 19, 2024 6:04:36 PM org.yaml.snakeyaml.internal.Logger warn
WARNING: Failed to find field for org.apache.tinkerpop.gremlin.driver.Settings.serializers
18:04:37 INFO org.apache.tinkerpop.gremlin.driver.Connection.<init> - Created new connection for ws://localhost:8182/gremlin
...
```

Solution:

A few things you should know based on your needs:

1. With JanusGraph you can't store Map or List, so that may be a blocker for you.
2. Multi-properties aren't List. They are multiple values stored under the same property key, so those semantics may be a blocker for you.
3. Gremlin has no steps that can help you detect a type. Whatever type detection you do would have to happen in the client application (technically what you are doing when you call next() and class), which may be a blocker for you.
4. Using just Gremlin, it's hard to detect whether you are using multi-properties or not, because Gremlin has no knowledge of the schema. Maybe you could consult JanusGraph for the schema for the types. Counting properties of a key doesn't really work either, because you could have a key with just one property and it could still be a multi-property (with just one value)....

Arthur from gdotv

6/18/2024

Is NoHostAvailableException losing/not including relevant error context (3.7.0 above)?

Hey folks, I've recently noticed that G.V() is "not as good" as it used to be at reporting some specific connectivity issues, and upon further investigation I managed to attribute this to a change in behaviour, with NoHostAvailableException seemingly losing some relevant context. In my case, I'm testing the scenario of trying to connect to Azure Cosmos DB from an IP that is not allowed through their firewall, which usually results in a bespoke error message from Azure, as follows:

Invalid handshake response getStatus: 403 Request originated from IP 213.31.211.185 through public internet. This is blocked by your Cosmos DB account firewall settings. More info: https://aka.ms/cosmosdb-tsg-forbidden

...

Solution:

I don't think we've changed any behavior for NoHostAvailableException since 3.5.5: https://tinkerpop.apache.org/docs/current/upgrade/#_gremlin_driver_host_availability
Since that time there is really only one way that an NHA is thrown: if the connection pool cannot initialize a connection to any host. We are selective in what exceptions are raised within the NHA because there are cases where the exception can be more confusing than helpful. In this case, we weren't including handshake exception...

qfel

6/15/2024

Setting index in gremlin-python

I was trying to create the vote graph from the tutorial on loading data, using gremlin-python, and afaik you can't simply add an index from non-JVM languages because, for example, there is no TinkerGraph that you could .open(). I don't know how much better performance is with an index on 'userId', but my code simply takes too long to go through the queries from the vote file. I tried using the client functionality ```py ws_url = 'ws://localhost:8182/gremlin'...

Solution:

If you simply edit that line of code to create your index and load your vote data, then every time you start Gremlin Server it will have all of that set up and ready to go.

Lyndon

6/14/2024

Anyone had issues with gremlinpython driver async_timeout?

This seems to be an undocumented and nominally unrequired (but actually required) library: every time I go to use the Python driver on a fresh cloud machine, I have to install gremlinpython and then async_timeout. That makes sense because it's not a default Python library, but for some reason it seems to be present on the GitHub machines TinkerPop uses to test. I am wondering if anyone else has noticed this and if we should perhaps put it in the https://github.com/apache/tinkerpop/blob/master/gremlin-python/src/main/python/setup.py or something....

Solution:

Well, looks like it's just me, so I won't dig into fixing this.

spc16670

6/12/2024

Dynamically calculated values from last vertex to select outgoing edges for further traversal

Hi - I asked the question on SO (https://stackoverflow.com/questions/78611365/neptune-graph-traversal-that-uses-dynamically-calculated-values-from-last-vertex) but there is no answer. I am really interested in whether this is a scenario Gremlin can handle. I am not very well versed in TinkerPop, so I am not sure whether I am just trying to build a query that is complex, or asking for a case that is just not supported in TinkerPop. I work for a company that is seriously considering onboarding onto greml...

6/10/2024

Window functions in gremlin

Is there any way to apply window functions in Gremlin queries? I need to convert the following SQL query into Gremlin.

SELECT ass.id, fin.id, ...

Johan

6/1/2024

Does the TinkerGraph in-memory database support List cardinality properties for vertices?

It is my understanding that the following code should work:
```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
...
```

Solution:

elementMap() assumes that the cardinality for each key is single, and if it is list then only the first item encountered will be returned. To get all property values, the valueMap() step can be used instead. `gremlin> g.V(1).property("address", "a1").property(list, "address", "a2") ==>v[1] gremlin> g.V(1).valueMap()...
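The difference described in that answer can be modelled in plain Python. This is a toy model under stated assumptions, not TinkerPop code: the property store is a hypothetical flat list of (key, value) pairs, the "name" property is invented for the example, and the id and label entries that a real elementMap() also returns are left out.

```python
# Hypothetical flat property store: one (key, value) entry per property.
properties = [("address", "a1"), ("address", "a2"), ("name", "marko")]


def element_map_like(props):
    """Keep only the first value seen for each key (single-cardinality view)."""
    result = {}
    for key, value in props:
        result.setdefault(key, value)  # later values for the same key are ignored
    return result


def value_map_like(props):
    """Collect every value for each key into a list."""
    result = {}
    for key, value in props:
        result.setdefault(key, []).append(value)
    return result


print(element_map_like(properties))  # {'address': 'a1', 'name': 'marko'}
print(value_map_like(properties))    # {'address': ['a1', 'a2'], 'name': ['marko']}
```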

RuS2m

6/1/2024

Analyzing samples of Gremlin Queries in Neptune Notebook

Hey everyone, I'm working on a project where we give internal customers access to our Neptune graph through Neptune Notebook. There are already quite a few users, and we want to analyze the queries they run to see which parts of our ontology are used more and which are underutilized. This is not as straightforward as retrieving all labels from the query: our edge labels are not unique, so if people use .in or .out steps without specifying the entity name, it's almost impossible to tell which part of the ontology was visited. We also want to identify common query patterns to understand what people usually query for and which connections in our ontology are used most frequently, while filtering out parts common to all queries, like g.V(), and instead capturing information about the combinations of multiple steps that were called. We've figured out how to override the Gremlin magic in Neptune Notebook to add our custom logic for handling each query. For my problem I'm considering two approaches:...

Solution:

I think this is going to depend on how granular you want to get. If the intent is to see what labeled vertices or edges are accessed, then just looking at a query in the audit log would be sufficient. But, if your intent is to see every atomic component that is accessed in the database as part of query execution, that could be expensive. It is possible, though. You could run every query through the Neptune Gremlin Profiler: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html and set profile.indexOps to True and you'll get an output at the bottom of the profile output with every index operation that occurs. These will equate to some permutation of S-P-O-G patterns that are used in the three different built-in indexes (or fourth index, if enabled).
With the list of indexed lookup patterns, you could possibly maintain an external counter (maybe in a sorted set in Redis/Valkey) with a key of the S-P-O-G combination and the value being the number of times accessed. Just be aware that obtaining a Neptune Gremlin Profile output requires that you run the query again. So you may not be able to use this to capture writes (without rewriting the data), and re-running all of the read queries will consume additional database resources....
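A rough sketch of such an external counter, using an in-memory collections.Counter as a stand-in for the Redis/Valkey sorted set. The class name and the pattern key strings are hypothetical; the real keys would have to be extracted from the index operations section of the profile output, whose format is not reproduced here.

```python
from collections import Counter


class PatternCounter:
    """In-memory stand-in for the external counter (e.g. a Redis sorted set)."""

    def __init__(self):
        self._counts = Counter()

    def record(self, patterns):
        """Add one access for every lookup pattern seen in one profile run."""
        self._counts.update(patterns)

    def most_used(self, n=10):
        """Return the n most frequently accessed patterns with their counts."""
        return self._counts.most_common(n)


# Hypothetical pattern keys; the real strings depend on the profile output.
counter = PatternCounter()
counter.record(["S:? P:knows O:? G:?", "S:v1 P:? O:? G:?"])
counter.record(["S:? P:knows O:? G:?"])
print(counter.most_used(1))  # [('S:? P:knows O:? G:?', 2)]
```

With Redis or Valkey, record() would map to one ZINCRBY per pattern and most_used() to a reverse range query over the sorted set, so the counts survive across notebook sessions.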

Lyndon

5/29/2024

Potential bug in evaluationTimeout when using auth?

When I use g.with("evaluationTimeout", X).call(<some call step>).next() without authentication it works. When I do it with authentication enabled it does not work. The offending code appears to be this piece of TraversalOpProcessor.java:
```
final long seto = args.containsKey(Tokens.ARGS_EVAL_TIMEOUT) ?...
```

Solution:

The problem appears to be in this block of code

porunov

5/18/2024

Should `barrier` step merge Edge properties with the same key and value?

I am trying to understand whether it is expected for barrier to merge multiple traversers of different Edge properties together (a.k.a. optimization). Currently, as a result of such merging, some Edge properties might be missing from the continuing traversal. For example, the following test will fail at the last line because a single property is still left after removing all "name" properties (graph.traversal().E().properties("name").barrier(5).drop().iterate()). I had the impression that the barrier step may influence query optimization but not the query result. Now I'm trying to understand whether that is the intended behavior or not.
```
@Test
public void testDropsEdgePropertiesTinkerGraph() {
    Graph graph = TinkerGraph.open();
    ...
```

Solution:

barrier() doesn't dedup. It bulks: https://tinkerpop.apache.org/docs/current/reference/#barrier-step
I think the problem here is that unique Edge properties bulk because of how equality works for them, where you can have two key/values that are the same but do not refer to the same actual property. Note that the same doesn't happen for vertex properties, which have equality based on id:
```
gremlin> g.addV().property('name','alice')
==>v[0]
gremlin> g.addV().property('name','alice')
==>v[2]
...
```
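The bulking behaviour described in that answer can be modelled with two small classes that differ only in how equality is defined. This is a toy model of the explanation, not TinkerPop's implementation: the class names, the slot field (standing for "which actual property this is"), and the bulk() helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EdgeProperty:
    key: str
    value: str
    slot: int = field(compare=False)  # which actual property this is; ignored by ==


@dataclass(frozen=True)
class VertexProperty:
    id: int  # equality is based on the id only
    key: str = field(compare=False)
    value: str = field(compare=False)


def bulk(objects):
    """Merge equal objects into a single entry with a count (the 'bulk')."""
    bulks = {}
    for obj in objects:
        bulks[obj] = bulks.get(obj, 0) + 1
    return bulks


edge_props = [EdgeProperty("name", "x", slot=1), EdgeProperty("name", "x", slot=2)]
vertex_props = [VertexProperty(1, "name", "alice"), VertexProperty(2, "name", "alice")]

print(len(bulk(edge_props)))    # 1: two distinct properties collapse into one bulked entry
print(len(bulk(vertex_props)))  # 2: ids differ, so nothing is merged
```

In this model, a step that acts once per bulked entry would touch only one of the two edge properties, which would be consistent with the leftover property reported in the question.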

pm_osc

5/17/2024

Authorization with transaction results in error

Hi All, I have configured passive authorization as described in https://tinkerpop.apache.org/docs/3.7.0/reference/#authorization. All works fine, but when I use Gremlin Console in session mode and call :remote close, the following error happens:

java.util.concurrent.ExecutionException: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Failed to authorize: This AuthorizationHandler only handles requests with OPS_BYTECODE or OPS_EVAL.

And on the server side I see (full server log enclosed):...

Solution:

I've looked into your real use case a little closer and I don't think there's a workaround at this time. Side note, you should end your traversals with a terminating step like iterate() or else they don't do anything. So your query should actually be gtx.addV().property('name', 'test1').property('age', 11).iterate(). The error you are seeing with "This AuthorizationHandler..." occurs after the transaction attempts to commit so it shouldn't actually prevent the commit from occurring. The real p...

AllowListAuthorizer....

danielcraig23

5/9/2024

Is the insertion order guaranteed with this example code?

Taking the following code, which is found at https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-transactions, is the insertion order guaranteed for these two new vertices?
```
const g = traversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin'));
const tx = g.tx(); // create a Transaction
...
```

Solution:

Right, there are no guarantees with Promise.all. If for some reason the insertion order is important, then you need to make the calls sequentially:

gtx.addV("person").property("name", "jorge").iterate();
gtx.addV("person").property("name", "josh").iterate();
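The same ordering concern can be demonstrated with Python's asyncio, used here as an analogy to Promise.all rather than as gremlin-javascript code. The add_vertex coroutine is a hypothetical stand-in for an awaited addV(...).iterate() call, and the delays simulate network latency.

```python
import asyncio


async def add_vertex(store, name, delay):
    """Stand-in for an awaited addV(...).iterate(); delay simulates latency."""
    await asyncio.sleep(delay)
    store.append(name)


async def concurrent_inserts():
    """Like Promise.all: both requests are in flight at once."""
    store = []
    await asyncio.gather(
        add_vertex(store, "jorge", 0.05),
        add_vertex(store, "josh", 0.0),
    )
    return store  # arrival order depends on latency, not on call order


async def sequential_inserts():
    """Await each insert before starting the next one."""
    store = []
    await add_vertex(store, "jorge", 0.05)
    await add_vertex(store, "josh", 0.0)
    return store  # always ['jorge', 'josh']


print(asyncio.run(concurrent_inserts()))
print(asyncio.run(sequential_inserts()))
```

Only the sequential version fixes the order, at the cost of waiting for each round trip before issuing the next request.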

...
