Apache TinkerPop

Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.

neo4j-gremlin not working with JDK17

Neo4jGraph fails to start with top-level error like:
Error starting org.neo4j.kernel.impl.factory.GraphDatabaseFacadeFactory
with causes like:
Suppressed: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.impl.scheduler.CentralJobScheduler@5906ebfb' failed to transition from stopped to shutting_down. Please see the attached cause exception "Exception java.lang.LinkageError: Could not get Throwable message field [in thread "main"]"
and:...
Solution:
This issue relates back to: https://tinkerpop.apache.org/docs/current/upgrade/#_building_and_running_with_jdk_17 and the need to add this JVM option:
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED...

I am unsure on how to use Python to add graphs to JanusGraph

I am having a difficult time using python-gremlin. I am really unsure as to how I can create a graph, create vertices and edges, and then upload it to the database. Could someone provide a guide on how to do these things? I followed the JanusGraph tutorial as well as the Tinkerpop tutorials on how to use gremlin-python but nothing seems to be working for me.
Solution:
the key bit to understand with gremlin-python is that you can only use it to query/mutate a graph with the Gremlin language, and that graph must be hosted in Gremlin Server (or be compliant with its protocol, like Amazon Neptune). other functions, like "create a graph" or accessing provider-specific functions like creating indices, are not possible with the Gremlin language and therefore not possible with gremlin-python (or other non-JVM programming languages that support Gremlin). if you are wholly new to TinkerPop and JanusGraph (and maybe graph databases themselves), my recommendation would be to not start with JanusGraph. It's natural to want to dive right in to the graph you want to use, but I think it's better to take a slower approach. i think the learning process is like:...

Optimising python-gremlin for FastAPI

PS: Please give me some rope here as I am new to gremlin-python! I have a FastAPI app which serves some REST API calls based on data that I am fetching from a Neptune instance. I am using the gremlin-python library to make the queries to the graph instance. Some points to note:...

C# Profiling Duration

Does anyone have an example of how to get the millisecond/microsecond execution time of a query in C#? I can't find any examples of this online.
Solution:
I don't have a really good alternative. One option is to run the same traversal from Gremlin Console or the Java GLV. If you have GremlinGroovyScriptEngine (it's the default for TinkerPop 3.6-3.7), then a workaround is to convert the response to a String or Map before sending it back from the server, something like await client.SubmitWithSingleResultAsync<Object>("g.V().profile().toList().toString()");...

Expecting java.lang.ArrayList/java.lang.List instead of java.lang.String. Where am I going wrong?

``` gremlin> :remote connect tinkerpop.server conf/remote.yaml session Jun 19, 2024 6:04:36 PM org.yaml.snakeyaml.internal.Logger warn WARNING: Failed to find field for org.apache.tinkerpop.gremlin.driver.Settings.serializers 18:04:37 INFO org.apache.tinkerpop.gremlin.driver.Connection.<init> - Created new connection for ws://localhost:8182/gremlin...
Solution:
A few things you should know based on your needs:
1. With JanusGraph you can't store Map or List, so that may be a blocker for you.
2. Multi-properties aren't List. They are multiple values stored under the same property key, so those semantics may be a blocker for you.
3. Gremlin has no steps that can help you detect a type. Whatever type detection you do would have to happen in the client application (technically what you are doing when you call next() and class), which may be a blocker for you.
4. Using just Gremlin, it's hard to detect whether you are using multi-properties or not because it has no knowledge of the schema. Maybe you could consult JanusGraph for the schema for the types. Counting properties of a key doesn't really work either because you could have a key with just one property and it still could be a multi-property (with just one value)....

Is NoHostAvailableException losing/not including relevant error context (3.7.0 above)?

Hey folks, I've recently noticed that G.V() is "not as good" as it used to be in reporting some specific connectivity issue, and upon further investigation managed to attribute this to a change in behaviour with NoHostAvailableException seemingly losing some relevant context. In my case, I'm testing the scenario of trying to connect to Azure Cosmos DB from an IP that is not allowed through their firewall, which usually results in a bespoke error message for Azure going as follows:
Invalid handshake response getStatus: 403 Request originated from IP 213.31.211.185 through public internet. This is blocked by your Cosmos DB account firewall settings. More info: https://aka.ms/cosmosdb-tsg-forbidden
...
Solution:
i don't think we've changed any behavior for NoHostAvailableException since 3.5.5: https://tinkerpop.apache.org/docs/current/upgrade/#_gremlin_driver_host_availability. Since that time there is really only one way that an NHA is thrown: if the connection pool cannot initialize a connection to any host. we are selective in what exceptions are raised within the NHA because there are cases where the exception can be more confusing than helpful. in this case, we weren't including handshake exception...

Setting index in gremlin-python

I was trying to create the vote graph from the tutorial on loading data in gremlin-python, and afaik you can't simply add an index from non-JVM languages because, for example, there is no TinkerGraph that you could .open(). I don't know how much better performance is with an index on 'userId', but my code simply takes too long to go through the queries from the vote file. I tried using client functionality ```py ws_url = 'ws://localhost:8182/gremlin'...
Solution:
if you simply edit that line of code to create your index and load your vote data, every time you start Gremlin Server it will have that all set up and ready to go.

Anyone had issues with gremlinpython driver async_timeout?

This seems to be an undocumented and unrequired (but actually required) library. Every time I go to use the python driver on a fresh cloud computer, I have to install gremlinpython and then async_timeout. This makes sense because it's not a default python library, but for some reason it seems to be present on the github machines tinkerpop uses to test. I am wondering if anyone else has noticed this and if we should perhaps put it in the https://github.com/apache/tinkerpop/blob/master/gremlin-python/src/main/python/setup.py or something....
Solution:
Well, looks like it's just me, so I won't dig into fixing this.

Dynamically calculated values from last vertex to select outgoing edges for further traversal

Hi - I asked the question on SO (https://stackoverflow.com/questions/78611365/neptune-graph-traversal-that-uses-dynamically-calculated-values-from-last-vertex) but there is no answer. I am really interested if this is a scenario gremlin can handle. I am not very well versed with Tinkerpop so I am not sure whether I am just trying to build a query that is complex, or asking for a case that is just not supported in tinkerpop. I work for a company that is seriously considering onboarding onto greml...

Window functions in gremlin

Is there any way to apply window functions in gremlin queries? I need to convert the following SQL query into gremlin.
SELECT ass.id, fin.id, ...

Does the TinkerGraph in-memory database support List cardinality properties for vertices?

It is my understanding that the following code should work: ```java import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal; import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource; import org.apache.tinkerpop.gremlin.structure.Vertex;...
Solution:
elementMap() assumes that cardinality for each key is single, and if it is list then only the first item encountered will be returned. To get all property values, the valueMap() step can be used instead. `gremlin> g.V(1).property("address", "a1").property(list, "address", "a2") ==>v[1] gremlin> g.V(1).valueMap()...
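To make the difference concrete, here is a small Python model of the two behaviours described above. It is only a sketch: `element_map` and `value_map` are hypothetical helper names working on plain (key, value) pairs, not TinkerPop's implementation, and the real elementMap() also returns the id and label, which are left out here.

```python
def element_map(properties):
    """Model of single-cardinality semantics: keep only the first value seen per key."""
    result = {}
    for key, value in properties:
        # setdefault leaves an existing entry untouched, so later values are ignored
        result.setdefault(key, value)
    return result


def value_map(properties):
    """Model of valueMap()-style output: collect every value per key into a list."""
    result = {}
    for key, value in properties:
        result.setdefault(key, []).append(value)
    return result


# Two values stored under the same key, in insertion order (hypothetical data)
props = [("address", "a1"), ("address", "a2")]

first_only = element_map(props)
all_values = value_map(props)
```

In this model `first_only` holds a single address while `all_values` holds both, which is the reason valueMap() is the safer choice when list cardinality is in play.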

Analyzing samples of Gremlin Queries in Neptune Notebook

Hey everyone, I’m working on a project where we give internal customers access to our Neptune graph through Neptune Notebook. There are already quite a few users, and we want to analyze the queries they run to see which parts of our ontology are used more and which parts are less utilized. This is not as straightforward as retrieving all labels from the query, since our edge labels are not unique, and if people use .in or .out steps without specifying the entity name, it's almost impossible to tell which part of the ontology was visited. We also want to identify common query patterns to understand what people are usually querying for and which connections in our ontology are the most frequently used, while filtering out parts that are common to all queries, like g.V(), and instead retrieving information about combinations of multiple steps that were called. We’ve figured out how to override the Gremlin magic in Neptune Notebook to add our custom logic to handle each query. And for my problem I’m considering two approaches:...
Solution:
I think this is going to depend on how granular you want to get. If the intent is to see what labeled vertices or edges are accessed, then just looking at a query in the audit log would be sufficient. But, if your intent is to see every atomic component that is accessed in the database as part of query execution, that could be expensive. It is possible, though. You could run every query through the Neptune Gremlin Profiler: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-profile-api.html and set profile.indexOps to True and you'll get an output at the bottom of the profile output with every index operation that occurs. These will equate to some permutation of S-P-O-G patterns that are used in the three different built-in indexes (or fourth index, if enabled).
With the list of indexed lookup patterns, you could possibly maintain an external counter (maybe in a sorted set in Redis/Valkey) with a key of the S-P-O-G combination and the value being the number of times accessed. Just be aware that obtaining a Neptune Gremlin Profile output requires that you run the query again. So you may not be able to use this to capture writes (without rewriting the data) and it will incur additional database resources to re-run all of the read queries....
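A minimal sketch of the external counter idea, assuming the index-operation patterns have already been extracted from the profile output as strings. The pattern labels and the `record_patterns` helper are made up for illustration, and an in-process `collections.Counter` stands in for the Redis/Valkey sorted set mentioned above.

```python
from collections import Counter


def record_patterns(counter, patterns):
    """Add one access per pattern seen in a single profiled query."""
    counter.update(patterns)
    return counter


# Running tally of pattern -> number of times accessed
access_counts = Counter()

# Hypothetical patterns pulled from two profiled queries
record_patterns(access_counts, ["SPOG", "POGS", "SPOG"])
record_patterns(access_counts, ["GPSO", "SPOG"])

# Most frequently accessed pattern first
ranking = access_counts.most_common()
```

Swapping the Counter for a shared store would let several notebook sessions contribute to the same tally, at the cost of running that extra service.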

Potential bug in evaluationTimeout when using auth?

When I use g.with("evaluationTimeout", X).call(<some call step>).next() without authentication, it works. When I do it with authentication enabled, it does not work. The offending code appears to be this piece of TraversalOpProcessor.java ``` final long seto = args.containsKey(Tokens.ARGS_EVAL_TIMEOUT) ?...
Solution:
The problem appears to be in this block of code

Should `barrier` step merge Edge properties with the same key and value?

I am trying to understand if it is expected for barrier to merge multiple traversers of different Edge properties together (a.k.a. optimization). Currently, as the result of such merging, some Edge properties might be missing from the continuing traversal. For example, the following test will fail at the last line because a single property is still left after removal of all "name" properties (graph.traversal().E().properties("name").barrier(5).drop().iterate()). I.e. I had the impression that the barrier step may influence query optimization, but not the query result. Now I'm trying to understand if that is the intended behavior or not. ``` @Test public void testDropsEdgePropertiesTinkerGraph() { Graph graph = TinkerGraph.open();...
Solution:
barrier() doesn't dedup. it bulks. https://tinkerpop.apache.org/docs/current/reference/#barrier-step i think the problem here is that unique Edge properties bulk because of how equality works for them, where you can have two key/values that are the same but not refer to the same actual property. note that the same doesn't happen for vertex properties, which have equality based on id: ```gremlin> g.addV().property('name','alice') ==>v[0] gremlin> g.addV().property('name','alice') ==>v[2]...
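The bulking effect described above can be modelled in a few lines of Python. This is a sketch of the idea only, not TinkerPop's code: `EdgeProperty`, `VertexProperty` and `bulk` are hypothetical names, and the equality rules are assumptions taken from the explanation (edge properties compare by key/value, vertex properties compare by id).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EdgeProperty:
    # Equality and hash use key and value only (assumed edge property semantics)
    key: str
    value: str


@dataclass(frozen=True)
class VertexProperty:
    # Equality and hash use the id only (assumed vertex property semantics)
    id: int
    key: str = field(compare=False)
    value: str = field(compare=False)


def bulk(objects):
    """Merge equal objects into one entry that carries a bulk count."""
    return Counter(objects)


# Two distinct edges that each carry name=x (hypothetical data)
edge_props = [EdgeProperty("name", "x"), EdgeProperty("name", "x")]
# Two distinct vertices that each carry name=alice
vertex_props = [VertexProperty(1, "name", "alice"), VertexProperty(3, "name", "alice")]

bulked_edges = bulk(edge_props)
bulked_vertices = bulk(vertex_props)
```

In this model the two edge properties collapse into one traverser with a bulk of 2, so a later step that acts once per traverser would only touch one of them, while the two vertex properties stay separate.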

Authorization with transaction results in error

Hi All, I have configured passive authorization as described in https://tinkerpop.apache.org/docs/3.7.0/reference/#authorization. All works fine, but once I use Gremlin console with session mode, calling :remote close, the following error happens:
java.util.concurrent.ExecutionException: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Failed to authorize: This AuthorizationHandler only handles requests with OPS_BYTECODE or OPS_EVAL.
And on the server side I see (full server log enclosed):...
Solution:
I've looked into your real use case a little closer and I don't think there's a workaround at this time. Side note, you should end your traversals with a terminating step like iterate() or else they don't do anything. So your query should actually be gtx.addV().property('name', 'test1').property('age', 11).iterate(). The error you are seeing with "This AuthorizationHandler..." occurs after the transaction attempts to commit so it shouldn't actually prevent the commit from occurring. The real p...

Is the insertion order guaranteed with this example code?

Taking the following code which is found at https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-transactions, is the insertion order guaranteed for these two new vertices? ```const g = traversal().withRemote(new DriverRemoteConnection('ws://localhost:8182/gremlin')); const tx = g.tx(); // create a Transaction ...
Solution:
right, there are no guarantees with Promise.all. if for some reason the insertion order is important, then you need to make the calls sequentially: gtx.addV("person").property("name", "jorge").iterate(); gtx.addV("person").property("name", "josh").iterate();...
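The same ordering behaviour can be illustrated outside of gremlin-javascript. The sketch below uses Python's asyncio as a stand-in, with asyncio.gather playing the role of Promise.all. `add_vertex` is a hypothetical coroutine that simulates a request with some latency and records when it completes; it does not talk to a Gremlin Server.

```python
import asyncio


async def add_vertex(log, name, delay):
    """Simulate an insert request: wait for the 'network', then record completion."""
    await asyncio.sleep(delay)
    log.append(name)


async def concurrent_inserts():
    # Both requests are in flight at once, so completion order follows latency,
    # not the order in which the calls were written
    log = []
    await asyncio.gather(
        add_vertex(log, "jorge", 0.05),
        add_vertex(log, "josh", 0.0),
    )
    return log


async def sequential_inserts():
    # Each request finishes before the next one starts, so order is preserved
    log = []
    await add_vertex(log, "jorge", 0.05)
    await add_vertex(log, "josh", 0.0)
    return log


concurrent_order = asyncio.run(concurrent_inserts())
sequential_order = asyncio.run(sequential_inserts())
```

Only `sequential_order` is guaranteed to match the order of the calls. With the concurrent version the result depends on which simulated request finishes first.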

Using mergeE to create an edge with an id that depends on a lookup

I want to use mergeE to produce an edge whose id is the concatenation of the ids of its inV and outV vertices. But the inV vertex has to be looked up, the exact id is not known without a lookup. Suppose that partialMacbookId === "macbookAir" and the result of the lookup is the vertex with id "macbookAir2024" And suppose that ownerId === "1111"...
Solution:
i don't think there's any way to do that directly in Gremlin without (1) the new string steps in 3.7.x or (2) a lambda. That tends to leave folks with perhaps the third option, doing the operation with multiple queries in a transaction, where you do the concatenation client-side. I can't really be too specific but we hope to see Neptune working with 3.7.x soon.

Is tx.close() necessary in Javascript?

I have read the following two pieces of documentation and I have the question, is tx.close() necessary in Javascript?
https://tinkerpop.apache.org/docs/current/reference/#gremlin-javascript-transactions https://docs.aws.amazon.com/neptune/latest/userguide/access-graph-gremlin-transactions.html...
Solution:
not necessary, commit/rollback will also close the transaction

Using dedup with Neptune

I remember I once came across an AWS Neptune optimization guide, though I don't remember where it is now. It mentions that the .dedup() step is not optimized for Neptune, which makes performance worse. However, I have the following scenario where I need deduplication and pagination at the same time....
Solution:
I guess what I'm getting at, is that I don't know of a way to make dedup() any more performant in that sort of query with Neptune's current implementation.
As far as pagination goes, have you tried using Neptune's Query Results Cache instead of making multiple range() calls? That would significantly decrease latency for subsequent calls as you paginate across the results: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-results-cache.html...

`next(n)` with Gremlin JavaScript

I'm trying to do some basic pagination. next(n) seems perfect, but it doesn't appear to be available for JavaScript as per the documentation. Is there a reason for this limitation?...
Solution:
AFAIK, that is only possible via scripts.