Apache TinkerPop

Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.

Preventing JanusGraph crash on timeout

According to this: https://stackoverflow.com/questions/61985018/janusgraph-image-stop-responding-after-query-timeout-how-to-prevent-it and my own findings, JanusGraph hard-crashes on Gremlin timeouts because of the Berkeley storage backend used by default. To circumvent this, I'm trying to end my query before the time limit is reached, which seems to work pretty well. But how can I access the executionTimeout value from a Strategy or Step? AFAIK it's only available in the Context, which doesn't get passed down to Strategy and Step. Greetings...

Way to update static vertex

https://docs.janusgraph.org/schema/advschema/#static-vertices I read the documentation about vertex TTL. It says that a vertex must be static to set a TTL on it. But once a vertex is static, I can't update it. So is there a way to set a TTL and still be able to update the vertex?...

Dotnet best practice: converting Vertex properties to Model

A very common task in Dotnet is to convert a stored entity into a Model class. How is this best accomplished in Gremlin.Net? In other words: what does "the magic step" look like in the snippet below? ...
Solution:
I think what you're asking for goes beyond what the GLVs/drivers are capable of. Each GLV is meant to return a query response using common data types (either primitives or higher-level Maps/Lists), converted into native data types that are commonly supported in each programming language. There's no way to tell Gremlin to return a response as a custom class.
What you may be looking for is a secondary layer using an Object-Graph-Mapper (OGM) to handle this. For .NET, you could look to use something like Gremlinq for this: https://github.com/ExRam/ExRam.Gremlinq...
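To make the "secondary layer" idea concrete, here is a minimal sketch in Python rather than .NET. It assumes the driver has already returned a plain dictionary with string keys; the `Person` class, its fields, and the `to_model` helper are hypothetical names for illustration and are not part of any GLV or of Gremlinq.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Hypothetical model class; the field names are assumptions for illustration.
    id: int
    label: str
    name: str
    age: int


def to_model(cls, element_map):
    """Build a dataclass instance from a dict shaped like a flattened element map."""
    wanted = {f.name for f in fields(cls)}
    # Keep only the keys the model declares; unknown properties are dropped.
    data = {k: v for k, v in element_map.items() if k in wanted}
    return cls(**data)


# A made-up row standing in for a driver result.
row = {"id": 1, "label": "person", "name": "marko", "age": 29, "extra": "ignored"}
person = to_model(Person, row)
```

The mapping lives entirely in the application, which is the part an OGM would otherwise take over.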

What is the use of adding type information to e.g ElementMap<type> in Gremlin.Net?

Consider the query and output in the attached image: What TYPE could be placed inside ElementMap<TYPE> that has any real value? The return type of ElementMap<TYPE> is a dictionary, and setting e.g. ElementMap<IDictionary<string,string>>() would make dotnet try to fit all vertex properties against an IDictionary<string,string> type, which would only work if all property values of the vertex are of type string. To frame it differently: is there any scenario in which I do not want to write ElementMap<object> or ElementMap<dynamic>?...

How can I find property with a certain data type?

I have a situation where the same property has different types under the same label, kind of like the following: g.addV("Cookies").property(single, "howMany", "10").next() g.addV("Cookies").property(single, "howMany", 42).next() ...
Solution:
Which Graph Database are you using @ManabuBeach ? Currently Gremlin does not have any easy way to test the type of a property value built in to the language. It is something we have discussed adding in a future update. Unless the graph database backend you are using allows for lambdas/closures, you may have to do this mostly in the application.
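As a rough sketch of doing the type check in the application, the Python below filters rows that have already been fetched. The row shape, the sample data, and the `with_property_type` helper are assumptions for illustration; how the rows are fetched depends on your driver and is not shown.

```python
# Made-up rows standing in for already-fetched "Cookies" vertices,
# with the single-cardinality "howMany" value unwrapped.
rows = [
    {"id": 1, "howMany": "10"},
    {"id": 2, "howMany": 42},
]


def with_property_type(rows, key, expected_type):
    """Return the rows whose value for `key` is exactly of `expected_type`."""
    # `type(...) is` is used instead of isinstance so that bool does not count as int.
    return [r for r in rows if key in r and type(r[key]) is expected_type]


string_valued = with_property_type(rows, "howMany", str)
int_valued = with_property_type(rows, "howMany", int)
```

This only scales to result sets that are reasonable to pull into the client, so it is a workaround rather than a substitute for a type predicate in the query language.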

Verifying the count of ingested vertices and edges after bulk loading in JanusGraph

I have bulk loaded around 600k vertices and 800k edges into my JanusGraph cluster backed by Bigtable. I want to verify the number of vertices with a given label 'A' using a Gremlin query, but I'm getting an evaluation timeout error. The evaluation timeout is set to 5 minutes. The Gremlin query used is: g.V().hasLabel('A').count() Can anyone help me verify the count of vertices and edges loaded into the graph?...
Solution:
I think @spmallette already provided a good answer, but I will add several more notes below. 1. Index usage with count operations. JanusGraph doesn't support adding indices based on label only for now, but allows you to make an index based on label + property / properties. In case you have a common property on all your vertices of a specific type (let's say an "id" property) then you could potentially create a mixed index with key "id" and constrained to your label 'A'. In such a case your query to count all vertices of a specific label would look like:...

Traversal is propagating to further edges?

I have node labels A and B with an edge between them ("Has"). I also have node B with an edge to another node of type B ("Co-occur"). When I traverse from A to B, the query retrieves the B-to-B edge as well, although I haven't traversed it! ...
Solution:
I'm not an expert on @gdotvee usage, but i believe that edge count you've encircled isn't necessarily the number of edges traversed, but the number returned to G.V() for displaying the "Graph View". In this screenshot I do g.V() which traverses no edges at all and yet it still returns 6 edges. Note that there is a little toolbar button there with three dots connected by a line. Hover over it for a tooltip with a description but basically it will turn off that feature and in my example, the edg...

How to load url data into Neptune?

I am trying to load a small dataset into Neptune and it seems to always error. I tried g.io("<file path>"). with(IO.reader, IO.graphson)....
Solution:
To add to what Kelvin is saying, Neptune can only access other instances that are within the same VPC. It cannot reach the Internet. If you need it to access a public-facing URL, then you'll need to deploy a NAT gateway within the same VPC as Neptune and supply the proper routing within the VPC to allow Neptune to reach the Internet. This blog discusses it for the purpose of SPARQL Federated Queries being able to reach public SPARQL endpoints: https://aws.amazon.com/blogs/database/benefitting-from-sparql-1-1-federated-queries-with-amazon-neptune/ The concept also holds true for Gremlin io() queries that need to access a public URL....

Cannot access a stored value after fold

I can't find the docs related to this behavior. I stored a value in a variable and then tried to access it after an aggregation, but it can no longer be found. What could be the reason, and how can I work around it?...
Solution:
Once you use a reducing-barrier step, like fold() or group(), you lose path history.
ReducingBarrierStep: All of the traversers prior to the step are processed by a reduce function and once all the previous traversers are processed, a single "reduced value" traverser is emitted to the next step. Note that the path history leading up to a reducing barrier step is destroyed given its many-to-one nature.
Once you are reduced to a single traverser from many traversers, that path sort of loses meaning. Often folks work around this by rewriting their traversal to use aggregate() or some other global side-effect to hold the data somewhere other than path history. https://tinkerpop.apache.org/docs/current/reference/#a-note-on-barrier-steps...
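The effect can be pictured with a toy model in Python. This is not TinkerPop's implementation: the `Traverser` class and the `fold` and `aggregate` functions below are simplified stand-ins written for illustration, assuming a traverser carries a value plus a map of step labels.

```python
from dataclasses import dataclass, field


@dataclass
class Traverser:
    value: object
    path: dict = field(default_factory=dict)  # step label -> value seen at that step


def fold(traversers):
    """Toy reducing barrier: many traversers in, one traverser with no path history out."""
    return [Traverser(value=[t.value for t in traversers])]


def aggregate(traversers, side_effects, key):
    """Toy side-effect step: record each value under `key` and pass traversers through."""
    side_effects.setdefault(key, []).extend(t.value for t in traversers)
    return traversers


start = [Traverser(v, {"a": v}) for v in ("v1", "v2", "v3")]

# After the fold there is one traverser and the label "a" is no longer reachable.
folded = fold(start)
label_after_fold = folded[0].path.get("a")

# Storing the values in a side-effect before folding keeps them available afterwards.
side_effects = {}
folded_with_store = fold(aggregate(start, side_effects, "seen"))
stored = side_effects["seen"]
```

In the toy model the side-effect dictionary lives outside any traverser, which is why it survives the many-to-one reduction.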

Are there alternative clients other than console?

Hi guys, would you recommend any alternative client for running Gremlin queries that has a more user-friendly UI?
Solution:
Also, in the interest of transparency, you can see a list of other providers (not all of them are Gremlin clients) at https://tinkerpop.apache.org/community.html and https://github.com/JanusGraph/janusgraph for Gremlin Server and JanusGraph respectively. G.V() supports pretty much all major providers.

Exporting current DB to JSON

Hey, We want to export the current DB to a JSON file. This is used for small scale copy of the DB that includes a small number of users that can be used for unit tests. This does not use any product (e.g. Neptune) but just plain Tinkerpop server/client. ...

Issues to execute gremlin queries with Java versions higher than 11

I'm trying to perform gremlin queries in a Java 17 project and I'm receiving this error message:
Caused by: javax.script.ScriptException: java.util.concurrent.ExecutionException: BUG! exception in phase 'semantic analysis' in source unit 'Script1.groovy' Unsupported class file major version 61
I'm using these dependencies in my project:...

repeat with times(1) causing timeout

I'm trying to run a subtraversal of my query inside a repeat step. For some reason, it keeps timing out even with the breaking condition of times(1). Query 1 (runtime: 85 ms) ```...

Agnostic client-side serialization of custom types

Hi, I've noticed a potential trend in G.V() of serialization issues for types that appear to be custom and therefore not deserializable from G.V()'s perspective. Where things get tricky is from G.V()'s perspective we need to somehow be able to connect to any database regardless of this type of customer specific situation. Is there a way to configure a fallback deserializing mechanism client-side (e.g. GraphBinary config or otherwise) that automatically deserializes a result as string (or some other default) in the absence of an appropriate serializer being available?...
Solution:
Okay so final update on this, hopefully. I'm coming to the realisation that this is just not straightforward in any sense. Even with some sort of fallback mechanism there would still be the issue of determining which/how many bytes should be read in the buffer for the incoming type, which I guess would be nearly impossible without actual knowledge of the structure expected.
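A toy example may help show why the byte count is the sticking point. The format below is invented for this sketch (a one-byte type code followed by a type-specific payload) and is not the actual GraphBinary layout; the type codes and the "custom" payload are assumptions.

```python
import struct

INT, STRING = 0x01, 0x03  # toy type codes, chosen for this sketch only


def encode_int(n):
    return bytes([INT]) + struct.pack(">i", n)


def encode_string(s):
    raw = s.encode("utf-8")
    return bytes([STRING]) + struct.pack(">i", len(raw)) + raw


def read_all(buf):
    """Decode consecutive values; fail at the first unknown type code."""
    out, pos = [], 0
    while pos < len(buf):
        code = buf[pos]
        pos += 1
        if code == INT:
            out.append(struct.unpack_from(">i", buf, pos)[0])
            pos += 4
        elif code == STRING:
            (size,) = struct.unpack_from(">i", buf, pos)
            pos += 4
            out.append(buf[pos:pos + size].decode("utf-8"))
            pos += size
        else:
            # Nothing in the stream says how long this payload is, so the
            # reader cannot skip it and resynchronise on the next value.
            raise ValueError(f"unknown type code 0x{code:02x} at offset {pos - 1}")
    return out


known = read_all(encode_int(7) + encode_string("ok"))

custom = bytes([0x7F]) + b"\x00\x01\x02"  # hypothetical custom type, opaque payload
try:
    read_all(encode_int(7) + custom + encode_string("ok"))
    failed = False
except ValueError:
    failed = True
```

A generic fallback would only be possible in a format where every value, including custom ones, carries its own length up front.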

How to investigate latency

I'm using JanusGraph as my db. When I issue the query via gremlin-console with a .profile() step at the end, it finishes execution within 5–8 milliseconds, but when I perform the same request in parametrized form from an application, it takes 25–40 milliseconds for the database to answer. What takes this time, and how can I minimize it? I have a feeling that most of the time is taken by the backend storage layer (Apache Cassandra), and this time is not shown in the .profile(). Am I correct?...
Solution:
So just switching to a static .repeat() traversal with .map{myArray[it.loops]} saved my day.

store edges of a node in a sorted manner

Is there any way for me to store the edges of a specific node in a sorted manner? Is there any Gremlin based database that implements indices perhaps?...
Solution:
Is there any way for me to store the edges of a specific node in a sorted manner?
To my knowledge none of the common ones will allow that. I don't think most even preserve insertion order for edges.
Is there any Gremlin based database that implements indices perhaps?
i assume you are still referring to edges here. Titan-like graphs such as JanusGraph, HugeGraph, etc. and DataStax Graph allow you to explicitly define indices on edges...

Coalesce steps causing concurrency issues

I’ve been having some issues with concurrent modification errors on AWS Neptune. I’ve tried lowering the reserve concurrency, but the issue persists. Based on https://stackoverflow.com/questions/69932798/concurrentmodificationexception-in-amazon-neptune-using-gremlin-javascript-langu I was wondering if I could split up my use-case into two queries: first a g.V().has({my unique property})…, and only if it’s null do I do the mutation step. I’m kind of confused why this would make a difference though. This is my use-case (gremlinStatistics is how I imported __ for anonymous traversals, I’ll be changing that haha) ...
Solution:
In general the CME is a retryable exception, and in a highly concurrent environment where multiple client threads are mutating the graph at more or less the same time they are likely to happen and should be expected / coded for. That said, as much as possible having each client thread be touching different parts of the graph can reduce the likelihood of lock contentions.
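Coding for a retryable exception usually means wrapping the mutation in a retry loop with backoff. The Python below is a minimal sketch of that pattern only: `ConcurrentModificationError` is a stand-in class, since how a CME surfaces depends on the driver, and `run_with_retry` and `flaky_upsert` are hypothetical names.

```python
import random
import time


class ConcurrentModificationError(Exception):
    """Stand-in for whatever error the driver surfaces for a CME (an assumption)."""


def run_with_retry(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call `operation`; on a CME, wait with exponential backoff plus jitter and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConcurrentModificationError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the last failure
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay))


# Usage with a fake mutation that conflicts twice before succeeding.
calls = {"n": 0}


def flaky_upsert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConcurrentModificationError("simulated lock conflict")
    return "written"


result = run_with_retry(flaky_upsert, sleep=lambda _: None)
```

The jitter spreads out clients that collided at the same moment so they are less likely to collide again on the retry.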

Big graph makes timeouts

I am having trouble querying a big graph, especially when it comes to applying filters. I want to order the nodes so that I can take the highest-degree ones, but the graph always throws timeouts, and the only trick I am applying is pre-limiting the accessed nodes ``` g.V()...
Solution:
A few things here:
1. Neptune was originally designed as a database more in the mindset of TinkerPop OLTP, where queries that perform best have a constrained set of starting conditions with limited query frontier (the projected number of possible objects that may need to be assessed during query computation). Queries that traverse < 1M objects in the graph will perform with ~100ms of latency. Queries that need to process more than that will have a latency that scales linearly with query frontier.
2. For the most part, Gremlin queries are executed single-threaded inside of Neptune. Each Neptune instance has a number of query execution threads equal to 2x the number of vCPUs on that instance. More on the resource allocation here: https://docs.aws.amazon.com/neptune/latest/userguide/instance-types.html
3. The Graviton 2 processors (the "g" noted in the instance type) are great for smaller OLTP queries and will show better performance than the Intel processors for those queries. It has been noted in other forums (https://www.anandtech.com/show/15578/cloud-clash-amazon-graviton2-arm-against-intel-and-amd), however, that the Graviton 2 processors have a TLB that is less performant than same-generation Intel processors, making memory-intensive processing (slightly) less performant. So if you plan on running queries with a larger query frontier, using the Intel processors will show some gains (vice versa with smaller queries)....

Generated DSL related

I am new to Gremlin and I am struggling with a problem with a custom DSL. Following the example, I can create a socialTraversalDSL for users, something like g.person("one").knows("another"). That works well, but how can I change the Traversal type when a given step is called? E.g. with another TwitterTraversalDSL, the user could do something like g.person("one").twitter().follows("another"), where the follows step is not part of socialTraversalDSL. How can I achieve this with Java? ~ looking f...
Solution:
Welcome to TinkerPop! In answer to your question, I'm afraid that the DSL annotation model isn't flexible enough to handle something like that. I believe that DSLs can inherit one another, like, you could extend TwitterTraversalDSL from SocialTraversalDSL which would extend from GraphTraversal (i've not done that myself, but i think others have had success with that). But I don't think you can have two DSLs easily know and reference one another. That might even be problematic if you were to writ...

Collect and filter data

I have a complicated query that yields certain vertices, on which I later call .project. It looks something like this: complexQuery.project("a", "b", "c", ...).by(queryA).by(queryB).by(queryC).by(...) What I want to do is to later filter the results by some predicates over a, b, c, .... For example, maybe I want to select only those results where b > 10. And that's when the question arises: I could add filtering steps after the project, but is this efficient? I'm just unsure about how Gremlin evaluates it: does Gremlin first fully evaluate project and then go on to the following steps, or is the computation of fields a, b, c, ... delayed until they are being filtered on?...
Solution:
I'd say that the best general advice is for you to write your filters as early as possible to rid Gremlin of as many paths as possible. The only real way to tell what sort of optimizations you are getting though is to test/profile your queries on the graph you are using. TinkerPop doesn't dictate how graphs optimize Gremlin queries so what might work really fast on one may not be best on another.
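The "filter early" advice can be illustrated with plain Python. This models eager, row-by-row evaluation only and says nothing about how a particular graph provider optimizes; `expensive_projection` is a hypothetical stand-in for a costly by() sub-traversal, and the sketch assumes the predicate can be tested on a cheap value before projecting.

```python
calls = {"expensive": 0}


def expensive_projection(v):
    """Stand-in for a costly by() sub-traversal; counts how often it runs."""
    calls["expensive"] += 1
    return {"a": v["name"], "b": v["score"]}


vertices = [{"name": f"v{i}", "score": i} for i in range(100)]

# Late filter: project everything, then keep rows where b > 90.
calls["expensive"] = 0
late = [row for row in map(expensive_projection, vertices) if row["b"] > 90]
late_cost = calls["expensive"]

# Early filter: test the cheap condition first, project only the survivors.
calls["expensive"] = 0
early = [expensive_projection(v) for v in vertices if v["score"] > 90]
early_cost = calls["expensive"]
```

Both variants return the same rows, but the early filter runs the costly projection only for the vertices that pass. Whether a real query behaves this way is something only profiling on your own graph will show.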