Apache TinkerPop

Apache TinkerPop is an open source graph computing framework and the home of the Gremlin graph query language.

dotnet `Enumeration has not started. Call MoveNext` if I try to enumerate over a result

I recently tried to use Gremlin to create a graph and query it. So far I have pushing data into the graph working. 🎉 But now my problem is getting data back from the graph. What is working is something like this: ```csharp...
Solution:
You're on .NET 8, right? This is unfortunately a known issue which will be fixed in the next release: https://issues.apache.org/jira/browse/TINKERPOP-3029 I'm afraid you'll probably have to use .NET 7 until the next release...

I can't create an edge in AWS Neptune using Gremlin. I can create vertices, but not edges.

import { driver, process as gremlinProcess, structure } from "gremlin"; async function checkOut() { const DriverRemoteConnection = driver.DriverRemoteConnection; const Graph = structure.Graph;...

Iterating over responses

I've got a query akin to this in a python application using gremlin-python: ``` t = traversal().with_remote(DriverRemoteConnection(url, "g")) \ .V().has("some_vertex", "some_property", "foo").values()...
Solution:
It is the batch size sent to the client; the server just doesn't wait for the client to tell it to send the next batch. The purpose of the batch was to control the rough size of each response. Otherwise you could end up with a situation where the server is serializing too much in memory or sending responses that exceed the max content length for a response.
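A minimal sketch of the size-bounding idea described above, in plain Python rather than any TinkerPop API. The `batched` helper and the batch size of 3 are hypothetical; the sketch only models how a result stream is cut into responses of bounded size, not how the server schedules sending them.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(results: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items from results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(results)
    while True:
        # Take up to batch_size items; an empty list means the stream is done.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Seven results with a batch size of 3 become three responses.
responses = list(batched(range(7), 3))
```

Each response holds at most `batch_size` items, which is what keeps any single response from growing without limit.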

AWS Neptune updating gremlin driver to 3.6.2 introduced many bugs to working queries

After updating the Amazon Neptune engine version from 1.2.0.2 to 1.2.1.0 and the Gremlin.Net (C# NuGet) driver from 3.5.2 to 3.6.2, queries suddenly started throwing exceptions, specifically exceptions about serialization errors. To pinpoint the cause, I downgraded the Gremlin.Net driver to 3.5.2 while leaving the engine version updated to 1.2.1.0, and the queries started working like before. The problem is, according to the AWS documentation, the minimum required Gremlin.Net version is 3.6.2. Will there be a problem keeping the Gremlin.Net driver at version 3.5.2? Will there be any side effects?...

vertex-label-with-given-name-does-not-exist ERROR with Janusgraph 0.5.3

vertex-label-with-given-name-does-not-exist ERROR with Janusgraph 0.5.3 while adding labels to vertices I get this error only when I enable storage.batch-loading=true. My schema.default is still set to default when checking from gremlin console. mgmt.get("schema.default") => default...
Solution:
Any reason why you need automatic schema creation (schema.default=none)? This feature is mainly intended for cases where you just want to try out JanusGraph and don't want to bother with creating a schema. But it's not intended for production use cases. The docs also discourage its usage in general: ...

Documentation states there should be a mid-traversal .E() step?

Just wondering if I'm missing something, or if the docs are mistaken. It's possible to do a mid-traversal .V() step. But it seems like a possible copy paste error is in the Tinkerpop docs asserting a similar power exists for an .E() step? https://tinkerpop.apache.org/docs/current/reference/#e-step ```...
Solution:
Mid-traversal E() was added to TinkerPop in version 3.7.0. JanusGraph v1.0 is based on TP 3.7.0, so it has to support it.

Disabling strategies via string in remote driver

Is there a way to disable a strategy in a providers implementation without a reference to the class? For example, let's say StrategyA is in the providers implementation and I am in Python without access to this. Is there no way to do g.withoutStrategies("com.provider.strategies.StrategyA").V().<etc>()?...
Solution:
in Java, i think you can use TraversalStrategyProxy directly inside of withStrategies() but there is nothing analogous for withoutStrategies(). We probably should have a better way to do both of these things in the Gremlin language which really doesn't have a notion of classes and such.

LazyBarrierStrategy/NoOpBarrierStep incompatible with path-tracking

👋🏻 Hi all! In this JanusGraph post (https://discord.com/channels/981533699378135051/1195313165278388334/1195313165278388334), we were investigating if TreeStep could be used jointly with bulked traversers so as to improve traversal time. Based on answers there, it looks like TinkerPop's LazyBarrierStrategy explicitly excludes "path-tracking" traversals (https://github.com/apache/tinkerpop/blob/master/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/strategy/optimization/LazyBarrierStrategy.java#L85) and won't insert NoOpBarrierSteps in those cases, preventing us from bulking traversers....

Is there a way to store the tinkerpop graph in DynamoDB?

AWS provides the Neptune graph database, but the problem with it is that it is not distributed and can't be horizontally scaled like DGraph etc. So I was wondering, as DynamoDB is a distributed database, if there is a way to store a TinkerPop graph in DynamoDB directly?
Solution:
TinkerPop, in general, can be designed to use nearly any back-end. You just need a storage plugin for it. To make it performant, it would also require overriding many of the underlying query execution operators to make sure they are fetching data from DynamoDB table(s) efficiently. TinkerGraph is a reference implementation of this where the storage medium is in-memory hashmaps for both vertices and edges. In practice, most people start with reviewing the code for TinkerGraph as a starting point for creating support for other storage mediums. Once upon a time there was an implementation of TinkerPop called Titan (later became the basis of DSE Graph) that had a storage plugin that worked with DynamoDB. Someone later forked it and added support for such for JanusGraph (another TinkerPop implementation). The plugin is still out there, but hasn't been supported/maintained. https://github.com/amazon-archives/dynamodb-janusgraph-storage-backend JanusGraph, itself, supports a Cassandra backend. We have seen a few folks attempt to use JanusGraph with Amazon Keyspaces (for Apache Cassandra). ...

Connection to Azure cosmos db using Go

Hi All, Asking this as a newbie to Graphs databases in general. I have been trying to connect to an Azure Cosmos Graph database using Apache Tinkerpop SDK for Golang, but am unable to proceed because I can't get past the websocket 1011 error while trying to execute Gremlin queries. Any help would be appreciated....
Solution:
Ah OK - yes. Cosmos DB does not support Gremlin bytecode. It might be worth looking at this documentation: https://learn.microsoft.com/en-us/azure/cosmos-db/gremlin/support They also seem to document that newer Gremlin clients will not work with CosmosDB. I suppose all you can really do is try it. Queries will need to be submitted using the Client.Submit.... type of approach though given "byte code" is not supported....

I met a man with seven wives, each of which had seven sacks.

I met a man with seven wives, each of which had seven sacks. Now, suppose I have a shipping container that can hold up to 500 items and I need to inform a number of men that they and their families can board my ship because I know that all the items in their families' sacks will fit in the container. There may be a few empty spaces, but I can't tell a man that he and his family can board if any of their items would overflow the container. How do I construct a query which selects men as long as all the items in the 7 sacks of their 7 wives will fit? Here's the challenge: I don't know how many items are in each sack until the family is considered. I ask the men and their families to line up and then I board men until the container is nearly full or exactly full and where any of the items of the next family would assuredly not fit. Let's say we have Vertices for Item, Sack, Wife, and Man and Relations marriedTo, hasSack, and hasItem. Let's say we rank order Man by Lastname....

May I suggest a new topic-channel for us? Like "really-big-data" or "pagination"?

Related to https://discord.com/channels/838910279550238720/1100527694342520963/1100853192922759244 and having read the recommended links on how to paginate the end of a query, I am wondering about how to manage large sets of traversals and large side-effect-collected collections which a query might be encountering or constructing as the graph is visited when the paths offer relatively large datasets after having been wisely filtered. For example, what is advised if one really does need to group by first-name all the followers of Taylor Swift (i.e. some exemplary uber-set) and wants to bag that for a later phase of a query which isn't the final collection that will be consumed by some external REST client? Yes, the final collect step can be easily paginated as advised - but what about all that earlier processing? What should we be thinking when we anticipate having 500,000, or 10x this, traversals heading into a group by - by - bye! barrier / collecting stage? Other than, "Punt" or "Run away!"?...
Solution:
There is not a feature in Gremlin directly that will directly handle this for you automatically but the drivers do let you stream back results instead of collecting them all at once which can help mitigate transferring large result sets. If you are using Amazon Neptune it also has a query results cache to assist with paging: https://docs.aws.amazon.com/neptune/latest/userguide/gremlin-results-cache.html#gremlin-results-cache-paginating

Integration tests for AWS Neptune DB

Do we have any Testcontainers for AWS Neptune for writing integration tests in Java applications?

G.V() IDE can't visualize path().by(valueMap()) query

Hi @G.V() - Gremlin IDE (Arthur) sorry if this is a duplicate question. I am playing around with G.V() IDE and run into a problem: if I run a path() query, G.V() IDE could properly display the nodes and the edges, but since a path query itself does not return properties, G.V cannot display the properties. If I add .by(valueMap()) to the end of the query, then the Gremlin query result would include both the path and properties, but G.V IDE cannot visualize it. Is this a known problem? Thx!

Beginner Gremlin Questions

Hello - I am trying to do an Advent of Code challenge as a graph problem to learn some Gremlin, and am running into some walls. I'm working on Part 2 of Day 4. The conceit is that there are scratchcards, and that for the number of matches you have on a given card you win copies of other cards (e.g. a card with ID 3 that has two matches would win copies of the next two cards with ID 4 and 5, recursively generating more copies of cards if those two cards also had matches), and you're trying to count the total number of cards won for a given input of cards. I've attached the graph that I'm working with (the sample input from the challenge, visible at the link above). I've parsed the input into Card vertices with a cardId property, and CardNumber and WinningNumber vertices each with a value property and a fromCard edge pointing to the card they are from....

G.V Graph Playground: Gremlin client

@G.V() - Gremlin IDE (Arthur) Quick question: Does G.V graph playground allow adding vertices & edges programmatically? In other words, can I use a gremlin client to connect to the playground graph?
Solution:
Hey not at the moment but that's on the roadmap, I'm working through a big 2.0 rewrite atm which should be out by end of January and hoping to have G.V() playgrounds served by a gremlin server managed by g.v before end of q1, I don't think it should be too much work

Splitting a query with range()

I have a Gremlin query that starts simple (one label), and then branches out to many different paths to collect unrelated information (i.e., I need to follow those paths). I'm considering using range() to break that query down into smaller chunks of, say, 1k rows and avoid processing the whole set of labels in one go. Of course, I'll have to run the query several times, but I expect each run to be faster and to fit better in memory. Maybe I'll escape some fast degradation by keeping the load small enough. Does that sound like a good idea? I'm usually concerned that such partitioning means the common part of the query (before the range()) is executed several times, and that limits the speed potential. In the current case, it is merely a hasLabel() + some property collection. ...
Solution:
I have often used range() steps to break up queries, it can be a useful technique but does come with several caveats. The most important piece is that this will only work if your database guarantees that the common part of your query will always produce results in a consistent order. The default implementation of range(x, x + 1000) will first iterate and discard the first x results, then pass the next 1000. If the result ordering changes on each execution, then you will essentially be taking a random sample of 1000 results each time, instead of progressively going batch by batch. You already mentioned the performance concerns with the common part of the query being executed each time, due to the way this is implemented, this performance penalty is proportional to x (minimal penalty when x is small as almost no results are skipped, larger penalty with large x as many results need to be processed and skipped). Results will depend greatly on your DB and your data but in general, if the left-hand side of the query is fast and efficient in your DB, and the right-hand side is slow and complex, then this technique works quite well....
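The batch-by-batch pattern described above can be sketched in plain Python. This is an illustration under stated assumptions, not a driver API: `range_windows`, `fetch_all`, and `fake_query` are hypothetical names, and the in-memory `ORDERED` list stands in for a database that returns the common part of the query in a consistent order.

```python
from typing import Callable, Iterator, List, Tuple


def range_windows(batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (low, high) bounds: (0, n), (n, 2n), (2n, 3n), ..."""
    low = 0
    while True:
        yield low, low + batch_size
        low += batch_size


def fetch_all(run_query: Callable[[int, int], List[int]],
              batch_size: int = 1000) -> List[int]:
    """Call run_query(low, high) per window until a short batch comes back."""
    collected: List[int] = []
    for low, high in range_windows(batch_size):
        batch = run_query(low, high)
        collected.extend(batch)
        if len(batch) < batch_size:
            # A short (or empty) batch means the result set is exhausted.
            break
    return collected


# Stand-in for the database: results always come back in the same order.
ORDERED = list(range(2500))


def fake_query(low: int, high: int) -> List[int]:
    # Mimics range(low, high): skip the first `low` results, pass the rest
    # up to `high`. A real database pays a cost proportional to `low` here.
    return ORDERED[low:high]


rows = fetch_all(fake_query, batch_size=1000)
```

If the ordering were not stable between calls, the windows would overlap or miss results, which is the main caveat raised in the answer.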

Exception saving as Gryo

When trying to save as gryo getting the error, Unable to create serializer "org.apache.tinkerpop.shaded.kryo.serializers.FieldSerializers" for class java.util.concurrent.atomic.AtomicLong. What could be the possible reason.
Solution:
With some trial and error I found the solution: OpenJDK is possibly the reason. I tried with the standard JDK/JRE, and now it works.

AWS Neptune: Pong fails and close event not emitted

Hey guys, long time no see. We have an issue which occurred a few times in the last couple weeks and we've been investigating for a while; posting here in case the issue is maybe known. We are using the gremlin-aws-sigv4 in a NodeJS project. We occasionally do a ping to the server and wait for a Pong with a timeout of 3 seconds....
Solution:
Interesting approach. Our typical guidance is to not worry about whether or not the connection is live and assume it is always available. Then build in exception handling and reconnect logic for the condition when a query is sent to a closed connection. Neptune will close connections on the server side if they are idle for more than 20-25 minutes.
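The guidance above (assume the connection is live, then handle the failure and reconnect) can be sketched as a small retry wrapper. Everything here is hypothetical and independent of any Gremlin driver: `ConnectionClosedError`, `submit_with_reconnect`, and `FakeConnection` are illustrative names, and a real driver will raise its own exception types.

```python
from typing import Callable, List, TypeVar

R = TypeVar("R")


class ConnectionClosedError(Exception):
    """Hypothetical error for a query sent to a closed connection."""


def submit_with_reconnect(submit: Callable[[], R],
                          reconnect: Callable[[], None],
                          max_attempts: int = 3) -> R:
    """Run submit(); on a closed connection, reconnect and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return submit()
        except ConnectionClosedError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the error to the caller.
            reconnect()
    raise AssertionError("unreachable")


class FakeConnection:
    """Test double that starts closed, as after a server-side idle timeout."""

    def __init__(self) -> None:
        self.open = False
        self.reconnects = 0

    def submit(self) -> List[str]:
        if not self.open:
            raise ConnectionClosedError("connection closed")
        return ["result"]

    def reconnect(self) -> None:
        self.open = True
        self.reconnects += 1


conn = FakeConnection()
result = submit_with_reconnect(conn.submit, conn.reconnect)
```

With this shape, no periodic ping is needed: the first query after an idle close fails once, triggers a reconnect, and then succeeds.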

Gremlin upsert on a vertex but preventing updates on a particular property on a vertex during upsert

Hi, the following is an upsert query on a vertex with label 'stvertex'. I am required to initialize 'flagProp' to a default of 0 only when the vertex is created; on update, I must prevent any change to 'flagProp' while the rest of the properties should update. Below is one of the approaches I took: g.V(id).fold().coalesce(unfold(), __.addV('stvertex') .property(T.id, id).property(Cardinality.single, flagProp,0)) .property(Cardinality.single, name, nameValue) .property(Cardinality.single, station, stationValue) .property(Cardinality.single, base, baseValue).next() The above query only works for creation of a new vertex. I am looking for some help to write the correct query that works for update as well...
Solution:
Hi Salman, Better to use mergeV step. You may set flagProp in onCreate option, and create/update all other properties in both onCreate and onMatch, something like `gremlin> g.mergeV([(T.label):'stvertex'])....
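The create-versus-match split suggested above can be modelled in plain Python. This is not the Gremlin `mergeV` API, only a sketch of the intended semantics; `upsert_vertex`, the dict-based `store`, and the sample property values are all hypothetical.

```python
from typing import Any, Dict

Props = Dict[str, Any]


def upsert_vertex(store: Dict[int, Props], vertex_id: int,
                  on_create: Props, on_match: Props) -> Props:
    """Create the vertex with on_create, or update it with on_match."""
    if vertex_id in store:
        # Existing vertex: only the on_match properties are written,
        # so anything left out of on_match (flagProp) stays untouched.
        store[vertex_id].update(on_match)
    else:
        store[vertex_id] = dict(on_create)
    return store[vertex_id]


store: Dict[int, Props] = {}

# First call creates the vertex; flagProp appears only in on_create.
first = {"name": "A", "station": "S1", "base": "B1"}
created = dict(upsert_vertex(store, 1, {**first, "flagProp": 0}, first))

# Something else changes the flag later.
store[1]["flagProp"] = 5

# Second call matches the existing vertex and must not reset flagProp.
second = {"name": "B", "station": "S2", "base": "B2"}
updated = upsert_vertex(store, 1, {**second, "flagProp": 0}, second)
```

The key design point is that `flagProp` is listed only in the create branch, while the other properties appear in both branches.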