`next(n)` with Gremlin JavaScript
I'm trying to do some basic pagination. `next(n)` seems perfect, but it doesn't appear to be available for JavaScript as per the documentation.
Is there a reason for this limitation?
12 Replies
Interesting, I didn't know that we have that in the docs, and I especially wonder why it says that about Gremlin.Net, since we have support for `Next(n)` there:
https://github.com/apache/tinkerpop/blob/82fe33939aa7058bd90d0dcc178817cc5720df17/gremlin-dotnet/src/Gremlin.Net/Process/Traversal/DefaultTraversal.cs#L220
For JavaScript, it really seems to be missing, but I can't say why.
`gremlin-javascript` does support streaming, though. So you could likely get the same behavior using this: https://tinkerpop.apache.org/docs/current/reference/#_processing_results_as_they_are_returned_from_the_gremlin_server
Huh, I thought `next(n)` was going to do something special. This looks like it's just doing `Next` multiple times.
Is this only possible by submitting a Gremlin script? As in, you can't do the same via a traversal (g.etc)?
Yep, the reason for this is that Gremlin.Net will send your traversal completely for evaluation to the server and get all results back. So, when you just call `Next()`, then you'll simply get the first result back, but all other results are available locally anyway. That's why you can also simply call `Next()` again afterwards to get more results.
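Since `next(n)` is described here as just repeated `Next()` calls, a rough JavaScript equivalent could look like the sketch below. `nextN` is a hypothetical helper, not part of gremlin-javascript; it only assumes that `traversal.next()` returns a Promise resolving to `{ value, done }`. Whether the remaining results are already held locally after the first call in the JavaScript driver, as described for Gremlin.Net, is an assumption worth checking.

```javascript
// Hypothetical helper: emulate next(n) by calling next() up to n times.
// Assumes traversal.next() returns a Promise resolving to { value, done }.
async function nextN(traversal, n) {
  const results = [];
  for (let i = 0; i < n; i++) {
    const { value, done } = await traversal.next();
    if (done) break; // traversal exhausted before n results were produced
    results.push(value);
  }
  return results;
}

// Possible usage (assumes an existing traversal source `g`):
//   const firstTen = await nextN(g.V().hasLabel('person'), 10);
```

This does not reduce the work done on the server; it only controls how many results are pulled out of the traversal on the client.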
If you want the server to only iterate your traversal until it has produced X results, then you need to use something like `Limit(x)`, which will be sent to the server.
I also see `range(x,y)` used heavily for this sort of pattern. Just be careful, because many databases will not ensure that `range()` calls are idempotent if you're making changes to the database while performing subsequent queries with `range()`.
Solution
AFAIK, that is only possible via scripts.
But you can do something like this:
Also note that not all graph databases will be able to execute `range(x, y)` efficiently.
If I'm not mistaken, JanusGraph will, for example, simply execute your traversal in this case as if you had used `limit(y)`. So, it will still fetch the first x values from your backend. That of course defeats the purpose of using the `range()` step.
I'm not sure though, so it's best to use `profile()` to try it out.
Not even sure if I need pagination in my case tbh. My concern is that pulling in a large amount of data in one go might break or tickle something funny. Like, should I not even bother if, say, there are fewer than 10k results?
I guess it depends on how computationally complex it is to find each result. If this is just a "find the first 10k vertices with x property", then there may not be a need. But if you're doing something like the code example above where you want to find multiple paths between two objects with varying levels of depth/breadth, then streaming, pagination, or using query cache are likely better patterns.
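For the `range(x, y)` pagination pattern mentioned earlier in the thread, a minimal sketch could look like this. `paginate` and `fetchRange` are hypothetical names; `fetchRange(low, high)` is assumed to wrap whatever query you run, for example a traversal ending in `.range(low, high).toList()`. Adding an `order()` step for stable pages is my assumption, not something the thread prescribes.

```javascript
// Hypothetical helper: walk a result set page by page.
// fetchRange(low, high) is an assumed callback returning a Promise of the
// items in [low, high). With gremlin-javascript it might wrap something like:
//   (low, high) => g.V().hasLabel('person').order().by('name').range(low, high).toList()
async function* paginate(fetchRange, pageSize) {
  let low = 0;
  while (true) {
    const page = await fetchRange(low, low + pageSize);
    if (page.length === 0) return; // nothing left
    yield page;
    if (page.length < pageSize) return; // short page means this was the last one
    low += pageSize;
  }
}

// Possible usage:
//   for await (const page of paginate(fetchRange, 100)) { /* handle page */ }
```

As noted above, pages can shift if the graph changes between calls, and some backends may still read the skipped rows, so `profile()` is worth a look before relying on this.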
First time hearing about `query cache`, what's that about?
"Query cache" meaning more the traditional way of building an external query cache using something like Redis or Memcached. Your application hashes the overall query and stores the result of the query with the hash. Upon further queries being submitted, your application would first do a hash lookup in the cache to see if the results were previously retrieved. If so, return the cached results. We discuss this sort of pattern in relation to Neptune here: https://aws.amazon.com/blogs/database/part-3-accelerate-graph-query-performance-with-caching-in-amazon-neptune/
You can get really creative with this sort of pattern and derive your own cache invalidation strategies or use a TTL against each stored hash. You can even use this for pagination, which is where we find this used most.
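To make the hash-and-lookup idea concrete, here is a small sketch that uses an in-memory `Map` as a stand-in for Redis or Memcached. `QueryCache`, `getOrRun`, and the `run` callback are hypothetical names for illustration; `run` is assumed to be whatever function actually submits the query and returns its results.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory stand-in for an external cache such as Redis.
class QueryCache {
  // `now` is injectable so expiry can be controlled in tests.
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Hash the query text together with its bindings.
  // Note: JSON.stringify is sensitive to key order in `bindings`.
  static key(query, bindings = {}) {
    return crypto.createHash('sha256')
      .update(JSON.stringify([query, bindings]))
      .digest('hex');
  }

  // Return cached results if present and not expired; otherwise run and store.
  async getOrRun(query, bindings, run) {
    const key = QueryCache.key(query, bindings);
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.result;
    const result = await run(query, bindings);
    this.entries.set(key, { result, expiresAt: this.now() + this.ttlMs });
    return result;
  }
}
```

For pagination, one option is to cache the full result once under its hash and serve slices of it per page, with the TTL acting as a simple invalidation strategy.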