criminosis
Apache TinkerPop
Created by criminosis on 7/27/2024 in #questions
op_traversal P98 Spikes
No description
9 replies
Apache TinkerPop
Created by criminosis on 2/5/2024 in #questions
Iterating over responses
I've got a query akin to this in a python application using gremlin-python:
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

t = traversal().with_remote(DriverRemoteConnection(url, "g")) \
    .V().has("some_vertex", "some_property", "foo").values()
print(t.next())
some_property is indexed by a mixed index served by ElasticSearch behind JanusGraph, with (at least for the moment) about 1 million entries. I'm still building up my dataset, so foo will actually return about 100k of the million, but future additions will change that. If I do the query as written above it times out; presumably it's trying to send back all 100k at once? If I add a limit of, say, 100, it seems like I get all 100 at once (after changing t.next() to a for loop to observe all 100). The TinkerPop docs mention a server-side setting, resultIterationBatchSize, with a default of 64. I'd expect the server to just send back the first part of the result set as a batch of 64, of which I print only 1, discarding the rest. The Gremlin-Python section explicitly calls out a client-side batch setting:
The following options are allowed on a per-request basis in this fashion: batchSize, requestId, userAgent and evaluationTimeout (formerly scriptEvaluationTimeout which is also supported but now deprecated).
But I'd expect that to just be something you'd use if you want to override the server side's default of 64? Ultimately what I'm wanting to do is request some large result set, but only hold it incrementally in batches on the client side, without having to keep the entire result set in client memory.
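The behavior being asked about can be sketched in plain Python, with no Gremlin server involved. The function below is a made-up stand-in, not gremlin-python API: the "server" slices results into batches (analogous to resultIterationBatchSize), and the client's iterator holds at most one batch at a time, so consuming only the first item never materializes the full result set.

```python
# Plain-Python sketch of batched streaming (stand-in for server-side batching).
def streamed_results(result_set, batch_size=64):
    # The "server" sends one batch at a time; the client buffers only `batch`.
    for start in range(0, len(result_set), batch_size):
        batch = result_set[start:start + batch_size]  # one batch in memory
        yield from batch

# Lazily consume only the first item of a "100k" result set.
results = streamed_results(range(100_000), batch_size=64)
print(next(results))  # prints 0; remaining batches are never produced
```

This is the iteration pattern the streamed protocol enables; whether the driver actually discards earlier batches as you iterate is the open question in the post.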
7 replies
Apache TinkerPop
Created by criminosis on 1/26/2024 in #questions
Documentation states there should be a mid-traversal .E() step?
Just wondering if I'm missing something, or if the docs are mistaken. It's possible to do a mid-traversal .V() step, but it seems like the TinkerPop docs contain a possible copy/paste error asserting a similar power exists for an .E() step? https://tinkerpop.apache.org/docs/current/reference/#e-step
The E()-step is meant to read edges from the graph and is usually used to start a GraphTraversal, but can also be used mid-traversal.
Trying to execute a mid-traversal E() step, at least against JanusGraph from gDotV, appears not to register as a valid step, on the off chance that's meaningful context. Looking around I found an old post from 2020, I believe from @spmallette, that seems to confirm a mid-traversal E() step is not intended to exist despite the docs implying the contrary, which got me thinking the docs may have been unintentionally copy-pasted from the V() documentation. https://groups.google.com/g/gremlin-users/c/xVzQRLcgQk4/m/L_5uSjSCAgAJ

I was hoping to leverage one in order to pivot mid-traversal to a different edge amongst a batch and leverage a composite index over edges. At least for the first edge, that starts off with g.E().has('edge_label', 'edge_property', 'edge_value').where(outV().has(... and then later pivots to a later .E() step, like you can with .V(). Doing g.V().has('out_v_label', 'out_v_property', 'out_v_property_value').outE('edge_label').has('edge_property', 'edge_value') in comparison is significantly slower: local testing shows about 6ms for the g.E()... traversal compared to 121ms for the g.V()... pathway. The vertex in question has about 21k edges coming off it, so it's not surprising that starting from the vertex (whose out_v_property is also indexed) and checking for a match among the properties of 21k edges is significantly slower than just "starting" on the matching edge via a directly indexed edge property.
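Set out side by side (the labels and property names here are the placeholders from the post, not a real schema), the two formulations being compared are:

```groovy
// Edge-first: starts on the edge-centric composite index, then filters the out-vertex (~6ms)
g.E().has('edge_label', 'edge_property', 'edge_value').
  where(outV().has('out_v_label', 'out_v_property', 'out_v_property_value'))

// Vertex-first: starts on the vertex index, then scans its ~21k incident edges (~121ms)
g.V().has('out_v_label', 'out_v_property', 'out_v_property_value').
  outE('edge_label').has('edge_property', 'edge_value')
```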
5 replies
Apache TinkerPop
Created by criminosis on 8/16/2023 in #questions
VertexProgram filter graph before termination
I have a VertexProgram that operates on vertices of type A and B. B vertices are "below" A vertices. The VertexProgram aggregates stuff about the underlying B vertices into their common parent A vertex. I've successfully done the aggregation, but now I want to filter out the A vertices that don't pass a predicate on the aggregated property they've accumulated from messages from their underlying B vertices. I was thinking of having a follow-up state machine for the VertexProgram, like the ShortestPath vertex program does for its aggregations, to filter any vertices that aren't of label A, or are label A but fail the predicate check on their aggregated value. I tried doing this via .drop() after returning true from the Features' requiresVertexRemoval(). However, it seems SparkGraphComputer doesn't support this feature. I've been able to flatten the identifiers of A vertices using a follow-up MapReduce job chained with the VertexProgram, but was just wondering if maybe there's something else I'm missing? Being able to return a filtered view of the graph following a VertexProgram's execution would be nice, without having to flatten IDs via a trailing MapReduce job writing a Set to the Memory object.
5 replies