criminosis
JanusGraph
Created by NeO on 10/23/2024 in #questions
custom vertex id (String) feature to avoid duplicate vertex
Fwiw I asked a similar question a few months ago. It sounded like at least the vertex shouldn't be duplicated, based on the answer I got (https://discord.com/channels/981533699378135051/1204932121408442478/1205063257430302730), but I'm not sure if that extends to vertex properties.
8 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
Makes sense. Opened an issue: https://issues.apache.org/jira/browse/TINKERPOP-3113
since the channel holding references to results definitely shouldn't be happening.
Fwiw, to be clear, it isn't holding a reference to the result, or at least not the result in the sense of the query output, if I understand correctly. It's holding a reference to the session task that should be "done" and presumably GC'ed. But the CloseListener on the channel is given a reference to it, and from what I can tell the listener is never removed.
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
I guess the next step is to write up an issue report on the Jira after making an account, @Kennh? I wouldn't be opposed to taking a stab at fixing it either, assuming my theory is correct that we're missing a listener removal on the happy path to remove the cancellation future.
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
FTR you got it, @Kennh. I totally forgot to replicate that locally. I changed the channelizer a long time ago in my deployment via an environment variable 🤦‍♂️. But fwiw I'm able to reproduce the OOM now locally with both my Rust app and the toy Java app.
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
Gotcha, I was a couple weeks into performance tuning in my Rust app when this started happening. I'll try it with what I was using previously (WsAndHttpChannelizer) and see if it's different. 👍 I saw it was teed up to become the default channelizer and thought maybe it was ready to try out, but it sounds like maybe I jumped the gun 😅
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
If any Tinkerpop people see this thread, I'd be curious if there are any hypotheses with regards to the 43,536 SingleTaskSessions in the heap dump. I haven't changed maxWorkQueueSize from its default of 8192, which seems like it should have applied backpressure here if I was somehow spawning too many concurrent sessions. 🤔 But in the meantime I'm continuing to try to reproduce this locally, outside my staging environment. So far that's the only place the OOMs seem to be happening, but other than slower writes to the backing datastores there shouldn't be significant differences; locally it runs the same container image as my staging environment.
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
Which seems to mean that if the channel doesn't close, these should accumulate. The Rust library does maintain a pool of websocket connections, so that does line up with the behavior I'd expect: it just serializes the request, sends it up the channel, receives the response, and then puts the websocket connection back in the pool. But when I try to reproduce this behavior using a toy Java application and the Java driver, the SingleTaskSessions don't seem to accumulate in the same way.
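For illustration, here is a minimal sketch of that checkout/submit/return flow with a pooled connection. The Connection and Pool types below are hypothetical stand-ins rather than the Rust client's actual API, but they show why the underlying websocket channel stays open between requests.
use std::collections::VecDeque;
use std::sync::Mutex;

// Hypothetical stand-in for a pooled websocket connection.
struct Connection;

impl Connection {
    // Serialize the request, send it over the channel, and wait for the response.
    fn submit(&mut self, request: &str) -> String {
        format!("response to {request}")
    }
}

struct Pool {
    idle: Mutex<VecDeque<Connection>>,
}

impl Pool {
    fn execute(&self, request: &str) -> String {
        // Check a connection out of the pool (or open a new one if none are idle).
        let mut conn = self.idle.lock().unwrap().pop_front().unwrap_or(Connection);
        let response = conn.submit(request);
        // Return the same, still-open websocket connection to the pool, so the
        // channel on the server side never closes between requests.
        self.idle.lock().unwrap().push_back(conn);
        response
    }
}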
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
Looking at what I believe is the relevant code, it does seem to line up: https://github.com/apache/tinkerpop/blob/b7c9ddda16a3d059b2a677f578f131a7124187b6/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/handler/AbstractSession.java#L166-L170 In that, until the channel closes, there doesn't (seem?) to be anything that removes the session's listener from the channel's close-future listener list. And that listener holds a reference to the session and, transitively, the Bytecode that seems to be building up and OOMing me.
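To illustrate the pattern being described (a toy Rust analogue, not TinkerPop's actual code): each request registers its session as a close listener on a long-lived channel, and nothing unregisters it on the happy path, so every session stays reachable until the channel itself closes.
use std::sync::{Arc, Mutex};

// Toy stand-ins; the real objects are Netty channels and AbstractSession instances.
struct SessionTask {
    bytecode: Vec<u8>, // the per-request payload that ends up retained
}

#[derive(Default)]
struct Channel {
    // Listeners to run when the channel closes. Each closure captures an
    // Arc<SessionTask>, keeping that session (and its bytecode) alive.
    close_listeners: Mutex<Vec<Box<dyn FnOnce() + Send>>>,
}

impl Channel {
    fn add_close_listener(&self, listener: Box<dyn FnOnce() + Send>) {
        self.close_listeners.lock().unwrap().push(listener);
    }
}

fn handle_request(channel: &Arc<Channel>, bytecode: Vec<u8>) {
    let session = Arc::new(SessionTask { bytecode });
    let for_listener = Arc::clone(&session);
    // Registered so the session could be cancelled if the channel closed mid-request...
    channel.add_close_listener(Box::new(move || drop(for_listener)));
    // ...but on the happy path nothing removes this listener after the request
    // completes, so with a pooled, long-lived channel these entries accumulate.
}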
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
No description
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
And then a follow-up mutation to the traversal adds the edge creation portion
traversal.side_effect(
__.select("payload")
.select("in_edges")
.unfold()
.as_("edge_map")
.merge_e(__.select("edge_map"))
.option((Merge::InV, __.select("v")))
.property("last_modified_timestamp", Utc::now()),
)
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
fn build_set_prop_side_effect() -> TraversalBuilder {
__.select("payload")
.select("set_props")
//This unfolds into the object stream "key":["value",...]
.unfold()
.as_("kvals")
//This separates out the individual values into the object stream (without the key)
.select("kvals")
.by(Column::Values)
.unfold()
.as_("value")
//now select the vertex again back into the object stream and apply these as separate entries
.select("v")
.property_with_cardinality(
Cardinality::Set,
//This pulls the key in from further up the object stream
__.select("kvals").by(Column::Keys),
//This pulls in the value that was before the vertex in the object stream
__.select("value"),
)
}
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
The two utility methods in the side effects are below
fn build_single_prop_side_effect() -> TraversalBuilder {
__.select("payload")
.select("single_props")
//This unfolds each of the single cardinality properties into their key value pairs into the object stream
.unfold()
.as_("kv")
.select("v")
.property(
//These pull apart the key value pair in the object stream as parameters to property()
__.select("kv").by(Column::Keys),
__.select("kv").by(Column::Values),
)
}
17 replies
Apache TinkerPop
Created by criminosis on 10/1/2024 in #questions
Tinkerpop Server OOM
If it's helpful, the traversals are mostly upserts. I build a list where each element is a map of maps, and each inner map is selected later in the traversal. The list is injected into the traversal and unfolded, which puts each map of maps into the object stream. One map is selected for a mergeV step and the returned vertex is given an alias. I then drop all of the vertex's properties (so the vertex ends up with exactly what was in the injected map). The other maps are then selected out for setting properties as side effects, depending on their needed cardinalities, and lastly a map is selected out that indicates the edges to draw to other vertices and the properties to set on each edge.
client
.get_traversal()
.inject(upsert_batch)
//Unfold each vertex's payload into the object stream (individually)
.unfold()
.as_("payload")
//Upsert the vertex based on the given lookup map
.merge_v(__.select("lookup"))
.as_("v")
//Drop all properties that were there, we'll want to assign everything that is given
.side_effect(__.properties(()).drop())
//First up are the simple single cardinality properties, just assign each on that is given
.side_effect(build_single_prop_side_effect())
//Next are the set properties. Similar to the single cardinality properties but we have to unfold the nested collection of values
.side_effect(build_set_prop_side_effect())
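To make the shape of those injected elements concrete, here is a rough sketch of one element of upsert_batch, using serde_json purely to visualize the nesting. The top-level keys match the select() calls above, but the inner field names and values are hypothetical; the real code builds the equivalent structure with the Gremlin client's own map types.
use serde_json::json;

// One element of the injected list: a map of maps, each selected by name later
// in the traversal ("lookup" for mergeV, then the property and edge side effects).
fn example_payload() -> serde_json::Value {
    json!({
        // Criteria handed to mergeV(); the vertex id is dictated directly.
        "lookup": { "id": "person:123", "label": "person" },
        // Single-cardinality properties, applied one key/value pair at a time.
        "single_props": { "name": "Alice", "age": 42 },
        // Set-cardinality properties: each key maps to a list of values.
        "set_props": { "nickname": ["Ally", "Al"] },
        // Edge specs consumed by the mergeE side effect in the follow-up mutation.
        "in_edges": [ { "label": "knows", "from": "person:456" } ]
    })
}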
17 replies
Apache TinkerPop
Created by criminosis on 7/27/2024 in #questions
op_traversal P98 Spikes
For anyone else that finds this thread, the things I ended up finding to be issues:
- Cassandra's disk I/O throughput. EBS gp3 is 125 MB/s by default and, at least for my use case, I was periodically maxing that out; increasing to 250 MB/s resolved that apparent bottleneck. If long, sustained writing occurred, the 125 MB/s was not sufficient.
- Optimizing traversals to use mergeE/mergeV, replacing either older Groovy-script based evaluations I was submitting or older fold().coalesce(unfold(), ...) style "get or create" vertex mutations.
The former was identified using EBS volume stats and confirmed at a lower level using iostat; the latter was found by gathering and monitoring the metrics emitted by Gremlin Server (in JanusGraph).
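For context on that second bullet, here is a sketch contrasting the two styles in the same snake_case builder notation used elsewhere in this thread. The exact method names and signatures are assumptions about the Rust client, so treat this as pseudocode for the Gremlin pattern rather than a drop-in snippet.
// Older "get or create": materialize the match with fold(), then branch with coalesce().
// Assumed builder methods (v, has, fold, coalesce, add_v) mirror the Gremlin steps.
fn get_or_create_old() -> TraversalBuilder {
    __.v(())
        .has(("person", "name", "Alice"))
        .fold()
        .coalesce([
            __.unfold(),
            __.add_v("person").property("name", "Alice"),
        ])
}

// mergeV() style: hand the match criteria to merge_v() and let the server perform
// the get-or-create in one step, as the batched upsert traversals above already do.
fn get_or_create_new() -> TraversalBuilder {
    // Assumes a "lookup" map was injected earlier in the stream, as in the upserts above.
    __.merge_v(__.select("lookup"))
}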
9 replies
Apache TinkerPop
Created by criminosis on 7/27/2024 in #questions
op_traversal P98 Spikes
The criteria in lookup that gets applied to mergeV() should return a single upserted vertex, since I directly dictate the vertex id as a parameter, so all the side_effect steps after that should apply to a single vertex once the injected list is unfolded, and the iter() at the end should mean there's no response I/O. single_props will usually have 7-10 properties in it, two of which may be very large documents (10s of MBs) for leveraging full-text search, with my JanusGraph deployment being backed by Elasticsearch. It's worth mentioning, though, that the latency screenshot has the Elasticsearch indices disabled, as I was trying to narrow down what may have been slowing things down, and I've confirmed through system telemetry that no I/O is going to Elasticsearch. set_props will usually have 6-8 properties in it, with each one having only 2-3 values under its corresponding key. The edges vary greatly: at least 1 entry, but possibly dozens.
9 replies
Apache TinkerPop
Created by criminosis on 7/27/2024 in #questions
op_traversal P98 Spikes
graph.get_traversal()
.inject(list_of_each_vertex_payload)
//Unfold each vertex's payload into the object stream (individually)
.unfold()
.as_("payload")
//Upsert the vertex based on the given lookup map
.merge_v(__.select("lookup"))
.as_("v")
//Drop all properties that were previously there
//Assign only what is given upon the vertex
.side_effect(__.properties(()).drop())
//First up are the simple single cardinality properties
.side_effect(
__.select("payload")
.select("single_props")
//This unfolds each of the single cardinality properties into their key value pairs into the object stream
.unfold()
.as_("kv")
.select("v")
.property(
//These pull apart the key value pair in the object stream as parameters to property()
__.select("kv").by(Column::Keys),
__.select("kv").by(Column::Values),
),
)
//Next are the set properties. Similar to the single cardinality properties but we have to unfold the nested collection
.side_effect(
__.select("payload")
.select("set_props")
//This unfolds into the object stream "key":["value",...]
.unfold().as_("kvals")
//This separates out the individual values into the object stream (without the key)
.select("kvals").by(Column::Values).unfold().as_("value")
//now select the vertex again back into the object stream and apply these as separate entries
.select("v").property_with_cardinality(
Cardinality::Set,
//This pulls the key in from further up the object stream
__.select("kvals").by(Column::Keys),
//This pulls in the value that was before the vertex in the object stream
__.select("value")
)
)
//Now create edges. The "edges" map within the vertex's payload
//is a vec of maps that enumerate the "from" vertex ids and other upsert properties for mergeE
.side_effect(
__.select("payload").select("edges").unfold().as_("edge_map")
.merge_e(__.select("edge_map"))
.option((Merge::InV, __.select("v")))
.property("last_modified_timestamp", Utc::now())
).iter().await
9 replies
Apache TinkerPop
Created by criminosis on 7/27/2024 in #questions
op_traversal P98 Spikes
I'm inferring they're building up, though, because the timer starts before the traversal is submitted into the executor service here: https://github.com/apache/tinkerpop/blob/418d04993913abe60579cbd66b9f4b73a937062c/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/traversal/TraversalOpProcessor.java#L212C44-L212C60, and it is only actually submitted for execution on line 304. If they're not building up, then I'm very perplexed about what a few traversals could be doing to distort the P98 by so much, if the time isn't being spent sitting in the queue. The traversal in question is a batching up of upserts to the graph. Each vertex's worth of info is loaded into various hashmaps that get combined into a collective map, and each vertex's map is then combined into a list that's injected at the start of the traversal. It looks like this:
9 replies
Apache TinkerPop
Created by criminosis on 7/27/2024 in #questions
op_traversal P98 Spikes
The spikes tend to correlate to long, sustained periods of writes. Normally my application will write a series of a few dozen vertices through the traversal, but sometimes it ends up being 10s of thousands. Obviously I don't try to do those all at once: the stream of things to write is chunked into batches of 50 vertices at a time, which are submitted as concurrent traversals. My connection pool to JanusGraph is currently capped at 5, so no more than 5 should be getting submitted concurrently, which is why I'm somewhat confused why the 10s of thousands seem to cause this outsized impact if, from Gremlin Server's perspective, it isn't getting more than 5 concurrently, unless the nature of sustained traffic is itself an issue. However I haven't found a metric to confirm this from the JanusGraph/GremlinServer side.

On the JanusGraph side I've made dashboards visualizing the CQLStorageManager, which never seems to show a build-up of writes to Cassandra from what I can tell, which leads me to suspect the TraversalOpProcessor is the bottleneck. But from what I can tell of the JMX-emitted metrics, there doesn't appear to be a metric for the work queue of the underlying ExecutorService created here: https://github.com/apache/tinkerpop/blob/418d04993913abe60579cbd66b9f4b73a937062c/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/util/ServerGremlinExecutor.java#L100-L105. So as a proxy I've been using the JMX metric metrics_org_apache_tinkerpop_gremlin_server_GremlinServer_op_traversal_Count, but that only tells me the number of traversals from the perspective of the metric (https://github.com/apache/tinkerpop/blob/418d04993913abe60579cbd66b9f4b73a937062c/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/op/traversal/TraversalOpProcessor.java#L80) as they come through, not their possible build-up in the task queue for the executor service.
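A minimal sketch of that write-side batching: chunk the stream into 50-vertex payloads and cap the number of in-flight traversals at 5 to mirror the connection-pool limit. submit_chunk and VertexPayload are hypothetical stand-ins for the upsert traversal shown above, and the futures-crate combinators are just one way to express the cap.
use futures::stream::{self, StreamExt};

// Hypothetical stand-in for one vertex's payload (the map of maps injected
// into the upsert traversal shown above).
struct VertexPayload;

// Hypothetical: submits one chunk as a single injected traversal and awaits iter().
async fn submit_chunk(_chunk: Vec<VertexPayload>) -> Result<(), Box<dyn std::error::Error>> {
    Ok(())
}

async fn write_all(payloads: Vec<VertexPayload>) -> Result<(), Box<dyn std::error::Error>> {
    stream::iter(payloads)
        .chunks(50)            // batch 50 vertices per traversal
        .map(submit_chunk)     // each chunk becomes one traversal future
        .buffer_unordered(5)   // at most 5 traversals in flight (pool cap of 5)
        .collect::<Vec<_>>()
        .await
        .into_iter()
        .collect::<Result<(), _>>()
}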
9 replies
JanusGraph
Created by criminosis on 7/10/2024 in #questions
MergeV "get or create" performance asymmetry
Reran my trial with both approaches set to chunk sizes of 10 across the 10k batch (both still allowed 10 parallel connections). This reduced the MergeV chunk (what gets injected into the traversal) from 200 down to 10, but I figured that would make it more comparable on the lookup side. MergeV got way worse 🤔
Trial 0 w/ 10000 vertices
Reference: 7834 ms MergeV: 7464 ms
MergeV redo: 11842 ms
Reference: 2377 ms MergeV: 11526 ms (All read, dataset swap)

Trial 1 w/ 10000 vertices
Reference: 7747 ms MergeV: 8399 ms
test mergev_demo has been running for over 60 seconds
MergeV redo: 11411 ms
Reference: 2247 ms MergeV: 12502 ms (All read, dataset swap)

Trial 2 w/ 10000 vertices
Reference: 7477 ms MergeV: 7834 ms
MergeV redo: 11205 ms
Reference: 2211 ms MergeV: 13655 ms (All read, dataset swap)

Trial 3 w/ 10000 vertices
Reference: 8258 ms MergeV: 8440 ms
MergeV redo: 11932 ms
Reference: 2154 ms MergeV: 12316 ms (All read, dataset swap)

Trial 4 w/ 10000 vertices
Reference: 9069 ms MergeV: 8718 ms
MergeV redo: 18618 ms
Reference: 3375 ms MergeV: 25823 ms (All read, dataset swap)

Trial 5 w/ 10000 vertices
Reference: 14659 ms MergeV: 13939 ms
MergeV redo: 21259 ms
Reference: 4586 ms MergeV: 23869 ms (All read, dataset swap)

Trial 6 w/ 10000 vertices
Reference: 17766 ms MergeV: 19030 ms
MergeV redo: 22648 ms
Reference: 3790 ms MergeV: 20051 ms (All read, dataset swap)

Trial 7 w/ 10000 vertices
Reference: 13271 ms MergeV: 14976 ms
MergeV redo: 21850 ms
Reference: 3126 ms MergeV: 23877 ms (All read, dataset swap)

Trial 8 w/ 10000 vertices
Reference: 14844 ms MergeV: 16161 ms
MergeV redo: 26621 ms
Reference: 3400 ms MergeV: 23748 ms (All read, dataset swap)

Trial 9 w/ 10000 vertices
Reference: 18169 ms MergeV: 17160 ms
MergeV redo: 29828 ms
Reference: 4315 ms MergeV: 26917 ms (All read, dataset swap)
6 replies
JanusGraph
Created by criminosis on 7/10/2024 in #questions
MergeV "get or create" performance asymmetry
I guess technically mergeV is having to look up 200 vertices per network call whereas Reference's chunk size is only 10, but I figured I'd post the question in case this seemed weird to any JG core devs.
6 replies