criminosis
JanusGraph
•Created by criminosis on 7/10/2024 in #questions
MergeV "get or create" performance asymmetry
Reran my trial with both set to a chunk size of 10 out of the 10k batch (both still had 10 parallel connections allowed). This reduced the MergeV chunk (what it'd inject into the traversal) from 200 down to 10, which I figured would make the two more comparable on the lookup side. MergeV got way worse 🤔
6 replies
I guess technically mergeV is having to look up 200 vertices per network call whereas Reference's chunk size is only 10, but I figured I'd post the question in case this seemed weird to any JG core devs.
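A rough sketch of the chunking arithmetic being discussed, assuming the 10k batch is simply sliced into fixed-size chunks and each chunk becomes one network call. The `chunked` helper and the variable names are hypothetical and only illustrate the call counts; they are not taken from the actual test harness.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Stand-in for the 10k vertex ids written in one trial (assumption).
batch = list(range(10_000))

# One network call per chunk: the chunk size decides how many calls are made.
calls_at_200 = sum(1 for _ in chunked(batch, 200))
calls_at_10 = sum(1 for _ in chunked(batch, 10))

print(calls_at_200, calls_at_10)  # prints "50 1000"
```

Under that assumption, dropping the chunk size from 200 to 10 multiplies the number of round trips by 20, which is one factor to keep in mind when comparing the two trials.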
But then figured I should try the "get" side of the "get or create" and I was rather surprised that mergeV seemed to be significantly slower than the "traditional" way of doing it:
"MergeV redo" is writing the same vertices again from the initial MergeV trial.
The "(All read, dataset swap)" line is running the Reference and MergeV logic again, but with the other's dataset.
Doing my trials (Cassandra & ES running locally via docker compose, also running JG locally in said docker compose environment) I was seeing a 2-4x improvement in writes to the graph when the vertices were all novel ids (Reference & MergeV trials would generate distinct datasets to write for each trial):
JanusGraph
•Created by criminosis on 5/24/2024 in #questions
Comma Separated Config Options Values Via Environment Variable?
Looking at my commit message for the change, it looks like I described it as so:
"Kill JG if the graph fails to open instead of idly be hosting no graphs"
So it seems I'm correctly remembering the symptom, but having defined steps for reproduction would be beneficial for the feature request.
33 replies
I'll have to reproduce the scenario before I create the feature request.
TBH this was a change I made over a year ago, so the details have fallen into the memory aether 😅
We also sharpened other k8s infra with liveness probes since then, so we may have addressed this issue, at least for ourselves, via other means.
Whereas what I wanted was for the container to die in that case, which the checked graph manager would do.
The storage timescript does watch for Cassandra to be up, but IIRC it would just time out and not kill the container? So the graph would attempt to open, fail, and then JG would just hang out without any graphs being opened.
Yeah, I ended up doing something similar with our old HBase system. It feels fairly commonplace to rebuild these tools 😅
The reason I did it wasn't for schema
But now that I'm retracing the steps (I automated this a long time ago, so it's gotten a little dusty in my memory), I may have conflated an issue here.
Well, to be clear I'm not doing it in the containers that are running JG for purposes of hosting, it's done in a secondary loader container that runs just once at the deployment start
Loosely inspired by liquibase
Each script is versioned and writes a "completed version X" indicator into the graph
The scripts check if the schema was already loaded and skip doing it again.
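A minimal sketch of the versioned, skip-if-done pattern described here, assuming each script has an integer version and that the "completed version X" indicator can be modeled as membership in a set. In the real setup the marker is written into the graph; the `apply_migrations` function, the `completed` set, and the lambdas below are hypothetical stand-ins.

```python
from typing import Callable, List, Set, Tuple

# A migration is a (version, script) pair; the script applies one schema change.
Migration = Tuple[int, Callable[[], None]]


def apply_migrations(migrations: List[Migration], completed: Set[int]) -> List[int]:
    """Run every migration without a completion marker and return the versions run."""
    applied: List[int] = []
    for version, script in sorted(migrations, key=lambda m: m[0]):
        if version in completed:
            continue  # marker already present, so this script is skipped
        script()  # an exception here aborts the run before the marker is written
        completed.add(version)  # stand-in for writing "completed version X"
        applied.append(version)
    return applied


log: List[str] = []
migrations = [
    (2, lambda: log.append("v2 schema")),
    (1, lambda: log.append("v1 schema")),
]
completed = {1}  # pretend version 1 was loaded by an earlier deployment

first_run = apply_migrations(migrations, completed)   # only version 2 runs
second_run = apply_migrations(migrations, completed)  # nothing left to run
```

Because an exception propagates before the marker is recorded, a loader built this way can exit non-zero on failure and retry the same version on the next deployment.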
Of course, no argument there. But it's what drew me to the checked graph manager, albeit at the cost of unwittingly opting out of the JG add-ons with its graph manager.
Hence asking if there would be appetite for a PR that'd add a "Checked" JanusGraphManager.
I was also wanting the container to die if it failed to successfully apply the schema I had defined for it to load up on start.
Okay, after quite a journey of vetting the parsing behavior from environment variables into property files, into the various Configuration implementations, into ConfigOptions, etc., I've found where this has gone sideways... and like the worst of these types of situations, it looks like this one was self-inflicted 🤦‍♂️.
A long, long, long time ago, when I was starting to work with JanusGraph, I was tired of JanusGraph deployments that silently failed and left my container up but dead inside, for reasons that weren't its fault (started up before Cassandra was ready, timing out its storage wait time, etc.).
Trying to find solutions for that, I stumbled upon the TinkerPop CheckedGraphManager (https://github.com/apache/tinkerpop/blob/master/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/util/CheckedGraphManager.java) that would automatically terminate the process if the specified graphs did not successfully open.
So I switched to that graph manager like so:
ENV gremlinserver.graphManager=org.apache.tinkerpop.gremlin.server.util.CheckedGraphManager
But this has the unintended consequence of removing the configuration parsing that is afforded by JanusGraphManager (https://github.com/JanusGraph/janusgraph/blob/487e10ca276678862fd8fb369d6d17188703ba67/janusgraph-core/src/main/java/org/janusgraph/graphdb/management/JanusGraphManager.java#L73), like handling String array config options 🤦‍♂️.
Would there be appetite for something like a CheckJanusGraphManager that would extend JanusGraphManager and overlay a duplicate of the logic that the TinkerPop CheckedGraphManager performs? I could attempt to put that up. It's been handled in my k8s environment so the container doesn't just hang around, as a faster version of waiting for probes to fail.
But Docker is provided as an official way to deploy JanusGraph, so shouldn't we be able to do this through Docker?
I found the env var parsing in the docker startup script (https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-dist/docker/docker-entrypoint.sh#L41-L49)
but it seems like it should be faithfully passing things through: https://github.com/JanusGraph/janusgraph/blob/487e10ca276678862fd8fb369d6d17188703ba67/janusgraph-dist/docker/docker-entrypoint.sh#L48