Comma Separated Config Options Values Via Environment Variable?

Hey @Bo & @porunov I was trying out the retry logic added by my PR (https://github.com/JanusGraph/janusgraph/pull/4409) by setting environment variables in my docker compose like so: janusgraph.index.some_index_name_here.elasticsearch.retry-error-codes=408,429 Much like was done in the unit test I wrote (https://github.com/JanusGraph/janusgraph/blob/487e10ca276678862fd8fb369d6d17188703ba67/janusgraph-es/src/test/java/org/janusgraph/diskstorage/es/rest/RestClientSetupTest.java#L240). But to my surprise it doesn't seem to be received well during startup:
janusgraph | Caused by: java.lang.NumberFormatException: For input string: "408,429" janusgraph | at java.base/java.lang.NumberFormatException.forInputString(Unknown Source) ~[?:?] ..... janusgraph | at org.janusgraph.diskstorage.es.rest.RestClientSetup.connect(RestClientSetup.java:83) ~[janusgraph-es-1.1.0-20240427-134121.4bddfb4.jar:?]
The surprise to me being it's using the same String[] parsing logic as other commands, which got me curious if others didn't work either. So I tried a more mundane one like janusgraph.index.some_index_name_here.hostname=elasticsearch,elasticsearch And it too doesn't seem to get parsed correctly:
janusgraph | Caused by: java.net.UnknownHostException: elasticsearch,elasticsearch janusgraph | at java.base/java.net.InetAddress$CachedAddresses.get(Unknown Source) ~[?:?] .... janusgraph | at org.elasticsearch.client.RestClient.performRequest(RestClient.java:288) ~[elasticsearch-rest-client-8.10.4.jar:8.10.4] janusgraph | at org.janusgraph.diskstorage.es.rest.RestElasticSearchClient.clusterHealthRequest(RestElasticSearchClient.java:171) ~[janusgraph-es-1.1.0-20240427-134121.4bddfb4.jar:?]
So just wondered is there a special syntax that's intended here?
Solution:
Okay, after quite a journey of vetting the parsing behavior from environment variable into property files, into the various Confirmation implementations into ConfigOptions, etc etc I've found where this has gone sideways....and like the worst of these types of situations looks like this one was self inflicted 🤦‍♂️ . A long long long time ago when I was starting on working with JanusGraph I was tired of JanusGraph deployments that silently failed and leaving my container up but dead inside the container, for reasons that weren't its fault (started up before Cassandra was ready timing out its storage wait time, etc). Trying to find solutions for that I stumbled upon the Tinkerpop CheckGraphManager (https://github.com/apache/tinkerpop/blob/master/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/util/CheckedGraphManager.java) that would automatically terminate the process if the specified graphs did not successfully open....
GitHub
tinkerpop/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin...
Apache TinkerPop - a graph computing framework. Contribute to apache/tinkerpop development by creating an account on GitHub.
GitHub
janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/man...
JanusGraph: an open-source, distributed graph database - JanusGraph/janusgraph
Jump to solution
22 Replies
Bo
Bo6mo ago
Is there any chance you could try it out without using docker?
criminosis
criminosisOP6mo ago
But Docker is provided as an official way to deploy JanusGraph, so shouldn't we be able to do this through Docker?
Bo
Bo6mo ago
yeah I am just wondering if there's something wrong with our Docker image I've never seen this bug before - but I've never used Docker to deploy JanusGraph
Solution
criminosis
criminosis6mo ago
Okay, after quite a journey of vetting the parsing behavior from environment variable into property files, into the various Confirmation implementations into ConfigOptions, etc etc I've found where this has gone sideways....and like the worst of these types of situations looks like this one was self inflicted 🤦‍♂️ . A long long long time ago when I was starting on working with JanusGraph I was tired of JanusGraph deployments that silently failed and leaving my container up but dead inside the container, for reasons that weren't its fault (started up before Cassandra was ready timing out its storage wait time, etc). Trying to find solutions for that I stumbled upon the Tinkerpop CheckGraphManager (https://github.com/apache/tinkerpop/blob/master/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin/server/util/CheckedGraphManager.java) that would automatically terminate the process if the specified graphs did not successfully open. So I switched to that graph manager like so: ENV gremlinserver.graphManager=org.apache.tinkerpop.gremlin.server.util.CheckedGraphManager But this has the unintended consequence of removing the configuration parsing handling that is afforded by JanusGraphManager (https://github.com/JanusGraph/janusgraph/blob/487e10ca276678862fd8fb369d6d17188703ba67/janusgraph-core/src/main/java/org/janusgraph/graphdb/management/JanusGraphManager.java#L73), like handling String arrays config options 🤦‍♂️ . Would there be appetite for like a CheckJanusGraphManager that would extend JanusGraphManager and overlay a duplication of logic that the Tinkerpop CheckGraphManager performs? I could attempt to put that up. It's been handled in my k8s environment so the container doesn't just hang around as a faster version of waiting for probes to fail.
GitHub
tinkerpop/gremlin-server/src/main/java/org/apache/tinkerpop/gremlin...
Apache TinkerPop - a graph computing framework. Contribute to apache/tinkerpop development by creating an account on GitHub.
GitHub
janusgraph/janusgraph-core/src/main/java/org/janusgraph/graphdb/man...
JanusGraph: an open-source, distributed graph database - JanusGraph/janusgraph
Florian Hockmann
But such a graph manager would also only work for problems during startup, right? Doesn't our wait for storage logic during the startup of the Docker container also handle this problem? https://github.com/JanusGraph/janusgraph/blob/master/janusgraph-dist/docker/docker-entrypoint.sh#L71
GitHub
janusgraph/janusgraph-dist/docker/docker-entrypoint.sh at master · ...
JanusGraph: an open-source, distributed graph database - JanusGraph/janusgraph
criminosis
criminosisOP6mo ago
I was also wanting the container to die if it failed to successfully apply the schema I had defined for it to load up on start.
Florian Hockmann
I see, that's of course a different requirement Doesn't it take quite some time to load the schema every time a container starts? Most of the time the schema should already be fully loaded, no?
criminosis
criminosisOP6mo ago
Of course, no argument there. But it's what drew me to the checked graph manager, albeit at the cost of unwittingly opting out of the JG add-ons with its graph manager. Hence asking if there would be appetite for a PR that'd add a "Checked"JanusGraphManager
Florian Hockmann
(Just trying to better understand your use case here)
criminosis
criminosisOP6mo ago
The scripts check if the schema was already loaded and skips doing it again. Each script is versioned and writes a "completed version X" indicator into the graph Loosely inspired by liquibase
Florian Hockmann
A problem I'm seeing with that approach is that schema changes require collaboration of all running JanusGraph instances which is hard to guarantee if you're automatically applying them every time a container starts. At my company we are only applying schema changes manually and stop all other JanusGraph instances and also all clients so only a single instance is up and running where we then execute the schema script. But in general, schema management is really a hard topic with JanusGraph which leads to a lot of headaches and it's definitely an area where JanusGraph could be improved a lot
criminosis
criminosisOP6mo ago
Well, to be clear I'm not doing it in the containers that are running JG for purposes of hosting, it's done in a secondary loader container that runs just once at the deployment start
Florian Hockmann
Ah, I see, then it even sounds similar to what we're doing
criminosis
criminosisOP6mo ago
But now that I'm retracing through the steps, I automated this a long time ago so it's gotten a little dusty in my memory, I may have conflated an issue here The reason I did it wasn't for schema
Florian Hockmann
I know that problem 😅
criminosis
criminosisOP6mo ago
Yeah, I ended up doing something similar with our hold HBase system. It feels fairly common place to rebuild these tools 😅 The storage timescript does watch for cassandra to be up, but IIRC it would just timeout but not kill the container? So the graph would attempt to open, fail, and then JG would just hang out without any graphs being opened.
Florian Hockmann
I guess if such a graph manager is useful for you, then it will probably also be useful for others. As a first step you could create an issue for this where you describe what that graph manager does differently and what benefits it brings: https://github.com/JanusGraph/janusgraph/issues/new?assignees=&labels=&projects=&template=feature-request.md You could then wait a bit to see whether it gets any reactions and then provide a PR
GitHub
Build software better, together
GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.
From An unknown user
From An unknown user
From An unknown user
criminosis
criminosisOP6mo ago
Vs what I wanted was the container to die in that case. Which the checked graph manager would do.
Florian Hockmann
With storage timescript you mean the docker-entrypoint.sh logic I mentioned earlier? If I'm not completely mistaken, then that should let the startup fail before even starting JanusGraph Server which means that the container won't get up and running at all It first tries to open the graph locally in the Gremlin Console via JanusGraphFactory.open() with your config and only if that works, it will start JanusGraph Server with the same config
criminosis
criminosisOP6mo ago
I'll have to reproduce the scenario before I create the feature request. TBH this was a change I made over a year ago so the details have fallen into the memory aether if I'm being honest 😅 We also sharpened other k8s infra with liveliness probes since then so we may have addressed this issue, at least for ourselves, via other means.
criminosis
criminosisOP6mo ago
Looking at my commit message for the change it looks like I described it as so:
Kill JG if the graph fails to open instead of idly be hosting no graphs
So it seems I'm correctly remembering the symptom, but having defined steps for reproduction would be beneficial for the feature request
Want results from more Discord servers?
Add your server