porunov
JanusGraph
•Created by NeO on 10/23/2024 in #questions
custom vertex id (String) feature to avoid duplicate vertex
The issue I explained above is related to anything where the generated column key is unique per element. Meaning: any edge and any property with cardinality LIST or SET. Properties with cardinality SINGLE will be replaced, as their column key is equal to the PropertyKey id.
8 replies
JanusGraph
•Created by NeO on 10/23/2024 in #questions
custom vertex id (String) feature to avoid duplicate vertex
As for the vertex itself, yes, by using the same custom vertex id you will always map to the same partition. Thus, it's impossible to create 2 duplicate vertices in that case. However, that doesn't mean it won't be possible to create duplicate vertex properties. In fact, there is a chance you will create some duplicate vertex properties in your partition. Meaning, if you have 2 processes which are writing a vertex with id "myId123" and a LIST property like "myProperty=foo,bar", then there is a chance you will end up with "myProperty=foo,bar,foo,bar", because you pushed 4 properties with different ids to your partition.
In other words, it's not safe to rely on the internal vertex id to overcome potential concurrency issues. If you need to ensure uniqueness, I would suggest relying on locking mechanisms (either pessimistic or optimistic locking could work in your case). I would use pessimistic locking if it's critical not to have a duplicate at any moment in time. Otherwise, if an eventually correct state is OK, I would use optimistic locking.
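To illustrate the pessimistic option: a rough sketch using JanusGraph's schema management API, which enforces uniqueness via a locked composite index (the property and index names here are made-up examples, and this needs to run against an open JanusGraph instance):

```groovy
// Sketch: enforce uniqueness with pessimistic locking instead of relying
// on the custom vertex id. Property/index names are examples.
mgmt = graph.openManagement()
key = mgmt.makePropertyKey('myProperty').dataType(String.class).cardinality(Cardinality.SINGLE).make()
index = mgmt.buildIndex('byMyPropertyUnique', Vertex.class).addKey(key).unique().buildCompositeIndex()
// ConsistencyModifier.LOCK makes JanusGraph acquire a lock on the index
// entry before commit, so two concurrent transactions cannot both insert
// the same value (at the cost of extra latency on writes).
mgmt.setConsistency(index, ConsistencyModifier.LOCK)
mgmt.commit()
```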
8 replies
JanusGraph
•Created by karthikraju on 9/24/2024 in #questions
Need advice on setting up janusgraph as a microservice
Graphs are isolated. Those two graphs have their own keyspace and don't share any user-space data between each other. You can also define different schema for different graphs.
4 replies
JanusGraph
•Created by karthikraju on 9/24/2024 in #questions
Need advice on setting up janusgraph as a microservice
Hey! You might want to try dynamic graphs. Generally speaking, you can just bind multiple graphs to Gremlin Server and use them as needed. I.e.
graph1.traversal().V()...
graph2.traversal().V()...
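If you want to create and open graphs at runtime rather than binding them statically, the dynamic-graphs approach uses ConfiguredGraphFactory; a rough sketch (graph names are examples, and the ConfigurationManagementGraph must already be set up on the server):

```groovy
// Sketch: dynamic graph management via ConfiguredGraphFactory.
// Graph names are examples.
ConfiguredGraphFactory.create('graph1')   // create from the template configuration
ConfiguredGraphFactory.create('graph2')
g1 = ConfiguredGraphFactory.open('graph1').traversal()
g2 = ConfiguredGraphFactory.open('graph2').traversal()
ConfiguredGraphFactory.getGraphNames()    // list all dynamically created graphs
```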
https://docs.janusgraph.org/operations/dynamic-graphs/
4 replies
JanusGraph
•Created by oc007us on 9/17/2024 in #questions
What is the latest version of Cassandra supported by 1.1.0-SNAPSHOT?
Hey! Cassandra 4 was used for testing. It seems JanusGraph doesn't support Cassandra 5 for now, but I think it should be possible to add Cassandra 5 support. I haven't looked at the breaking changes yet.
2 replies
Apache TinkerPop
•Created by Neptunion on 9/11/2024 in #questions
JanusGraph AdjacentVertex Optimization
TinkerPop applies all optimization strategies to all queries (including JanusGraph internal optimizations). However, JanusGraph skips some of the optimizations as it sees necessary. We don't currently store information on whether an optimization strategy modified any part of the query or was simply skipped (a potential feature request). Thus, the way I would test whether the optimization strategy actually makes any changes is to debug the query with a breakpoint placed in the relevant optimization strategy.
I.e. in your case I would place a breakpoint here: https://github.com/JanusGraph/janusgraph/blob/c9576890b5e9dc48676ccc16a58552b8a665e5f0/janusgraph-core/src/main/java/org/janusgraph/graphdb/tinkerpop/optimize/strategy/AdjacentVertexOptimizerStrategy.java#L58C13-L58C28
If this part is triggered during your query execution, the optimization is applied in your case.
4 replies
JanusGraph
•Created by karthikraju on 8/8/2024 in #questions
What would be the ideal way to set up deep learning on janusgraph data?
I haven’t personally used DGL, so I can’t direct you too much there.
Does DGL support distributed training? If so, I would look into OLAP via Spark to train embeddings in parallel on each node and then reduce those embeddings on the final stage into a single combined model.
However, I didn’t look into DGL and don’t know if it can even merge weights afterwards (so, it’s something to check out).
If your graph is small then you might not need parallel training, and you could probably use normal OLTP via Gremlin.
For example, you can produce a list of maps with outV, edge label, and inV as you asked above by using something like this:
g.E().project("from_vertex", "relationship", "to_vertex").by(outV().id()).by(label()).by(inV().id()).toList()
5 replies
JanusGraph
•Created by Aiman on 6/24/2024 in #questions
How do we generate transaction logs ?
The transaction log is processed on the server side and not on the client side.
Thus, you would need to create a log processor on your server side (as shown in the documentation). The log processor will be triggered anytime a mutation for that particular log happens.
You may run identical log processors listening to the same log on each of your JanusGraph servers, or just a single log processor if that’s what you are looking for.
You can use JanusGraph in embedded mode to open the change processor, or you may send a groovy script over your remote connection. Since you are using a remote connection, you may wish to connect to your JanusGraph server and send a groovy script which does exactly what the documentation says (i.e. opens a log processor with your custom logic).
In case your server is secured against groovy scripts and you cannot send non-gremlin-compliant scripts, you may want to add your groovy script to the startup of Gremlin Server (please refer to the TinkerPop documentation on how to attach custom scripts to your servers).
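As a rough sketch of what such a groovy script could do, following the shape of the transaction-log documentation (the log name "addedLog" and processor identifier are made-up examples; the required classes come from org.janusgraph.core.log):

```groovy
// Sketch: register a change processor for the user log 'addedLog',
// then write through that log. Identifiers are examples.
logProcessor = JanusGraphFactory.openTransactionLog(graph)
logProcessor.addLogProcessor('addedLog')
    .setProcessorIdentifier('myProcessor')
    .setStartTimeNow()
    .addProcessor({ tx, txId, changeState ->
        // custom logic: inspect added/removed elements here
        changeState.getVertices(Change.ADDED).each { v -> println v }
    } as ChangeProcessor)
    .build()

// Transactions must opt in to the log for their mutations to be captured:
tx = graph.buildTransaction().logIdentifier('addedLog').start()
```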
16 replies
JanusGraph
•Created by Aiman on 6/24/2024 in #questions
How do we generate transaction logs ?
I am not sure if you checked the documentation I posted above, but that’s exactly what the documentation’s examples cover. I.e. any mutation operation (create / update / remove) is tracked by the transaction log.
There is also a write-ahead log (WAL) available, which records everything both before and after a transaction is committed.
Please, check the documentation and let us know if there are any sections which are confusing.
16 replies
JanusGraph
•Created by Aiman on 6/24/2024 in #questions
How do we generate transaction logs ?
The documentation provides sample code. I’m not sure what exactly you are looking for, as the documentation provides both samples:
- how to register change processors
- how to build a transaction which uses your log message bus
16 replies
JanusGraph
•Created by paull8147 on 6/25/2024 in #questions
Text predicate not serializable (containsPhrase, notContainsX, etc)
I haven’t looked at it myself, but I believe this is the related issue:
https://github.com/JanusGraph/janusgraph/issues/1565
It would be great if you could contribute this improvement to JanusGraph.
5 replies
JanusGraph
•Created by Aiman on 6/24/2024 in #questions
How do we generate transaction logs ?
@Aiman Sharief check documentation regarding Transaction Log here: https://docs.janusgraph.org/advanced-topics/transaction-log/
16 replies
JanusGraph
•Created by HailDevil on 5/29/2024 in #questions
Java 17 support
No, there is no timeline for Java 17 support. I don’t know if anyone is actually working on adding that support to JanusGraph. I don’t think it’s hard to add, as we previously defined the steps:
- dropping Java 8 support
- adding Java 17 support
However, I don’t know if there are any volunteers who are interested in this. Last time I believe @farodin91 was working on it.
12 replies
JanusGraph
•Created by HailDevil on 5/29/2024 in #questions
Java 17 support
Yes. We have such a setup where we use JanusGraph inside Java 17 application. Storage backend CQL, mixed index backend ElasticSearch. Everything works as expected.
12 replies
JanusGraph
•Created by HailDevil on 5/29/2024 in #questions
Java 17 support
JanusGraph supports Java 17 in embedded mode, but can’t yet be built with Java 17. There are plans to drop support for Java 8 and start supporting builds with Java 17.
12 replies
Apache TinkerPop
•Created by porunov on 5/18/2024 in #questions
Should `barrier` step merge Edge properties with the same key and value?
Thank you for the detailed description!
I believe LazyBarrierStrategy is not used for any traversal which has at least one “drop” step, but I might be wrong (have to check).
Internally it looks like bulking just increments the counter of equal traversers while storing only the first property.
The problem is that we actually use barrier steps as an optimisation for batch queries and this time we want to optimise “drop” step itself.
I believe in this case our best bet would be to add an implementation of a “NoOpBarrier” step which skips edge properties and meta properties, and use that step instead of the standard “NoOpBarrierStep” in our batch optimisation strategy.
The related PR: https://github.com/apache/tinkerpop/pull/2612
7 replies
JanusGraph
•Created by gdotv on 5/22/2024 in #questions
Creating a customer serializer Io Registry in Java
We do use custom serializers on the server side (to be able to pass data from JanusGraph to TinkerGraph and back), but I never played with custom serializers on the client driver side. I believe it should be possible as you described, because we do use custom serializers for some types (Geoshape, for example) just fine.
If you find a proper way to do that, please share it here (or directly into the documentation) so that we can document it once and for all.
Otherwise I will check this later and document it.
12 replies
JanusGraph
•Created by Eathren on 5/16/2024 in #questions
Kill a query?
If you use Gremlin Server you can set “evaluationTimeout” (formerly “scriptEvaluationTimeout”) to the desired max evaluation time, after which the query should be interrupted.
If you use any GLV or the direct API then your query is executed in your own managed thread. In such a case you should interrupt your thread (‘yourThread.interrupt()’).
Notice, however, that if you are interrupting a mutation query with storage atomicity disabled, you may potentially end up with partially mutated data (didn’t really test this case, so just guessing).
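For the server-side option, a minimal gremlin-server.yaml fragment might look like this (the value is an example, in milliseconds; on older TinkerPop versions the key was scriptEvaluationTimeout):

```yaml
# gremlin-server.yaml fragment (sketch): interrupt evaluation after 30 s.
evaluationTimeout: 30000
```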
3 replies
Apache TinkerPop
•Created by porunov on 5/18/2024 in #questions
Should `barrier` step merge Edge properties with the same key and value?
From “dedup” step documentation - Vertex properties are first class citizens. Thus, they are compared differently than “Edge” properties or “meta” properties.
Both “Edge” and “meta” properties are compared using “key” and “value” only. Meaning two properties from different edges which have the same “key” and “value” are considered to be equal.
Looking into the code I see that “dedup” and “barrier” steps are quite similar in the way they are comparing elements / properties.
The thing is that it’s not obvious (at least from user’s perspective) that “barrier” step acts like “dedup” step. As a user I would assume that “barrier” step only triggers computation of previous traversals but doesn’t change the final query result. The test above demonstrates that “barrier” step actually changes the final query result when used with the “drop” step.
I do understand why that happens (because of the merging optimisation happening inside the barrier step). However, it’s not clear to me if this was intended behaviour or not.
In either case I believe we should either improve documentation of the “barrier” step to let users know about such behaviour or we should change the way “barrier” step works.
7 replies
JanusGraph
•Created by criminosis on 4/25/2024 in #questions
Phantom Unique Data / Data Too Large?
Hey @criminosis. Sorry, I'm confused about which issue you are referring to right now, because it seems there are multiple problems you are facing.
I will reply regarding your last message, so that we can focus on a specific problem.
You can write data into Cassandra in 2 modes: with atomic batches enabled or disabled. If you disable atomic batches then your transaction may be split into multiple CQL batch requests of the configured size, meaning that part of your data could be mutated while another part of your data fails. With atomic batches you won't need to deal with such issues; however, it also means that all the mutations of your transaction must fit into a single batch request. In that case, all your mutations (no matter what batch size you configure) are going to be batched together, which in some cases may result in very big batch requests. Different storage backends (Cassandra, ScyllaDB, AstraDB, Amazon Keyspaces, etc.) may have their own batch size limitations. In some cases you can configure your servers to accept larger batch requests (i.e. in your own managed Cassandra or ScyllaDB), while in other cases you need to reach out to the customer support team so that they can increase the batch size limitation (i.e. AstraDB or Amazon Keyspaces managed clusters).
The reason why it could potentially fail in one environment but work in another is that this limit is applied to multi-partition statements. Meaning, in some situations your batch request may still be bigger while it is touching a single partition (a single vertex without an index mutation). When you have a composite index mutation it already means that you are affecting multiple partitions (at least a single partition for the vertex and a single partition for the index record).
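Putting the relevant knobs side by side, as a sketch (values shown are examples, not recommendations):

```properties
# JanusGraph side (CQL storage options, janusgraph properties file):
storage.cql.atomic-batch-mutate=true   # send all mutations of a tx as one atomic batch
storage.cql.batch-statement-size=20    # statements per batch when atomicity is disabled
```

And on the Cassandra side:

```yaml
# cassandra.yaml: fail threshold for multi-partition batches (in KB)
batch_size_fail_threshold_in_kb: 50
```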
Please, refer to the batch_size_fail_threshold_in_kb setting in the cassandra.yaml configuration for more information.
13 replies