JanusGraph

Join the community to ask questions about JanusGraph and get answers from other members.

Running OLAP queries on JanusGraph outside the Gremlin Console (from Java and G.V())

Hi, I'm able to run OLAP queries against my graph DB from the Gremlin Console by following the directions in the documentation (https://docs.janusgraph.org/advanced-topics/hadoop/). However, I would also like to run OLAP queries without using the console, from an embedded JanusGraph Java application as well as from G.V(). In G.V(), I tried this while selecting Groovy Mode for query submission:...

Support for saving arrays of vectors

Hi, I know that JanusGraph by default does not support reading or writing arrays or vectors of floating point numbers. Is there a reason why?

Why are queries slow when more than one mixed index is used in a query

ConfiguredGraphFactory.open("tenant51").traversal().V().has('_t', 'infra:container').has('_it', gt(123)).limit(100) is taking 400ms while ConfiguredGraphFactory.open("tenant51").traversal().V().has('_it', gt(123)).limit(100) or ConfiguredGraphFactory.open("tenant51").traversal().V().has('t', 'infra:container').limit(100) ...

Index Creation Help

I need some help understanding the difference between Graph Index, Composite Index, and Vertex-Centric Index, and how to create them. I am currently working in Python, but I am unsure how to use these three types of indices to speed up my queries. Any suggestions or explanations would be greatly appreciated. Looking at this page from a previous thread that I created (https://docs.janusgraph.org/schema/index-management/index-performance/#composite-index), I wasn't sure where to execute some of the commands listed there. Is this going to work with Python?...

Transaction Recovery not working as expected

Hi everyone. I'm following the steps in the Transaction Failure section of "Failure & Recovery - JanusGraph" for handling cases where persistence to indexing backends fails. I've enabled the tx write-ahead log and have tried setting up the recovery process. Based on the logs, the process seems to initialize properly, but I don't see much after that, and judging by queries that use this index, I don't believe indexing is being retried. I also see that getStatistics returns 0,0. Does anyone have any insight into what's going on here or what I might be missing? We're using Cassandra as our backend storage and Lucene for indexing at the moment. Thank you!...

Mapping.STRING not working as expected?

Hi everyone! Based on the JanusGraph text search documentation: ``` When a string mapping is configured, the string value is indexed and can be queried "as-is" - including stop words and non-letter characters...

Problem with Custom Long IDs in JanusGraph

Hi, I'm experiencing issues with using custom Long IDs in JanusGraph. Although I'm aware of the limitations concerning signed long integers, I'm still facing problems with the range of IDs that can be utilized. Currently, I'm operating with two Cassandra machines as backend storage and a single JanusGraph machine. My database needs to handle at least 3 billion vertex nodes with unique IDs, and I'm trying to determine if this is feasible. I've experimented with various configuration settings, such as cluster.max-partitions and ids.authority.conflict-avoidance-bitwidth, attempting to expand the number of bits available for IDs, but without success....

Gremlin Console in v1.0.0 and v1.1.0

I am facing some weird behaviour in v1.0.0 and v1.1.0. v1.0.0: ``` gremlin> :remote connect tinkerpop.server conf/remote.yaml session...

Query for JANUSGRAPH_RELATION_DELIMITER

I have a use case in which I need to create multiple instances of JanusGraph in a single service, and these instances use different values of JANUSGRAPH_RELATION_DELIMITER. I have gone through the source code and found the class RelationIdentifier.java, where I can see that the property JANUSGRAPH_RELATION_DELIMITER is read from the environment variable and not from the configurationBuilder. Is this the only way to provide the delimiter variable? If not, can you provide a workaround for this? ...

Custom Vertex ID and coalesce

I am trying to find a vertex by ID; if it exists, update a property, otherwise create a vertex with that ID. `ConfiguredGraphFactory.open("tenant51").traversal(). V().hasId('45gjttOlN2+udTmQcJnHpp').fold() .coalesce(...

Can `CqlInputFormat` do predicate pushdowns/query based prefilters?

Hi! First of all, thank you all for your work on JanusGraph. In my use case, I have a medium-large graph, ~3TB currently, which might be 1-2 orders of magnitude bigger later. The data in it is generally clustered in a time-based fashion, e.g. newer vertices are mostly connected to other newer vertices (a timestamp is stored as a vertex property). I am writing an OLAP pipeline with Spark where JanusGraph, backed by Cassandra, is the source, and I use TinkerPop's hadoop-gremlin to build vertex programs and run OLAP Gremlin queries. Per my understanding, in this setup the only point of contact with JanusGraph is through the CqlInputFormat and the server itself is not involved at all. Is that correct?...

Performance issues with bulk loading

We have a JanusGraph cluster with Cassandra as the storage backend. Our cluster is deployed into an AWS EKS cluster. We have 32 JanusGraph pods with 2 cores and 16 GB each, and 9 Cassandra pods with 8 cores and 36 GB each. We're using AWS Lambda and the gremlin python library to load data in parallel, with 20 concurrent Lambda invocations. We're loading anywhere between 1 million and 20 million nodes (and at least that many edges if not more, hard to predict) in each run. Each Lambda invocation adds 500 nodes and associated relationships. The time it takes to load 500 nodes goes up as the load progresses, especially when the number of nodes is closer to 20 million. An AWS Lambda can run for a maximum of 15 minutes. Several AWS Lambda invocations time out, especially towards the end of the load. What can we do to improve the performance of the data loading? We have to load about 1 billion nodes and associated relationships. Our JanusGraph and Cassandra config is mostly default. Thanks....

When using janusgraph text predicates for fuzzy search, is it possible to control the fuzziness?

I am using the text_contains_fuzzy method from the JanusGraph Python library, but I can't tell from the types whether the function accepts anything apart from the string value.

Migration options for custom vertex id.

We are planning to use custom vertex IDs. I have a huge data set that has already been ingested with default vertex IDs, and now I want to use custom vertex IDs. Any suggestions on how I can migrate from default vertex IDs to custom vertex IDs?

Best practices recommendation for Kotlin?

What, if any, are best practices recommendations when using the Java API from a Kotlin coroutine environment? Are there plans to support an asynchronous Java client in the future?

Edge cardinality SIMPLE documentation not clear

I am using SIMPLE edge cardinality in my graph, and according to the documentation: "SIMPLE: Allows at most one edge of such label between any pair of vertices. In other words, the graph is a simple graph with respect to the label. Ensures that edges are unique for a given label and pairs of vertices." My question is: is direction considered here? That is, can I have both an edge A->B and an edge B->A?...

ConfigurationManagementGraph fails with Must provide vertex id

I am new to JanusGraph and am trying to create dynamic graphs. I couldn't create a configuration template. Am I missing anything? Thanks. My config: ` graphManager: org.janusgraph.graphdb.management.JanusGraphManager graphs: {...

JanusGraph Bigtable rows exceed the 256MiB limit when exported via Dataflow in Parquet format

Hi team, we are currently using JanusGraph with Bigtable as the storage backend, and we want to export the data out of Bigtable to cloud storage in Parquet format using Dataflow. The export failed because some rows are too large and exceed the limit, with the following error messages: See attachment...

custom vertex id (String) feature to avoid duplicate vertex

I have a graph (backed by Cassandra with read/write consistency QUORUM) whose vertices have the properties "recordId" and "type". I have disabled consistency locking on the property "recordId" in my graph, and I see duplicate vertices being created for the same "recordId" due to concurrent writes. Note: in the above case we were not providing any custom vertex ID, but relying on JanusGraph to generate the vertex ID. ...

Need some advice on using Edge Indexes efficiently.

I have a use case where - I have to find a source vertex - From the source vertex, I need to find the edges that match certain filters. To find the source vertex, I can use a Graph Index (mixed index)....