Speeding up Queries Made to JanusGraph
Hi, I am working with JanusGraph and my query takes a while to execute (around 2.8 seconds), but I would like it to be faster. I read that I should create a composite index to improve performance, but I am not familiar with how to do that in Python.
Here is my query: g.V().has("person", "name", "Bob").outE("knows").has("weight", P.gte(0.5)).inV().values("name").toList()
My query finds all the nodes that Bob has a "knows" relationship with, as long as the weight of the edge to each of those nodes is >= 0.5. Bob is connected to around 600 nodes through "knows" edges. The query is fairly slow and takes 2.5-2.8 seconds to complete.
I would like to speed this up considerably. Any ideas?
(I originally posted this on the Gremlin server but was directed here)
👋🏻 Hey. Your traversal
g.V().has("person", "name", "Bob").outE("knows").has("weight", P.gte(0.5)).inV().values("name").toList()
could probably benefit from two kinds of indexes.
Without a Graph Index (https://docs.janusgraph.org/schema/index-management/index-performance/#graph-index), the first part, g.V().has("person", "name", "Bob"), has to filter through all vertices to find the ones with property value "Bob".
Then, without a Vertex-Centric Index (https://docs.janusgraph.org/schema/index-management/index-performance/#vertex-centric-indexes), JanusGraph needs to filter all of the matching vertices' "knows" out edges (.outE("knows")) to find the ones with a weight >= 0.5 (.has("weight", P.gte(0.5))).
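A minimal sketch of the vertex-centric index described above. It is written as the Groovy management commands it would take, held in a Python string so it can be submitted to the server as a script (the approach discussed later in this thread). It assumes the server binds the JanusGraph instance to `graph` and that the `knows` edge label and the `weight` property key already exist. The index name `knowsByWeight` and the function name are hypothetical. Reindexing a large graph can take a while, so check your server's script evaluation timeout before running this remotely.

```python
def vertex_centric_index_script(edge_label: str = "knows",
                                sort_key: str = "weight",
                                index_name: str = "knowsByWeight") -> str:
    """Build Groovy management commands for a vertex-centric index.

    Assumes the edge label and property key already exist in the schema and
    that the server binds the JanusGraph instance to the name `graph`.
    """
    lines = [
        # Schema changes should not run while a transaction is open.
        "graph.tx().rollback()",
        "mgmt = graph.openManagement()",
        f"sortKey = mgmt.getPropertyKey('{sort_key}')",
        f"edgeLabel = mgmt.getEdgeLabel('{edge_label}')",
        # Direction.OUT matches outE(...); sorting by the key lets a filter such as
        # has('weight', P.gte(0.5)) use the index instead of scanning every edge.
        f"mgmt.buildEdgeIndex(edgeLabel, '{index_name}', Direction.OUT, Order.desc, sortKey)",
        "mgmt.commit()",
        # Wait until the new index is registered.
        f"ManagementSystem.awaitRelationIndexStatus(graph, '{index_name}', '{edge_label}').call()",
        # Edges that existed before the index was created must be reindexed.
        "mgmt = graph.openManagement()",
        f"mgmt.updateIndex(mgmt.getRelationIndex(mgmt.getEdgeLabel('{edge_label}'), '{index_name}'), SchemaAction.REINDEX).get()",
        "mgmt.commit()",
    ]
    return "\n".join(lines)


print(vertex_centric_index_script())
```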
I would suggest that you create a Graph Index first. Since your condition does an exact match on "Bob", you can use a Composite Index.
Indexes cannot be created from Python (AFAIK), only through the JanusGraph management API.
We have sample commands in the docs showing how to create one: https://docs.janusgraph.org/schema/index-management/index-performance/#composite-index
I have a similar requirement. I understand that it is not possible to use openManagement in the Python client. Is a Java client the only way to set up indexing?
Technically, I think so (AFAIK). A gRPC endpoint was added in JanusGraph 1.0, but it does not yet support index management (https://github.com/JanusGraph/janusgraph/tree/master/janusgraph-grpc#todo-1).
But that doesn't mean you need to implement something yourself. You could start a Gremlin Console and update indexes from there using Groovy.
You can start Gremlin Console from a dedicated JanusGraph container or even from the JanusGraph Server itself (https://docs.janusgraph.org/v0.3/basics/server/#connecting-to-gremlin-server).
You can also write a Groovy script and send it to the server for evaluation. That's of course also possible from Python.
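A sketch of that route for the composite index suggested earlier: a Python function that produces the Groovy management commands (modelled on the composite-index example in the JanusGraph docs, plus `indexOnly` to restrict the index to `person` vertices), and a small helper that submits the text with gremlinpython's `Client`. The server URL, the traversal source name `g`, the server-side binding `graph`, the index name `byPersonName`, and both function names are assumptions.

```python
# Assumption: default Gremlin Server endpoint exposed by JanusGraph Server.
DEFAULT_URL = "ws://localhost:8182/gremlin"


def composite_index_script(index_name: str = "byPersonName",
                           key: str = "name",
                           vertex_label: str = "person") -> str:
    """Build Groovy management commands for a composite graph index.

    Assumes `key` and `vertex_label` already exist in the schema and that the
    server binds the JanusGraph instance to `graph`.
    """
    lines = [
        "graph.tx().rollback()",
        "mgmt = graph.openManagement()",
        f"propKey = mgmt.getPropertyKey('{key}')",
        f"vLabel = mgmt.getVertexLabel('{vertex_label}')",
        # Exact-match index on the key, restricted to one vertex label,
        # which fits g.V().has('person', 'name', 'Bob').
        f"mgmt.buildIndex('{index_name}', Vertex.class).addKey(propKey).indexOnly(vLabel).buildCompositeIndex()",
        "mgmt.commit()",
        # Wait until the index is registered.
        f"ManagementSystem.awaitGraphIndexStatus(graph, '{index_name}').call()",
        # Index vertices that already exist; afterwards the index is enabled.
        "mgmt = graph.openManagement()",
        f"mgmt.updateIndex(mgmt.getGraphIndex('{index_name}'), SchemaAction.REINDEX).get()",
        "mgmt.commit()",
    ]
    return "\n".join(lines)


def submit_script(script: str, url: str = DEFAULT_URL):
    """Send a Groovy script to Gremlin Server and return its result list."""
    # Imported here so the builder above works without gremlinpython installed.
    from gremlin_python.driver import client

    gremlin_client = client.Client(url, "g")
    try:
        return gremlin_client.submit(script).all().result()
    finally:
        gremlin_client.close()
```

Calling `submit_script(composite_index_script())` would run it. Because reindexing a large graph may exceed the server's evaluation timeout, it's worth checking that setting first.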
This sounds good, but in the long term, as the graph grows, is it better to have a Java client or to use this approach?
The problem is that I have very limited knowledge of Java, and I think development will be slower. At the same time, the APIs available in the Java client, for example a transaction-like approach to graph operations, look much more robust.
You'll get automatic transaction management from JanusGraph Server if you simply submit your traversals (= graph queries) from any language. JanusGraph Server then executes each traversal in its own transaction.
So manually managing transactions shouldn't be necessary.
The general recommendation nowadays is mostly to stay away from these graph APIs you are mentioning and instead only use Gremlin to traverse the graph.
The graph API is mostly meant to be used by JanusGraph internally.
So my recommendation is to stay in Python if that's the programming language you typically use and have the most experience in.
Gremlin Console is still a helpful tool for working interactively with your graph, for example to try out new traversals before you implement them in Python, where they're a bit harder to debug if something doesn't work right away.
Thank you for clarifying this
Thanks Florian for chiming in. I had overlooked that option!
Hello again, I was trying to send Groovy scripts through Python like you suggested.
I am trying a very basic setup where I create a variable to store the Groovy script and pass it to the client.
The above code runs and even returns the count correctly.
But when I just change the Groovy script string to something like:
I am getting a serialization error. Is there something I am missing?
Does it have something to do with the way I am setting up JanusGraph's graph properties?
Sorry if this is a very basic question, but I have tried looking for solutions and couldn't find any.
Oh, yes, this is a bit confusing. Types like ManagementSystem cannot be serialized and therefore cannot be returned from the server to the client. However, returning them is also not necessary.
So you just need to make sure that the server won't try to return such a type. The server simply tries to return the result of the last line of your Groovy script. If that's mgmt = graph.openManagement(), then it will try to return mgmt.
The easiest workaround is to end your script with [] so that the server just returns an empty array, which you can then ignore on the client.
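A small helper capturing that workaround, as a sketch. It appends `[]` to a script unless the script already ends with it. The function name is hypothetical. The returned string is what you would pass to the client's submit call. A real script would also commit or roll back `mgmt` before the final `[]`.

```python
def with_empty_result(script: str) -> str:
    """Make a Groovy script end with `[]`.

    Gremlin Server returns the value of the script's last line, so ending with
    `[]` makes the result an empty list instead of something like a
    ManagementSystem, which cannot be serialized.
    """
    body = script.rstrip()
    lines = body.splitlines()
    if lines and lines[-1].strip() == "[]":
        return body  # already safe; don't append a second `[]`
    return body + "\n[]"


# The failing case from the thread: the last line evaluates to a ManagementSystem.
safe = with_empty_result("mgmt = graph.openManagement()")
print(safe)
```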
You can of course also return something else if you want to use it to determine whether your script was executed successfully. Just make sure that it's a type the server can serialize and your client can deserialize, like any primitive data type (int, string, etc.).
Thank you for this. It works.
Just wanted to clarify: the script that I pass can be anything that Groovy allows, right? I can define functions and variables inside that script and pass that?
I am currently using OpenSearch as the index backend. I will be able to see the indices in the OpenSearch dashboard once they are created, right?
I just want to understand how things flow here.
All this is brand new for me 😬
Yes. We're also doing the same at my company. We have a big auto-generated schema creation script that contains functions for things like creating a vertex label with properties, creating an edge label, and so on.
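As a sketch of what such a generated script might look like, here is a Python generator that emits Groovy helper functions followed by calls to them. The helper names `ensureVertexLabel` and `ensureEdgeLabel`, and the generator itself, are hypothetical; this is not the script mentioned above. The helpers rely on the management API's `containsVertexLabel` and `containsEdgeLabel` checks, so re-running the script doesn't fail on labels that already exist.

```python
# Groovy helper functions defined at the top of the script (hypothetical names).
GROOVY_HELPERS = """\
def ensureVertexLabel(mgmt, name) {
    if (!mgmt.containsVertexLabel(name)) { mgmt.makeVertexLabel(name).make() }
}
def ensureEdgeLabel(mgmt, name) {
    if (!mgmt.containsEdgeLabel(name)) { mgmt.makeEdgeLabel(name).make() }
}"""


def groovy_str(value: str) -> str:
    """Quote a Python string as a single-quoted Groovy string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def schema_script(vertex_labels, edge_labels) -> str:
    """Generate a schema script that calls the helpers once per label."""
    calls = [f"ensureVertexLabel(mgmt, {groovy_str(v)})" for v in vertex_labels]
    calls += [f"ensureEdgeLabel(mgmt, {groovy_str(e)})" for e in edge_labels]
    return "\n".join([
        GROOVY_HELPERS,
        "mgmt = graph.openManagement()",
        *calls,
        "mgmt.commit()",
        "[]",  # serializable result instead of the last expression's value
    ])


print(schema_script(["person"], ["knows"]))
```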
I haven't used OpenSearch myself, but in general, yes, you can also see mixed indices created by JanusGraph directly in your index backend. (Note the "mixed" here, as composite indices are not backed by the index backend.)
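A sketch of the mixed-index counterpart, to make the composite/mixed distinction concrete. It assumes an index backend configured under the name `search` (the name used in the JanusGraph docs' examples) and a hypothetical index name `nameSearch`. Existing data would additionally need a REINDEX, as in the docs. Once committed, the index should show up in OpenSearch, typically under a name derived from the configured `index-name` prefix (default `janusgraph`) and the lowercased mixed index name. Treat that naming as something to verify in your own dashboard.

```python
def mixed_index_script(index_name: str = "nameSearch",
                       key: str = "name",
                       backing_index: str = "search") -> str:
    """Build Groovy commands for a mixed index stored in the index backend.

    `backing_index` must match the index backend name in the JanusGraph
    configuration (the X in index.X.backend); "search" is an assumption.
    """
    lines = [
        "graph.tx().rollback()",
        "mgmt = graph.openManagement()",
        f"propKey = mgmt.getPropertyKey('{key}')",
        # Mixed indices are written to the external index backend (e.g. OpenSearch),
        # unlike composite indices, which live in the storage backend.
        f"mgmt.buildIndex('{index_name}', Vertex.class).addKey(propKey).buildMixedIndex('{backing_index}')",
        "mgmt.commit()",
        f"ManagementSystem.awaitGraphIndexStatus(graph, '{index_name}').call()",
        # Return an empty list rather than the status report from the line above.
        "[]",
    ]
    return "\n".join(lines)


print(mixed_index_script())
```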
Perfect. Thank you so much for your help!
Hello! This seems like a real pitfall, so I was thinking of creating a PR to update the docs to mention it.
Also, the Groovy script approach is only mentioned in the TinkerPop documentation (https://tinkerpop.apache.org/docs/current/reference/#gremlin-python) and nowhere in the JanusGraph documentation. Does it make sense to have it in the JanusGraph documentation as well?
Sure, that definitely makes sense. Schema management is specific to JanusGraph, so we should document how it can be done, including from the different languages.
So I'm thinking I will make the following change to the TinkerPop documentation:
and I will also add a code example. Will that be fine?
Under Gremlin-Python > Submitting Scripts? Sounds good to me.
Ideally, you would first create an issue with the TinkerPop project: https://issues.apache.org/jira/browse/TINKERPOP/
However, in this case, since it's just a short addition to the docs, you could probably also skip that if you want to.
The general idea is that the change then gets included in the CHANGELOG, and others can provide feedback, ideally before you put in any effort.
Yes, under the Gremlin-Python -> Submitting Scripts section.
I will create an issue first then, no problem. Thank you for letting me know.
Just letting you know, I have created the issue here: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=499&projectKey=TINKERPOP&view=detail&selectedIssue=TINKERPOP-3111