Speeding up Queries Made to JanusGraph

Hi, I am working with JanusGraph and my query takes a while to execute (around 2.8 seconds), but I would like it to be faster. I read that I should create a composite index to improve performance, but I am unfamiliar with how to do that in Python. Here is my query:
g.V().has("person", "name", "Bob").outE("knows").has("weight", P.gte(0.5)).inV().values("name").toList()
The query finds all the nodes that Bob has a "knows" relationship with, as long as the weight of the edge to those nodes is >= 0.5. Bob is connected to around 600 nodes through "knows" edges. The query is fairly slow and takes 2.5-2.8 seconds to complete. I would like to speed this up greatly; any ideas? (I originally posted this on the Gremlin server but was directed here.)
cdegroc · 4mo ago
👋🏻 Hey. Your traversal g.V().has("person", "name", "Bob").outE("knows").has("weight", P.gte(0.5)).inV().values("name").toList() could probably benefit from a couple of different indexes. Without a Graph Index (https://docs.janusgraph.org/schema/index-management/index-performance/#graph-index), the first part, g.V().has("person", "name", "Bob"), has to scan through all vertices to find the ones whose "name" property is "Bob". Then, without a Vertex Centric Index (https://docs.janusgraph.org/schema/index-management/index-performance/#vertex-centric-indexes), JanusGraph has to go through all "knows" out edges of the matching vertices (.outE("knows")) to find the ones with a weight >= 0.5 (.has("weight", P.gte(0.5))).

I would suggest that you create a Graph Index first. Since your condition is an exact match on "Bob", you can use a Composite Index. As far as I know, indexes cannot be created from Python, only through the JanusGraph management interface. The docs have sample commands showing how to create one: https://docs.janusgraph.org/schema/index-management/index-performance/#composite-index
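Combining the composite-index steps from the linked JanusGraph docs with the script-submission approach that comes up later in this thread, a minimal sketch in Python might look like this. It only builds a Groovy script string and hands it to a driver client. The index name `byPersonName` is hypothetical, and the sketch assumes the `name` property key and the `person` vertex label already exist and that the server exposes the graph as `graph`.

```python
from typing import Optional


def composite_index_script(index_name: str, key: str,
                           label: Optional[str] = None,
                           reindex: bool = True) -> str:
    """Build a Groovy script that creates a JanusGraph composite index on `key`.

    Assumes the property key (and the vertex label, if given) already exist
    in the schema and that the server binds the graph as `graph`.
    """
    only = f".indexOnly(mgmt.getVertexLabel('{label}'))" if label else ""
    lines = [
        "graph.tx().rollback()",  # never create an index while a tx is open
        "mgmt = graph.openManagement()",
        f"mgmt.buildIndex('{index_name}', Vertex.class)"
        f".addKey(mgmt.getPropertyKey('{key}')){only}.buildCompositeIndex()",
        "mgmt.commit()",
        # Block until the new index reaches REGISTERED (the default status).
        f"ManagementSystem.awaitGraphIndexStatus(graph, '{index_name}').call()",
    ]
    if reindex:
        # Vertices that already exist are only added to the index by a REINDEX.
        lines += [
            "mgmt = graph.openManagement()",
            f"mgmt.updateIndex(mgmt.getGraphIndex('{index_name}'), "
            "SchemaAction.REINDEX).get()",
            "mgmt.commit()",
        ]
    # End with [] so the server returns an empty list instead of trying to
    # serialize a management object that has no serializer.
    lines.append("[]")
    return "\n".join(lines)


def submit_script(client, script: str):
    """Send a script with a gremlin_python driver Client and wait for results."""
    return client.submit(script).all().result()
```

For example, `submit_script(gremlin_client, composite_index_script('byPersonName', 'name', label='person'))`, where `gremlin_client` is a hypothetical `gremlin_python.driver.client.Client` instance. On a large graph, the REINDEX step may run longer than the server's script evaluation timeout, in which case running it from the Gremlin Console may be easier.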
karthikraju · 4mo ago
I have a similar requirement. I understand that it is not possible to use openManagement from the Python client. Is a Java client the only way to set up indexing?
cdegroc · 4mo ago
Technically, I think so (AFAIK). A gRPC endpoint was added in JanusGraph 1.0, but it does not support index management yet (https://github.com/JanusGraph/janusgraph/tree/master/janusgraph-grpc#todo-1). That doesn't mean you need to implement something yourself, though. You could start a Gremlin Console and update indexes from there using Groovy. You can start the Gremlin Console from a dedicated JanusGraph container or even from the JanusGraph server itself (https://docs.janusgraph.org/v0.3/basics/server/#connecting-to-gremlin-server):
# Start gremlin console
$ ./bin/gremlin.sh

# Connect to a remote JanusGraph server (configured in /etc/remote.yaml in this case)
gremlin> :remote connect tinkerpop.server /etc/remote.yaml session

# Enter remote console mode and send all commands to the server
gremlin> :remote console

# Now you can access open management
...
Florian Hockmann
You can also write a Groovy script and send it to the server for evaluation. That is, of course, also possible from Python.
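As a minimal sketch of sending a script from Python, the following wraps the traversal from the question in a Groovy string and passes the values through the `bindings` argument of gremlin_python's `Client.submit`. Parameterizing the script this way lets Gremlin Server reuse the compiled script instead of compiling a new one for every name. The binding names `personName` and `minWeight` and the function name are made up for this example, and `client` is assumed to be a `gremlin_python.driver.client.Client`.

```python
# Groovy version of the traversal from the question. personName and minWeight
# are script bindings supplied by the client instead of hard-coded literals.
FRIENDS_SCRIPT = (
    "g.V().has('person', 'name', personName)"
    ".outE('knows').has('weight', P.gte(minWeight))"
    ".inV().values('name')"
)


def friends_with_min_weight(client, person_name: str, min_weight: float = 0.5):
    """Evaluate FRIENDS_SCRIPT on the server with the given binding values."""
    bindings = {"personName": person_name, "minWeight": min_weight}
    result_set = client.submit(FRIENDS_SCRIPT, bindings=bindings)
    # .all() returns a future; .result() blocks until every result arrived.
    return result_set.all().result()
```

Note that this does not make the query itself faster; the indexes discussed above are what avoid the full scans.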
karthikraju · 4mo ago
This sounds good, but in the long term, as the graph grows, is it better to have a Java client or to use this approach? The problem is that I have very limited knowledge of Java, and I think development will get slower. At the same time, the APIs available with the Java client, for example a transaction-style approach to graph operations, look much more robust.
Florian Hockmann
You'll get automatic transaction management from JanusGraph Server if you just submit your traversals (= graph queries) from any language. JanusGraph Server then executes each traversal in its own transaction, so manually managing transactions shouldn't be necessary.

The general recommendation nowadays is to stay away from the graph APIs you are mentioning and instead use only Gremlin to traverse the graph. The graph API is mostly meant to be used by JanusGraph internally.

So my recommendation is to stay in Python if that's the programming language you typically use and have the most experience in. Gremlin Console is still a helpful tool for working interactively with your graph, for example to try out new traversals before you implement them in Python, where it's a bit harder to debug them if something doesn't work right away.
karthikraju · 4mo ago
Thank you for clarifying this
cdegroc · 4mo ago
Thanks Florian for chiming in. I had overlooked that option!
karthikraju · 4mo ago
Hello again, I was trying to send Groovy scripts through Python like you suggested. I am trying a very basic setup where I create a variable to store the Groovy script and pass it through the client:
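from gremlin_python.driver import client

# Connect to the Gremlin Server endpoint; 'g' is the traversal source name.
# A separate variable name avoids shadowing the imported `client` module.
gremlin_client = client.Client('ws://localhost:8182/gremlin', 'g')

# Groovy script evaluated on the server; the value of its last line is returned
index_creation = '''
g.V().count().toList();
'''

try:
    result_set = gremlin_client.submit(index_creation)
    future_results = result_set.all()   # future that completes with all results
    results = future_results.result()   # block until the results arrive
    print(results)
finally:
    gremlin_client.close()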
The above code runs and even returns the count correctly. But when I change the Groovy script string to something like:
mgmt = graph.openManagement()
I am getting a serialization error. Is there something I am missing?
gremlin_python.driver.protocol.GremlinServerError: 599: Error during serialization: Serializer for type org.janusgraph.graphdb.database.management.ManagementSystem not found
Does it have something to do with the way I am setting up JanusGraph's graph properties? Sorry if this is a very basic question, but I have tried looking for better solutions and couldn't find any.
Florian Hockmann
Oh, yes, this is a bit confusing. Types like the ManagementSystem cannot be serialized and therefore cannot be returned from the server to the client. However, that is also not necessary. You just need to make sure that the server won't try to return such a type.

The server simply tries to return the result of the last line of your Groovy script. If that's mgmt = graph.openManagement(), then it will try to return mgmt. The easiest workaround is to end your script with [] so that the server just returns an empty array, which you can then ignore on the client.

You can of course also return something else if you want to use it to determine whether your script was executed successfully. Just make sure that it's a type that the server can serialize and your client can deserialize, like any primitive data type (int, string, etc.).
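Following that advice, a small hypothetical helper could append the final return value for you. This sketch is not part of gremlin_python; it only manipulates the script string, and the second function ends the script with a string marker instead of [] so the client can tell that the last line was reached.

```python
def with_serializable_result(script: str, result_expr: str = "[]") -> str:
    """Append a final Groovy expression whose value the server can serialize.

    Gremlin Server returns the value of the script's last statement, so ending
    with [] (or a string/number literal) avoids returning objects such as
    ManagementSystem that have no serializer.
    """
    return script.rstrip() + "\n" + result_expr


def run_management_script(client, script: str) -> bool:
    """Run `script` and return True if the server sent back the 'done' marker.

    Errors raised while evaluating the script surface as exceptions from the
    driver (e.g. GremlinServerError) rather than as a False return value.
    """
    final_script = with_serializable_result(script, "'done'")
    results = client.submit(final_script).all().result()
    return results == ["done"]
```

For example, `run_management_script(gremlin_client, "mgmt = graph.openManagement()\nmgmt.commit()")` would send the two management lines followed by `'done'`.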
karthikraju · 4mo ago
Thank you for this. It works. Just wanted to clarify: the script that I pass can be anything that Groovy allows, right? I can define functions and variables inside that script and pass it? I am currently using OpenSearch as the index backend. Will I be able to see the indices in the OpenSearch dashboard once they are created? I just want to understand the flow of things here. All of this is brand new to me 😬
Florian Hockmann
Just wanted to clarify: the script that I pass can be anything that Groovy allows, right? I can define functions and variables inside that script and pass it?
Yes. We're doing the same at my company. We have a big, auto-generated schema creation script that contains functions for things like creating a vertex label with properties, creating an edge label, and so on.
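A much smaller, hypothetical version of that pattern: Groovy helper functions at the top of the script, followed by calls generated from plain Python data. The helpers use JanusGraph management methods (`containsVertexLabel`, `makeVertexLabel`, `makePropertyKey`, ...). The labels and keys in the example come from the question; treating `weight` as `Double` is an assumption about the schema.

```python
# Groovy helper functions defined at the top of the script; each one only
# creates the schema element if it does not exist yet.
GROOVY_HELPERS = """\
def ensureVertexLabel(mgmt, String name) {
    if (!mgmt.containsVertexLabel(name)) { mgmt.makeVertexLabel(name).make() }
}
def ensureEdgeLabel(mgmt, String name) {
    if (!mgmt.containsEdgeLabel(name)) { mgmt.makeEdgeLabel(name).make() }
}
def ensurePropertyKey(mgmt, String name, Class dataType) {
    if (!mgmt.containsPropertyKey(name)) {
        mgmt.makePropertyKey(name).dataType(dataType).make()
    }
}
"""


def schema_script(vertex_labels, edge_labels, property_keys) -> str:
    """Generate a Groovy schema script from plain Python data.

    property_keys maps a key name to a Java class name, e.g. {"weight": "Double"}.
    """
    lines = [GROOVY_HELPERS, "mgmt = graph.openManagement()"]
    lines += [f"ensureVertexLabel(mgmt, '{v}')" for v in vertex_labels]
    lines += [f"ensureEdgeLabel(mgmt, '{e}')" for e in edge_labels]
    lines += [f"ensurePropertyKey(mgmt, '{k}', {t}.class)"
              for k, t in property_keys.items()]
    lines += ["mgmt.commit()", "[]"]  # [] keeps the response serializable
    return "\n".join(lines)
```

Usage: `schema_script(["person"], ["knows"], {"name": "String", "weight": "Double"})` returns one script string that can be submitted like any other.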
Will I be able to see the indices in the OpenSearch dashboard once they are created?
I haven't used OpenSearch myself, but in general, yes: you can also see mixed indices created by JanusGraph directly in your index backend. (Note the "mixed" here: composite indices are not backed by the index backend; they live in the storage backend.)
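For comparison with the composite index, a mixed index is created with `buildMixedIndex`, which takes the name of the configured index backend. A minimal sketch, assuming the backend is configured under the name `search` (as in `index.search.backend` in JanusGraph's sample configurations) and using the hypothetical index name `personByNameMixed`:

```python
def mixed_index_script(index_name: str, keys, backend: str = "search") -> str:
    """Build a Groovy script that creates a JanusGraph mixed index over `keys`.

    `backend` must match the <name> in the index.<name>.backend setting of the
    JanusGraph configuration ("search" is assumed by default here).
    """
    add_keys = "".join(f".addKey(mgmt.getPropertyKey('{k}'))" for k in keys)
    return "\n".join([
        "graph.tx().rollback()",  # never create an index while a tx is open
        "mgmt = graph.openManagement()",
        f"mgmt.buildIndex('{index_name}', Vertex.class){add_keys}"
        f".buildMixedIndex('{backend}')",
        "mgmt.commit()",
        f"ManagementSystem.awaitGraphIndexStatus(graph, '{index_name}').call()",
        "[]",  # keep the response serializable
    ])
```

This sketch leaves out the REINDEX step from the JanusGraph docs, which is needed when the indexed keys already hold data.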
karthikraju · 4mo ago
Perfect. Thank you so much for your help!

Hello! This seems like a real catch, so I was thinking of creating a PR to update the docs to mention this. Also, the Groovy script approach is only mentioned in the TinkerPop documentation (https://tinkerpop.apache.org/docs/current/reference/#gremlin-python) and nowhere in the JanusGraph documentation. Does it make sense to have it in the JanusGraph documentation as well?
Florian Hockmann
Sure, that definitely makes sense. Schema management is specific to JanusGraph, so we should document how it can be done, including from the different languages.
karthikraju · 4mo ago
So I'm thinking I will make the following change to the TinkerPop documentation:
When the script's result is known to be something other than a primitive data type, it is recommended to return an empty array at the end of the script so that the server does not fail while trying to serialize an arbitrary return type.
and I will also add an example for the code. Will that be fine?
Florian Hockmann
Under Gremlin-Python > Submitting Scripts? Sounds good to me. Ideally, you would first create an issue with the TinkerPop project: https://issues.apache.org/jira/browse/TINKERPOP/ However, since in this case it's just a short addition to the docs, you could probably also skip that if you want to. The general idea is that the change then gets included in the CHANGELOG and others can provide feedback, ideally before you put in any effort.
karthikraju · 4mo ago
Yes, under the Gremlin-Python -> Submitting Scripts section. I will create an issue first then, no problem. Thank you for letting me know.

Just a heads-up: I have created the issue here: https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=499&projectKey=TINKERPOP&view=detail&selectedIssue=TINKERPOP-3111
b4lls4ck (OP) · 4w ago
Hi everyone, thank you for responding. Could someone please explain how to create an index in Python? This assumes that creating an index will speed up the performance of queries.

Hi @cdegroc, thank you so much for your response. Could you please explain how to go about creating a Composite index? What is the difference between a Graph Index, a Vertex Centric Index, and a Composite Index? I took a look at the link you sent; should the commands be run in a Groovy console?
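On the terminology question: in JanusGraph, a composite index is one of the two kinds of graph index (the other is a mixed index), and graph indexes look up vertices or edges across the whole graph by property value, which is what `g.V().has("person", "name", "Bob")` needs. A vertex-centric index is built per vertex over its incident edges, which is what `.outE("knows").has("weight", P.gte(0.5))` can use. The management commands can be run in the Gremlin Console or, as discussed earlier in this thread, sent from Python as a Groovy script. A minimal sketch for the `knows`/`weight` case, assuming both schema elements already exist and using the hypothetical index name `knowsByWeight`:

```python
def edge_index_script(edge_label: str, index_name: str, sort_key: str,
                      direction: str = "OUT", order: str = "desc",
                      reindex: bool = True) -> str:
    """Build a Groovy script that creates a vertex-centric (edge) index.

    Assumes a recent JanusGraph/TinkerPop where the sort order is Order.desc
    or Order.asc (older releases may need Order.decr / Order.incr).
    """
    lines = [
        "graph.tx().rollback()",  # never create an index while a tx is open
        "mgmt = graph.openManagement()",
        f"edgeLabel = mgmt.getEdgeLabel('{edge_label}')",
        f"mgmt.buildEdgeIndex(edgeLabel, '{index_name}', Direction.{direction}, "
        f"Order.{order}, mgmt.getPropertyKey('{sort_key}'))",
        "mgmt.commit()",
        f"ManagementSystem.awaitRelationIndexStatus(graph, '{index_name}', "
        f"'{edge_label}').call()",
    ]
    if reindex:
        # Edges that already exist are only indexed after a REINDEX.
        lines += [
            "mgmt = graph.openManagement()",
            f"mgmt.updateIndex(mgmt.getRelationIndex(mgmt.getEdgeLabel('{edge_label}'), "
            f"'{index_name}'), SchemaAction.REINDEX).get()",
            "mgmt.commit()",
        ]
    lines.append("[]")  # return an empty list, not a management object
    return "\n".join(lines)
```

`Direction.OUT` matches the `outE("knows")` step in the question; `Direction.BOTH` would also cover `inE`/`bothE` traversals at the cost of a larger index.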