JanusGraph 1.0 full-text search predicate in python - broken
Hi All,
With JanusGraph 0.6 and gremlin-python 3.5.4, I was able to use a JanusGraph full-text search predicate from Python like this:
-----
from gremlin_python.process.traversal import P
g.V().has('firstName', P('textRegex', 'john')).toList()
-----
With JanusGraph 1.0 and gremlin-python 3.7.0, I am getting the following error:
Received error message '{'requestId': 'None', 'status': {'code': 499, 'message': 'Invalid OpProcessor requested [null]', 'attributes': {}}, 'result': {'meta': {}, 'data': None}}'
Could you please suggest a workaround to make the above query work on JanusGraph 1.0 and gremlin-python 3.7.0? Thank you.
Could you please also paste the server-side logs/errors?
Sure. Please find enclosed. Note that I am using GraphSONSerializersV3d0 as I faced some other issues (unrelated to this) with the GraphBinaryMessageSerializerV1. Thank you.
Yeah, the GraphBinary serializer doesn't work with the Python driver.
TinkerPop's TextP.regex (https://tinkerpop.apache.org/docs/current/upgrade/#_textp_regex) would make it work, but the regex match will not leverage your index backend.
Text.textRegex will leverage your indexing backend, but that is a JanusGraph-specific predicate and thus not supported by the gremlin-python package.
The problem here is probably that JanusGraph used to serialize its text predicates as if they were TinkerPop text predicates, just with a value corresponding to the value of the JanusGraph text predicate, e.g., TextP.textContains() was serialized as if it were P.textContains().
That was changed in version 0.6.0 of JanusGraph to let JanusGraph serialize its predicates with a JanusGraph specific type identifier, but the server kept a fallback mechanism so it could still deserialize predicates sent that way: https://docs.janusgraph.org/changelog/#serialization-of-janusgraph-predicates-has-changed
This fallback mechanism was then removed in JanusGraph 1.0.0: https://docs.janusgraph.org/changelog/#remove-support-for-old-serialization-format-of-janusgraph-predicates
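To make the difference concrete, here is a rough sketch of the two GraphSON 3 shapes involved. The first is what a hand-built P('textRegex', 'john') would presumably look like on the wire; the second is the JanusGraph-specific form. The type identifier janusgraph:JanusGraphP is an assumption taken from the JanusGraph.Net serializer, not something confirmed in this thread, so please verify it before relying on it.

```python
import json

# Legacy shape: a TinkerPop predicate type ("g:P") carrying a JanusGraph
# predicate name. A 1.0.0 server no longer accepts this, per the changelog.
legacy_form = {
    "@type": "g:P",
    "@value": {"predicate": "textRegex", "value": "john"},
}

# JanusGraph-specific shape. ASSUMPTION: the type id below mirrors the
# JanusGraph.Net serializer and has not been checked against a server here.
janusgraph_form = {
    "@type": "janusgraph:JanusGraphP",
    "@value": {"predicate": "textRegex", "value": "john"},
}

# Only the type identifier differs; the predicate name and value are the same.
changed_keys = [k for k in legacy_form if legacy_form[k] != janusgraph_form[k]]

print(json.dumps(legacy_form))
print(json.dumps(janusgraph_form))
```

So the fix on the client side is mostly about emitting a different type identifier for these predicates.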
Our assumption was that it would only affect users who would use a JanusGraph driver with a version older than 0.6.0, connecting to a server on 1.0.0. This is of course not a supported config in general, so removing that fallback didn't seem like a problem.
We unfortunately didn't expect users to rely on this mechanism by creating a TinkerPop P predicate themselves with a JanusGraph predicate value like you are doing here.
You can probably implement a simple GraphSON serializer for JanusGraph predicates yourself in Python to get this to work again.
For reference, here you can see how gremlin-python serializes its own predicates (this mainly shows how to build a serializer in Python): https://github.com/apache/tinkerpop/blob/6d17c674c81d4d3dc44a8a51950bc90f3d97633f/gremlin-python/src/main/python/gremlin_python/structure/io/graphsonV3d0.py#L273
and here you can see how JanusGraph predicates should be serialized (in this case implemented in C#): https://github.com/JanusGraph/janusgraph-dotnet/blob/master/src/JanusGraph.Net/IO/GraphSON/JanusGraphPSerializer.cs
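As a starting point, a minimal sketch of such a serializer could look like the following. It follows the dictify(obj, writer) convention that gremlin-python's own GraphSON serializers use, but it is deliberately standalone: JanusGraphP, text_regex and PassThroughWriter are hypothetical names made up for this illustration, and the janusgraph:JanusGraphP type identifier is an assumption based on the C# serializer linked above. It has not been tested against a JanusGraph server.

```python
import json


class JanusGraphP:
    """Hypothetical marker type for a JanusGraph-specific predicate."""

    def __init__(self, operator, value):
        self.operator = operator  # e.g. "textRegex", "textPrefix"
        self.value = value


class JanusGraphPSerializer:
    """Sketch of a GraphSON serializer in the style of gremlin-python's."""

    python_type = JanusGraphP

    @classmethod
    def dictify(cls, p, writer):
        body = {"predicate": p.operator, "value": writer.to_dict(p.value)}
        # ASSUMPTION: type id copied from the JanusGraph.Net implementation.
        return {"@type": "janusgraph:JanusGraphP", "@value": body}


class PassThroughWriter:
    """Stand-in for a GraphSON writer so the sketch runs without a driver."""

    def to_dict(self, obj):
        return obj  # plain strings need no further conversion


def text_regex(pattern):
    """Hypothetical helper mirroring JanusGraph's textRegex predicate."""
    return JanusGraphP("textRegex", pattern)


payload = JanusGraphPSerializer.dictify(text_regex("john"), PassThroughWriter())
print(json.dumps(payload))
```

To use something like this for real, the serializer would presumably have to be registered with the GraphSON writer that gremlin-python's message serializer uses, keyed by the JanusGraphP type. That wiring is not shown here.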
Ultimately, we need a small Python library for JanusGraph which implements serializers for JanusGraph specific types to extend gremlin-python, similar to JanusGraph.Net for .NET. Would be great if we could find someone willing to tackle that 🙂
Thanks a lot Boxuan and Florian for looking into this, I very much appreciate your help.
With JanusGraph 0.6 and gremlin-python 3.5.4, I am sure that I was able to use all full-text search predicates and those utilized the index backend. Just for reference, here is a small example:
* g.V().has('firstName', P('textPrefix', 'John')).toList() -- this version used index backend
* g.V().has('firstName', TextP.startingWith('John')).toList() -- with this version I got the warning in the JG log like "Query requires iterating over all vertices [()]. For better performance, use indexes"
Probably, the above "textPrefix" (and other JG full text search predicates) worked for me due to the fallback mechanism mentioned by Florian.
If I understand correctly, even with JG 0.6, there was no "official" way of using JanusGraph predicates with gremlin-python (only the fallback mechanism I used), right?
I see that the nicest way to move forward would be to implement a Python library for JanusGraph which implements serializers for JanusGraph specific types. Do I understand correctly that the only benefit of creating and maintaining such a Python library would be to be able to use the JanusGraph specific full text predicates in Python?
I don't know how complicated the "fallback mechanism" that was removed in JG 1.0 was. If it was not that complicated, would it be an option to add it back so that JG full text search predicates could be used with gremlin-python without the need to write and maintain a dedicated Python library? I see that there was an attempt to do something similar 5 years ago (https://github.com/JanusGraph/janusgraph-python), but based on that I would suspect that not a lot of people are interested in using JanusGraph with Python.
Please let me know your thoughts on the drawbacks/benefits between adding back "fallback mechanism" to JG 1.0 vs. writing/maintaining a JanusGraph specific python serializer.
Thanks a lot for your help and time.
> Probably, the above "textPrefix" (and other JG full text search predicates) worked for me due to the fallback mechanism mentioned by Florian.
Yes, exactly. It meant that JanusGraph Server treated this as a JanusGraph predicate which allowed it to use the index. We didn't implement a mechanism for JanusGraph to use an index when TinkerPop's text predicates are used. That explains why you got the warning for TextP.startingWith().
We should however also implement such a mechanism as it would make the TinkerPop text predicates more useful for JanusGraph users and it would make our custom text predicates less important.
> If I understand correctly, even with JG 0.6, there was no "official" way of using JanusGraph predicates with gremlin-python (only the fallback mechanism I used), right?
That is correct. Hence the hint in the docs: https://docs.janusgraph.org/basics/connecting/python/#janusgraph-specific-types-and-predicates
> Do I understand correctly that the only benefit of creating and maintaining of such Python library would be to be able to use the JanusGraph specific full text predicates in Python?
The text predicates are not the only custom JanusGraph types. Other examples are Geoshape predicates and Geo types in general (like for coordinates), but much more important is the RelationIdentifier.
You can't really work with edges in Python right now without a serializer for RelationIdentifier. It currently only works with GraphSON as that seems to be very forgiving in Python.
Since GraphBinary is however now the recommended serialization format from TinkerPop, we should definitely not rely on that completely.
Regarding the fallback mechanism: It would make the JanusGraph serialization code more complicated in Java, but more importantly than that: It will only help for GraphSON as I've just mentioned. That's why I don't think that it's worth it to bring this fallback mechanism back
Writing a very minimal Python library which can initially only support Text predicates and RelationIdentifier is really not that much work. The problem with janusgraph-python years ago was that the contributor who tried that directly wanted to implement a much bigger library with the first PR already and that made the review process quite complex.
Thanks a lot Florian for all the answers and clarification.