Full-text-search-like features without Elasticsearch, OpenSearch, Solr, and the like?
I've read in multiple sources that Apache TinkerPop isn't optimized for text search operations like partial string matching or regex matching.
A common "solution" seems to involve integrating the database with full-text search engines like Elasticsearch or Solr.
Is there another way of handling these kinds of operations without adding another tool? I'm afraid this is getting way more complex than I wanted.
Some context: I'm trying to filter nodes by one of their properties, legal_name, with something similar to this SQL:
SELECT * FROM customers WHERE legal_name LIKE '%John%'
The real query is of course more complex than that, but this step is what makes it perform really poorly.
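A rough way to see why that step gets slow: unless the provider backs the predicate with a text index, a substring filter with a leading wildcard can only be answered by testing every candidate value, so its cost grows with the number of vertices. This plain-Java sketch (no TinkerPop; the list simply stands in for the legal_name values, an assumption for illustration) shows that brute-force shape. How a given graph database actually evaluates such a predicate depends on the provider.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the graph: just the legal_name values in a list
// (an assumption for illustration; no TinkerPop involved).
public class SubstringScan {

    // Same idea as WHERE legal_name LIKE '%needle%': with a leading wildcard
    // there is no sorted or hashed key to jump to, so every value is checked.
    // Case-sensitive, like String.contains.
    static List<String> containing(List<String> legalNames, String needle) {
        List<String> matches = new ArrayList<>();
        for (String name : legalNames) {
            if (name.contains(needle)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = List.of("John Smith", "Johnson & Co", "Mary Jones", "Elton John");
        // Prints [John Smith, Johnson & Co, Elton John]
        System.out.println(containing(names, "John"));
    }
}
```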
There's an ongoing effort to add Couchbase (a storage engine that supports full-text search) to JanusGraph: https://github.com/JanusGraph/janusgraph/pull/4086
The PR description says the backend is in alpha stage and not yet recommended for production use.
Would that make it support full-text search right out of the box?
seems so
TinkerPop by itself is a framework, so it's up to the provider that implements the framework to offer things such as text search indexing. That said, TinkerPop 3.6 added a way to provide extensions in the form of call() steps:
https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_call
https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/service/Service.java#L41-L51
This likely gets overlooked, as the documentation for it is pretty light. There is, however, a reference implementation of a regex-based search that creates a "service" and exposes it through the call() step:
https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/services/TinkerTextSearchFactory.java
You could use that as the basis for creating a service that makes a remote call to something like OpenSearch.
I think some of the concerns published in previous years might now be addressed by Gremlin's native regex support, which was added in 3.6.0 (https://tinkerpop.apache.org/docs/current/upgrade/#_textp_regex). That feature might satisfy some text search use cases.
Yes. There may be a need for some better examples of the Service Registry.
I'm currently using TextP.regex for my queries (as well as "startingWith", "containing", etc.), but it absolutely kills the query's performance, and this seems to be one of the most common reasons people end up integrating with something like Elasticsearch.
Well, this still means integrating my DB with another tool, which is exactly what I was trying to avoid.
I was thinking about this recently as well, for implementing a fuzzy search on a name. I haven't fully fleshed out how it would work (or whether it would work at all), but essentially I wondered if it could be done by creating a separate vertex for each letter in the legal_name, linked by a CONTAINS_LETTER edge (maybe with a positional property on the edge?).
My thought was that, given an input string (a name, in this example), you could use a repeat() step until() some predefined match criteria were met.
*Edit: I'm a relative Gremlin newbie, so forgive me if that makes zero sense!
The fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context. Theoretically, I think you could model search the way you describe, but it does create a lot of extra infrastructure in your graph, which might have performance, space, and administrative implications.
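One way to flesh out the "one vertex per letter" idea is an n-gram inverted index. This is a swapped-in technique, not what was proposed above: it uses overlapping 3-letter windows (trigrams) instead of single letters, and it's written in plain Java with an in-memory map rather than vertices and edges. In graph terms, each trigram could become a vertex with an edge from every name containing it, but that modeling is a hypothetical assumption, as are the class and method names. A query looks up its own trigrams, intersects the matching ids, then verifies each candidate with a real substring check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a trigram inverted index for case-insensitive substring
// search. In a graph, each trigram could be its own vertex with an edge from
// every name that contains it (a modeling assumption, not a built-in feature).
public class TrigramIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Lowercase, then collect every overlapping 3-character window.
    static Set<String> trigrams(String s) {
        String t = s.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= t.length(); i++) {
            grams.add(t.substring(i, i + 3));
        }
        return grams;
    }

    // Register a name and link its id to each of its trigrams.
    void add(String name) {
        int id = names.size();
        names.add(name);
        for (String g : trigrams(name)) {
            postings.computeIfAbsent(g, k -> new HashSet<>()).add(id);
        }
    }

    List<String> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        Set<String> grams = trigrams(q);
        Set<Integer> candidates = new HashSet<>();
        if (grams.isEmpty()) {
            // Fewer than 3 characters: no trigrams to look up, so fall back to a full scan.
            for (int i = 0; i < names.size(); i++) {
                candidates.add(i);
            }
        } else {
            // A match must contain every trigram of the query: intersect the posting sets.
            boolean first = true;
            for (String g : grams) {
                Set<Integer> ids = postings.getOrDefault(g, Set.of());
                if (first) {
                    candidates.addAll(ids);
                    first = false;
                } else {
                    candidates.retainAll(ids);
                }
            }
        }
        // Sharing all trigrams doesn't guarantee the exact substring, so verify each candidate.
        List<String> result = new ArrayList<>();
        for (int id : new TreeSet<>(candidates)) {
            if (names.get(id).toLowerCase(Locale.ROOT).contains(q)) {
                result.add(names.get(id));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TrigramIndex idx = new TrigramIndex();
        for (String n : List.of("John Smith", "Johnson & Co", "Mary Jones", "Elton John")) {
            idx.add(n);
        }
        System.out.println(idx.search("john"));  // [John Smith, Johnson & Co, Elton John]
        System.out.println(idx.search("JONES")); // [Mary Jones]
    }
}
```

Single letters are shared by almost every name, so they barely narrow the candidate set; trigrams are a common compromise. Ranking names by how many query trigrams they share, instead of intersecting, is one way to get a rough fuzzy match, though that isn't shown here. The extra-infrastructure caveat above still applies: the index grows with every distinct trigram and has to be kept in sync on writes.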
"The fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context." Haha, couldn't agree more. Everything that sits in the old RDBMS space can be reinvented in the graph universe.
I've even infected my children. They see random stuff in the world and are like, "whoa, that's a graph."
A bit of a side note, but I've been working extensively with Google Firebase Firestore lately, and it's the same thing: no case-insensitive search whatsoever. Their suggestion is to buy an index service. Not wanting that, I just parse out the words, lowercase them, and store them in an array property. The latest Gremlin supports a regex predicate now, though, so that's a lot better today.
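Here's a minimal sketch of that keyword-array workaround in plain Java. The tokenizer rule (split on anything that isn't a letter or digit) is an assumption you'd tune for your data, and keywords() and matches() are hypothetical names. Matching becomes an exact, case-insensitive membership test on whole words, which a store can usually index (Firestore's array-contains query, for instance), at the cost of losing partial matches like "Joh".

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper names; the split rule is an assumption.
public class KeywordArray {

    // Lowercase the text, split on runs of non-letter/non-digit characters,
    // and keep each distinct word once, in first-seen order.
    static List<String> keywords(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return new ArrayList<>(words);
    }

    // Whole-word, case-insensitive check against the stored array property.
    static boolean matches(List<String> storedKeywords, String word) {
        return storedKeywords.contains(word.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> kw = keywords("JOHN Smith-Jones & Co.");
        System.out.println(kw);                    // [john, smith, jones, co]
        System.out.println(matches(kw, "Jones"));  // true
        System.out.println(matches(kw, "Joh"));    // false: partial words don't match
    }
}
```

The keywords would be computed once at write time and stored alongside legal_name, so reads never have to scan or lowercase the original strings.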
@Gil I used TextP a long time ago, but it got slower as the data grew, so I integrated Elasticsearch.
Engine: Amazon Neptune
But today I'm facing a new problem: it fails to retrieve values that contain hyphens.
Is it failing on the OpenSearch side, or in the call from Neptune?
By "fail" I mean it doesn't retrieve any results; it returns empty output.