Fulltext-search-like features without ElasticSearch, OpenSearch, Solr and such?

I've read in multiple sources that Apache TinkerPop isn't optimized for text search operations like partial string matching or regex matching. A common "solution" seems to involve integrating the database with full-text search engines like ElasticSearch or Solr. Is there another way of handling these kinds of operations without adding another tool? I'm afraid this is getting way more complex than I wanted. For context, I'm trying to filter nodes by one of their properties, legal_name, similar to the SQL query SELECT * FROM customers WHERE legal_name LIKE '%John%'. The actual query is of course more complex than that, but that step is what makes it perform so poorly.
15 Replies
Bo, 9mo ago
There's an ongoing effort to add Couchbase (a storage engine that supports full-text search) to JanusGraph: https://github.com/JanusGraph/janusgraph/pull/4086
Gil (OP), 9mo ago
that would make it support fulltext search right out of the box?
Bo, 9mo ago
seems so
triggan, 8mo ago
TinkerPop, by itself, is a framework, so the provider that implements the framework would need to implement things such as text search indexing. That being said, TinkerPop 3.6 added a way to provide extensions in the form of call() steps: https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_call and https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/service/Service.java#L41-L51. This likely gets overlooked, as the documentation for it is pretty light. There is, however, a reference implementation of a regex-based search that creates a "service" and uses the related call() step: https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/services/TinkerTextSearchFactory.java. You could use that as the basis for a service that makes a remote call to something like OpenSearch.
spmallette, 8mo ago
I think some of the information published in previous years predates Gremlin's native regex support. That support was added in 3.6.0 (https://tinkerpop.apache.org/docs/current/upgrade/#_textp_regex) and might satisfy some text search use cases.
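As a rough illustration of how the LIKE '%John%' filter from the question could be expressed as a regex predicate, here is a minimal sketch that translates a SQL LIKE pattern into an anchored regular expression. The helper name like_to_regex is hypothetical and not part of TinkerPop, and it assumes Python's re.escape output is acceptable to the regex engine behind TextP.regex, which may not hold for every provider.

```python
import re


def like_to_regex(like_pattern):
    """Translate a SQL LIKE pattern into an anchored regex string.

    Hypothetical helper for illustration only; it is not a TinkerPop API.
    '%' matches any run of characters and '_' matches one character.
    """
    parts = []
    for ch in like_pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            # Escape anything that would otherwise be regex syntax.
            parts.append(re.escape(ch))
    # Anchoring both ends keeps the meaning the same whether the engine
    # applies full-match or partial-match ("find") semantics.
    return "^" + "".join(parts) + "$"


pattern = like_to_regex("%John%")
print(pattern)
```

The resulting string would then be passed to the regex predicate mentioned above, for example as the argument of TextP.regex in a has() step on legal_name. This only changes how the filter is written; it does not by itself make the filter any cheaper to evaluate.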
triggan, 8mo ago
Yes. There may be a need for some better examples of the Service Registry.
Gil (OP), 8mo ago
I'm currently using TextP.Regex for my queries (as well as "startsWith", "containing", etc.), but it absolutely kills the performance of the query, and this seems to be one of the most common reasons why people integrate with something like ElasticSearch. So this still leads to me having to integrate my DB with another tool, which is exactly what I was trying to avoid.
dmcmanus, 8mo ago
I was thinking about this recently as well, for attempting to implement a fuzzy search on a name. I haven't fully fleshed out exactly how it would work (or if it would work at all), but essentially I wondered if it could be accomplished by creating a separate vertex for each letter in the legal_name with a CONTAINS_LETTER edge (and maybe a positional property on the edge?). My thought was that, given an input string (a name, in this example), you could use a repeat() step until() some pre-defined match criteria were met. *Edit: I'm a relative Gremlin newbie, so forgive me if that makes zero sense!
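To make the letter-vertex idea a little more concrete, here is a minimal in-memory sketch in plain Python, with a dictionary standing in for the graph. It assumes one CONTAINS_LETTER edge per character, each carrying the character's position, and it does exact (case-insensitive) substring matching rather than fuzzy matching. The function names and sample data are hypothetical, and the loop only approximates what a repeat()/until() traversal might do.

```python
from collections import defaultdict


def build_letter_edges(names):
    """Map each letter to the (name_id, position) pairs that contain it.

    Each pair stands in for one CONTAINS_LETTER edge with a position property.
    """
    edges = defaultdict(set)
    for name_id, name in names.items():
        for position, letter in enumerate(name.lower()):
            edges[letter].add((name_id, position))
    return edges


def find_containing(edges, needle):
    """Return the ids of names that contain `needle` as a substring."""
    needle = needle.lower()
    if not needle:
        return set()
    # Start from every occurrence of the first letter...
    candidates = set(edges.get(needle[0], ()))
    # ...then keep only the starts where each following letter sits
    # at the expected offset in the same name.
    for offset, letter in enumerate(needle[1:], start=1):
        postings = edges.get(letter, set())
        candidates = {
            (name_id, start)
            for name_id, start in candidates
            if (name_id, start + offset) in postings
        }
    return {name_id for name_id, _ in candidates}


names = {1: "John Smith Ltd", 2: "Johanna GmbH", 3: "Acme"}
edges = build_letter_edges(names)
print(find_containing(edges, "john"))
print(sum(len(pairs) for pairs in edges.values()))
```

The second print shows the cost of this model: the number of edges equals the total number of characters across all names, so the extra structure grows with every name stored.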
spmallette, 8mo ago
the fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context. theoretically, i think you could model search the way you describe, but it does create a lot of extra infrastructure in your graph which might have performance/space/administrative implications.
Bo, 8mo ago
(quoting spmallette) "the fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context."
Haha, cannot agree more. Everything that sits in the old RDBMS space can be reinvented in the graph universe.
spmallette, 8mo ago
i've even infected my children. they see stuff randomly in the world and are like, "whoa, that's a graph"
ManabuBeach, 8mo ago
A bit of an aside, but I have been working extensively with Google Firebase Firestore lately, and it's the same thing: no case-insensitive search whatsoever. Their suggestion is to buy an index service. Not wanting that, I just parse out the words, lowercase them, and store them in an array property. The latest Gremlin supports a regex predicate now, though, so that's a lot better today.
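The word-array workaround described above can be sketched roughly as follows. The function names and the choice of splitting on anything that is not an ASCII letter or digit are assumptions made for illustration; the original message does not say how the words were parsed.

```python
import re


def name_tokens(legal_name):
    """Split a name into unique lowercase word tokens.

    The returned list is what would be stored in an array property
    alongside the original name.
    """
    return sorted(set(re.findall(r"[a-z0-9]+", legal_name.lower())))


def matches_word(tokens, word):
    """Case-insensitive whole-word lookup against the stored tokens."""
    return word.lower() in tokens


tokens = name_tokens("John's Bakery, LLC")
print(tokens)
print(matches_word(tokens, "JOHN"))
```

The point of the design is that a case-insensitive search becomes an exact-equality lookup on a stored value, which is the kind of lookup a database can usually index. The trade-off is that it only matches whole words: "Joh" would not find "John" with this scheme.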
M. alhaddad, 8mo ago
@Gil I used TextP a long time ago, but it got slower as the data size increased, so I integrated Elasticsearch. Engine: AWS Neptune. But today I am facing a new problem: it fails to retrieve anything with hyphens.
triggan, 8mo ago
Is that failing on the OpenSearch side, or from the call from Neptune?
M. alhaddad, 7mo ago
By "fail" I meant that it fails to retrieve results: it returns empty output.
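The thread does not establish why hyphenated values come back empty. One possible cause, offered purely as an assumption, is that the search text is passed through Lucene-style query_string syntax, where "-" is an operator and has to be escaped to be treated literally. A minimal sketch of escaping the reserved characters before sending the text (the helper name is hypothetical):

```python
# Characters that carry special meaning in query_string syntax.
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(text):
    """Backslash-escape reserved query_string characters in `text`.

    '<' and '>' cannot be escaped in this syntax, so they are dropped.
    """
    escaped = []
    for ch in text:
        if ch in "<>":
            continue
        if ch in RESERVED:
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)


print(escape_query_string("Smith-Jones"))
```

If escaping makes no difference, another thing to check is how the field is analyzed at index time, since some analyzers split text on hyphens and index the two halves as separate terms.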