Apache TinkerPop•13mo ago

Fulltext-search-like features without ElasticSearch, OpenSearch, Solr and such?

I've read in multiple sources that Apache TinkerPop isn't optimized for text search operations like partial string matching or regex matching. A common "solution" seems to involve integrating the database with full-text search engines like ElasticSearch or Solr. Is there another way of handling these kinds of operations without adding another tool? I'm afraid this is getting way more complex than I wanted. For context, what I'm trying to do is filter nodes by one of their properties called legal_name, something similar to the SQL SELECT * FROM customers WHERE legal_name LIKE '%John%'. The actual query is of course more complex than that, but that step is what makes it really slow.

15 Replies

Bo•13mo ago

There's an ongoing effort to add Couchbase (a storage engine that supports full-text search) to JanusGraph: https://github.com/JanusGraph/janusgraph/pull/4086

GitHub

4084: adds Couchbase as JanusGraph backend by chedim · Pull Request...

Issue #4084 This PR adds couchbase JanusGraph backend and search. The backend is in alfa stage and is not yet recommended for production use. All the dependencies for the backend are either already...

GilOP•13mo ago

that would make it support fulltext search right out of the box?

Bo•13mo ago

seems so

triggan•13mo ago

TinkerPop, by itself, is a framework, so the provider that implements the framework would need to implement things such as text search indexing. That being said, there was an addition made in TinkerPop 3.6 to provide extensions in the form of call() steps: https://tinkerpop.apache.org/docs/3.6.0/dev/provider/#_call https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/structure/service/Service.java#L41-L51 This likely gets overlooked, as the documentation for it is pretty light. There is, however, a reference implementation of a regex-based search that works by creating a "service" and using the related call() step: https://github.com/apache/tinkerpop/blob/d174572f3fa3d8ff01e628dab18493e13359a632/tinkergraph-gremlin/src/main/java/org/apache/tinkerpop/gremlin/tinkergraph/services/TinkerTextSearchFactory.java You could use that as the basis for creating a service that makes a remote call to something like OpenSearch.

spmallette•13mo ago

I think some of the concerns in information published in previous years might now be addressed by Gremlin having native regex support. This was added in 3.6.0 - https://tinkerpop.apache.org/docs/current/upgrade/#_textp_regex - and that feature might satisfy some text search use cases.

triggan•13mo ago

Yes. There may be a need for some better examples for the Service Registry.

GilOP•13mo ago

I'm currently using TextP.Regex for my queries (as well as "startsWith", "containing", etc.), but it absolutely kills the performance of the query, and this seems to be one of the most common reasons why people integrate with something like ElasticSearch. Well, this still leads to me having to integrate my DB with another tool, which is exactly what I was trying to avoid.

dmcmanus•13mo ago

I was thinking about this recently as well, for attempting to implement a fuzzy search on a name. I haven't fully fleshed out exactly how it would work (or if it would work at all), but essentially I wondered if it could be accomplished by creating a separate vertex for each letter in the legal_name with a CONTAINS_LETTER edge (and maybe a positional property on the edge?). My thought was that, given an input string (a name, in this example), you could use a repeat() step until() some pre-defined match criteria were met. *Edit - I'm a relative Gremlin newbie, so forgive me if that makes zero sense!

spmallette•13mo ago

the fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context. theoretically, i think you could model search the way you describe, but it does create a lot of extra infrastructure in your graph which might have performance/space/administrative implications.
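To make the letter-vertex idea above concrete, here is a minimal sketch in plain Python rather than Gremlin. It is not anyone's actual implementation from this thread: the dictionary stands in for letter vertices, each `(name_id, position)` pair stands in for a CONTAINS_LETTER edge with a positional property, and the names `build_letter_edges` and `find_substring` are hypothetical. The loop in `find_substring` plays the role that repeat()/until() would play in a traversal.

```python
from collections import defaultdict

def build_letter_edges(names):
    """Return {letter: [(name_id, position), ...]}: one CONTAINS_LETTER-style
    edge per character, with the position stored on the edge."""
    edges = defaultdict(list)
    for name_id, name in names.items():
        for pos, ch in enumerate(name.lower()):
            edges[ch].append((name_id, pos))
    return edges

def find_substring(edges, query):
    """Walk the query letter by letter, keeping only (name_id, start)
    candidates whose next letter sits at the next position."""
    query = query.lower()
    if not query:
        return set()
    candidates = set(edges.get(query[0], []))  # (name_id, start position)
    for offset, ch in enumerate(query[1:], start=1):
        positions = set(edges.get(ch, []))
        candidates = {(n, s) for (n, s) in candidates
                      if (n, s + offset) in positions}
    return {n for n, _ in candidates}

# Hypothetical sample data
names = {1: "John Smith", 2: "Elton Johnson", 3: "Acme Corp"}
edges = build_letter_edges(names)
edge_count = sum(len(v) for v in edges.values())

print(find_substring(edges, "john"))  # ids of names containing "john"
print(edge_count)                     # one edge per character stored
```

The edge count grows with the total number of characters across all names, which is the kind of extra infrastructure mentioned above.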

Bo•13mo ago

the fun thing about graphs is that as soon as you start learning more about them, you start seeing how many problems can be put into a graph context.

Haha, cannot agree more. Everything that sits in the old RDBMS space can be reinvented in the graph universe.

spmallette•13mo ago

i've even infected my children. they see stuff randomly in the world and are like, "whoa, that's a graph"

ManabuBeach•12mo ago

Quite a bit of an aside, but I have been working extensively on Google Firebase Firestore lately. Same thing: no case-insensitive search whatsoever. Their suggestion: buy an index service. Not wanting that, I just parse out words, lowercase them and store them in an array property. The latest Gremlin supports a regex predicate now, though, so that's a lot better today.
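The word-array workaround described above could look roughly like this. It is a sketch in plain Python, not Firestore or Gremlin code; the records, the `name_tokens` property name, and both function names are made up for illustration.

```python
import re

def name_tokens(legal_name):
    """Split a name into lowercase word tokens, deduplicated, order kept."""
    words = re.findall(r"\w+", legal_name.lower())
    return list(dict.fromkeys(words))

# Hypothetical records; in a graph these would be vertices carrying a
# legal_name property plus a derived name_tokens list property.
customers = [
    {"id": 1, "legal_name": "John Smith Ltd"},
    {"id": 2, "legal_name": "JOHNSON & Sons"},
    {"id": 3, "legal_name": "Acme Corp"},
]
for c in customers:
    c["name_tokens"] = name_tokens(c["legal_name"])

def find_by_word(records, word):
    """Case-insensitive whole-word lookup against the stored token list."""
    needle = word.lower()
    return [r["id"] for r in records if needle in r["name_tokens"]]

print(find_by_word(customers, "JOHN"))
```

Note that this gives case-insensitive whole-word matching only: "john" does not match "JOHNSON", so it is not a drop-in replacement for LIKE '%John%'.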

M. alhaddad•12mo ago

@Gil I used TextP a long time ago, but it got slower as the data size increased, so I integrated Elasticsearch. Engine: AWS Neptune. But today I am facing a new problem: it fails on retrieving something with hyphens.

triggan•12mo ago

Is that failing on the OpenSearch side, or from the call from Neptune?

M. alhaddad•12mo ago

By "fail" I meant it fails to retrieve results; it returns empty output.
