Apache TinkerPop•17mo ago
Sevi

inverted regex search

Hey, in my vertices I store escaped regexp statements as labels (e.g. 'wh.', which in theory should match the string "why"). I have an input string, for example 'why'. This input string should be matched against the regexps stored in the vertices, and the matching vertices should be returned. What would be the way to do that? Is there a more optimal query for that than, let's say, ''' g.label().filter(regex(label()).matches('why')) '''?
9 Replies
ManabuBeach•17mo ago
I think it is best to use property to match and apparently there is not much performance penalty using properties.
Sevi (OP)•17mo ago
@ManaBububu could you give me an example of how you would do this? I couldn't figure it out so far. I tried to do something like this, but I keep failing: g.V().filter{constant('why').regex(it.get().label())}
ManabuBeach•17mo ago
Sevi, I am out most of today, but I will get back to you tomorrow; I hope others can step in. Perhaps you can describe the problem that you want to solve, to the extent you can tell us. In general, a label is similar to a table name in SQL or a collection name in MongoDB. As such, labels are not usually used as query fields.
Sevi (OP)•17mo ago
@ManaBububu Okay, thanks 🙂 To me it does not matter whether the regexp expression is stored as a label or as a property on the vertex; either is fine.

About the problem: for simplicity, let's say we have three vertices without any relationships, each carrying one of the following as its label or property (it does not matter which):
1) 'why.'
2) 'w.y?'
3) 'wty?'

I would like to execute a query over all three vertices that takes these labels/properties as regexp expressions, not as mere strings, and matches them against a fixed input string such as 'why?'. Whichever vertices match this way should be returned (the entire vertex).

So in this graph of 3 vertices, the regexps 'why.', 'w.y?' and 'wty?' should be matched against the text 'why?', and the vertices of the first two should come out as matching and be returned. If, however, the input string is 'somethingElse', the query should return 0 results, as none of the three vertices taken as regexps can match the text 'somethingElse'.

I don't know if my problem is understandable enough. So far I could not come up with a solution by browsing the Gremlin docs and the reference. Probably I have overlooked something or lack some understanding. Thanks for the help in advance.
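One detail of the example is worth checking before building the query: in `java.util.regex` syntax an unescaped `?` is a quantifier, so 'w.y?' only matches the text 'why?' if the question mark is escaped in the stored pattern. The sketch below assumes whole-string matching via `Pattern.matches`; the class and the helper `fullMatches` are hypothetical names used only for illustration, and the list stands in for the regex strings stored on the vertices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class InvertedRegexCheck {
    // Hypothetical helper: returns the stored patterns that match the WHOLE input string.
    static List<String> fullMatches(List<String> storedPatterns, String input) {
        List<String> hits = new ArrayList<>();
        for (String regex : storedPatterns) {
            if (Pattern.matches(regex, input)) {
                hits.add(regex);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Unescaped: '?' means "zero or one of the preceding character".
        System.out.println(fullMatches(List.of("why.", "w.y?", "wty?"), "why?"));
        // prints [why.]

        // Escaped: '\?' matches a literal question mark.
        System.out.println(fullMatches(List.of("why.", "w.y\\?", "wty\\?"), "why?"));
        // prints [why., w.y\?]

        // No stored pattern matches an unrelated input.
        System.out.println(fullMatches(List.of("why.", "w.y\\?", "wty\\?"), "somethingElse"));
        // prints []
    }
}
```

With the escaped variants, the first two patterns match 'why?' and the third does not, which is the outcome described above.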
ManabuBeach•16mo ago
@Sevi, basically you want to store the regex as data, which is the reverse of the way we all usually use regex. What I have never actually done in Gremlin is take a property value and feed it into the predicate argument of a query. Very interesting use case. I am guessing you are trying to solve a knowledge-graph type of use case. I am still hoping others can step in before I figure that out.

@spmallette I think the question here really is: "Can I substitute the argument of a Predicate with a traversal, much like writing a subquery? Or do we have to create a temporary value via Groovy first and then use that as an argument?" Can you help?

Here is one example from the Practical Gremlin book:

4.4.1. Using a variable to feed a traversal
Sometimes it is very useful to store the result of a query in a variable and then, later on, use that variable to start a new traversal. You may have noticed we did that in the very last example of the prior section where we fed the german variable back in to a traversal. By way of another simple example, the code below stores the result of the first query in the variable austin and then uses it to look for routes from Austin in second query. Notice how we do this by passing the variable containing the Austin vertex into the V() step.

austin=g.V().has('code','AUS').next()
g.V(austin).out()

So in your case, storing a string as a property goes like this:

g.addV("regex_collection").property("rx", "why.*")

Hope you can take it from here.
Solution
spmallette•16mo ago
Sorry, I didn't see this question for some reason. I can't think of a way to do what you want to do. regex is a P which is a form of predicate and P cannot take dynamic values. The only way you could do it is to use a lambda/closure as you already tried to do:
gremlin> import java.util.regex.Pattern
==>java.util.regex.Pattern
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> g.addV().property('r','w.*').
......1> addV().property('r','x.*').iterate()
gremlin> g.V().filter{ Pattern.matches(it.get().value('r'), 'world') }.values('r')
==>w.*
gremlin> g.V().filter{ Pattern.matches(it.get().value('r'), 'x-ray') }.values('r')
==>x.*
Of course, you can't always use lambdas so this approach might not work for you in some cases.
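Where lambdas are not available, one possible fallback is to do the matching on the host side: fetch each vertex id together with its 'r' value, test the patterns in application code, and then start a second traversal from the matching ids (for example with the `V()` step). The following is only a sketch of the matching part under those assumptions; the map stands in for the fetched query result, and `HostSideRegexFilter` and `matchingVertexIds` are hypothetical names, not part of TinkerPop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class HostSideRegexFilter {
    // Hypothetical helper: ids of the vertices whose stored regex matches the whole input.
    static List<Long> matchingVertexIds(Map<Long, String> regexByVertexId, String input) {
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, String> entry : regexByVertexId.entrySet()) {
            try {
                if (Pattern.compile(entry.getValue()).matcher(input).matches()) {
                    ids.add(entry.getKey());
                }
            } catch (PatternSyntaxException ex) {
                // A stored value that is not a valid regex is skipped rather than failing the scan.
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Stand-in for the (vertex id -> 'r' property) pairs fetched from the graph.
        Map<Long, String> fetched = new LinkedHashMap<>();
        fetched.put(1L, "w.*");
        fetched.put(2L, "x.*");
        fetched.put(3L, "(unclosed"); // deliberately invalid pattern

        System.out.println(matchingVertexIds(fetched, "world")); // prints [1]
        System.out.println(matchingVertexIds(fetched, "x-ray")); // prints [2]
    }
}
```

The trade-off is that every stored pattern is transferred to the client and evaluated there, so this is likely to suit small pattern sets better than large ones.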
ManabuBeach•16mo ago
As for "P" not being able to take a dynamic value, would that be a candidate for a future improvement in Gremlin? I think it is a fairly common use case to plug in the result of a "subquery".
spmallette•16mo ago
yes, we'd like to have it. it's partially implemented but it's non-trivial for a variety of reasons.
ManabuBeach•16mo ago
I do not feel this is particularly important or urgent. While it may not be efficient, I just resort to more practical host-side coding and then issue different traversals. For example, I have a patient list, but I need to summarize the earliest visit date, the latest visit date, and the total number of visits for each patient. This class of problem is not easy for any database. Theoretically, sub-traversal aggregations can solve this, but I am still struggling with how to do it elegantly and efficiently in pure Gremlin.
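As an illustration of the host-side route for that patient summary, here is a minimal sketch that assumes the visits have already been fetched as (patient, date) pairs. The `Visit` type, the sample data, and `summarize` are hypothetical. Grouping with `Collectors.summarizingLong` over epoch days yields the earliest date, the latest date, and the count in a single pass.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class VisitSummary {
    // Hypothetical record of one visit, as it might be returned by a traversal.
    static final class Visit {
        final String patient;
        final LocalDate date;

        Visit(String patient, LocalDate date) {
            this.patient = patient;
            this.date = date;
        }
    }

    // Per patient: min = earliest visit, max = latest visit (both as epoch days), count = visits.
    static Map<String, LongSummaryStatistics> summarize(List<Visit> visits) {
        return visits.stream().collect(Collectors.groupingBy(
                v -> v.patient,
                TreeMap::new,
                Collectors.summarizingLong(v -> v.date.toEpochDay())));
    }

    public static void main(String[] args) {
        List<Visit> visits = List.of(
                new Visit("p1", LocalDate.of(2023, 1, 5)),
                new Visit("p2", LocalDate.of(2023, 2, 1)),
                new Visit("p1", LocalDate.of(2023, 3, 10)));

        summarize(visits).forEach((patient, stats) -> System.out.println(
                patient
                + " first=" + LocalDate.ofEpochDay(stats.getMin())
                + " last=" + LocalDate.ofEpochDay(stats.getMax())
                + " visits=" + stats.getCount()));
        // prints:
        // p1 first=2023-01-05 last=2023-03-10 visits=2
        // p2 first=2023-02-01 last=2023-02-01 visits=1
    }
}
```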
