Lonnie VanZandt
Apache TinkerPop
Created by Lyndon on 5/7/2024 in #questions
Traversal Inspection for properties used
Lightly reading: does simply parsing the query statement itself help? Rather than examining all the content flowing through the consideration pipeline, why not study the query itself? Or perhaps you seek statistics, gathered over actual content, about which of the predicates proved to be the most discriminating?
20 replies
Created by Lonnie VanZandt on 2/27/2024 in #questions
Gremlin Injection Attacks?
Yes, I saw your mention of the risk on the other thread. I'm looking at a cybersecurity questionnaire and was curious if the Gremlin community had any horror stories here.
7 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
This is precisely what I understand: the MathStep.java implementation even extends MapStep.java to implement what it does. The by() steps merely apply transformations to the Traverser argument to produce different "variables". What I ask is, how about making that generally available to MapStep itself? Let the author of the query decide whether or not that is useful for their business. It is true that the unpacking of the Traverser, the "it", can be done inside the lambda/closure - but sometimes it is more convenient to use the Gremlin language and methods to do that, immediately before entering the Groovy/Java context.
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
the secondary arguments would all be the result of traversals which begin at the traverser arriving at the map-by-by step, where each secondary traversal is determined by the corresponding by clause. One could, of course, provide constant() or cap() values.
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
I agree with you on how it might look for Java. Because Gremlin's math step is very similar to what I intend, I ask how multi-variate mathematical steps with by clauses are done in Gremlin. Why is it not apparent how convenient it would be to perform an arbitrary transformation on each element of a sequence where the arguments of the applied function include not only the current element but one or more additional arguments which are related to the current element? I appreciate the Beauty of a language but I am not averse to being practical and to using language features when they exist. Gremlin supports Groovy lambdas and there are times when they get the job done. Sometimes, getting an answer at all is better than not being able to get an answer because it wasn't optimized for performance. I would argue against removing lambda support from Gremlin simply because performance stinks at times.
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
my intent is still 1-to-1 (many to 1 would be a reduce operation). In the example query above, it only yields 1 result - like a reduction - because the stream feeding into that map has been constructed to yield only 1 member. A 1:1 map with a multiple-argument closure is very useful - and is often easier to analyze than is a single-argument closure that happens to include external state via variable reference where the variables are conveniently available in the closure's context. The number of arguments would be explicit - just as they are for "math". The author must provide a by clause for each parameter - or it must be very intuitive how Gremlin would dynamically resolve "implicit" parameters from arbitrary queries. (Scala loves these nasty implicits.)
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
The idea is very similar to how "math()" is implemented. The difference is that "math" is opinionated about the transformation. It is going to take parameterized inputs and try to construct a valid mathematical expression and then compute the result. Map-By-By would allow the users to construct their own parameterized transformations, without the anonymous applied-lambda trick that I use (see above where I subtract and then divide given an incoming member instance).
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
BTW, does or could TinkerPop map() come to accept by() clauses such that we could feed multiple arguments into the map lambda? As a pseudo-example: something().map{ x,y,z -> x+y+z }.by( method-to-get-x ).by( method-to-get-y ).by( method-to-get-z )
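To make the pseudo-example concrete outside Gremlin, here is a minimal Python sketch of the semantics being asked for. `map_by` is a hypothetical helper, not a TinkerPop API: each getter plays the role of one by() clause, is evaluated against the current element, and its result becomes one positional argument of the lambda. The sample rows are illustrative only.

```python
from typing import Any, Callable, Iterable, Iterator


def map_by(items: Iterable[Any],
           fn: Callable[..., Any],
           *getters: Callable[[Any], Any]) -> Iterator[Any]:
    """Hypothetical 1:1 map whose arguments are derived per element.

    Each getter stands in for one by() clause of the proposed step.
    """
    for item in items:
        # Resolve every argument from the current element, then apply fn.
        yield fn(*(getter(item) for getter in getters))


# Illustrative data: two "elements" carrying x, y and z properties.
rows = [{"x": 1, "y": 2, "z": 3}, {"x": 10, "y": 20, "z": 30}]

sums = list(map_by(rows,
                   lambda x, y, z: x + y + z,  # the multi-argument closure
                   lambda r: r["x"],           # by( method-to-get-x )
                   lambda r: r["y"],           # by( method-to-get-y )
                   lambda r: r["z"]))          # by( method-to-get-z )
print(sums)  # [6, 60]
```

The map stays 1:1 (one output per incoming element), and the number of arguments is explicit because it equals the number of getters supplied.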
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
Related, yes, Scala has a take(n) concept.
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
I'm sure that the projection and the reduction could be more Gremlin-idiomatic - but it is hard to diagnose why my "by" clauses are not tolerated by the engine. Getting the right class of object to those is not always intuitive. I can see that I resort to "aw heck, give me the collection as a Groovy/Java list and I can revert to that language to do what I want".
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
Stephen, I think this is not right. And it is my fault for how I proposed the setup. I want to compute the difference in time between each date in the set and its neighbor - and not compute the differences between pairs that form a set of pairs. For example, with A..B..C..D, I want 3 differences, not 2. That is, B-A, C-B, and D-C. Which means, btw, I was pretty dim. The adjacent differences telescope, so their average is just (max - min)/(num - 1), which in a long partially ordered sequence like that is roughly (max - min)/num. Here's that query:
g.E().
  has('Relation', 'container', 'MYVAL').
  values('createdDate').
  map{ ( LocalDateTime.parse( it.get(), java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).
           atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() ) }.
  order().by(Order.asc).
  fold().as( 'list' ).
  project( 'oldest', 'newest', 'count' ).
    by( limit(local, 1) ).by( tail(local, 1) ).by( count(local) ).
  select( values ).
  map{ it -> { t -> ( t[1] - t[0] ) /1000 /t[2] }( it.get() ) }
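As a sanity check on the shortcut, here is a small Python sketch (outside Gremlin, with made-up timestamps) comparing the mean of the adjacent gaps with the telescoped form. The function names are illustrative, and both assume at least two timestamps. With n dates there are n - 1 gaps, so the exact divisor is n - 1; dividing by the count, as the query does, comes out close only when the sequence is long.

```python
from datetime import datetime

# Illustrative ISO_LOCAL_DATE_TIME strings; not data from the thread.
created = [
    "2024-02-14T10:00:00",
    "2024-02-14T10:00:30",
    "2024-02-14T10:02:00",
    "2024-02-14T10:03:00",
]


def mean_adjacent_gap_seconds(stamps):
    """Average of B-A, C-B, D-C, ... over the sorted timestamps."""
    times = sorted(datetime.fromisoformat(s) for s in stamps)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def shortcut_seconds(stamps):
    """Telescoped form: (max - min) / (n - 1), no sorting or pairing needed."""
    times = [datetime.fromisoformat(s) for s in stamps]
    return (max(times) - min(times)).total_seconds() / (len(times) - 1)


print(mean_adjacent_gap_seconds(created))  # 60.0
print(shortcut_seconds(created))           # 60.0
```

For these four timestamps the gaps are 30, 90 and 60 seconds; dividing the 180-second span by the count of 4 instead of 3 would give 45.0.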
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
I'm intrigued. When this plays out in real time, do we actually see it resolve down to finding a set of dates and then picking from that one set, a pair at a time, to contribute to an accumulating sum? Or, no, we would see it making numerous copies of large (but shrinking) collections over and over only to shed most of the list to obtain the leading pair?
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
Ok, it's a functional beauty.
// fetch the sequence of date values and sort them chronologically from earliest to most recent
g.E().values('createdDate').asDate().order().by( Order.asc )
// wrap the sequence as a single collection
.fold().
// generate a sequence of shortening collections, each collection will be two elements shorter than the prior list
emit(). // each iteration, yield the contextual collection
until(__.not(unfold())). // just keep going until we run out into the empty set
repeat(skip(local,2)). // each iteration, make sure to chop off the head and the penultimate head
// remove the last collection that is the empty set
filter(unfold()).
// now, for each collection in the sequence of collections
// fetch the head and penultimate head, memoize the first as dts and step forward one
limit(local,2).as('dts').limit(local,1).
// given the current date entry, diff it from the prior one excluding the one after it
dateDiff(select('dts').skip(local,1).unfold()).
// we now have a sequence of deltas, compute their mean
mean()
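To study the shrinking-collection pattern in isolation, here is a rough Python model of it (not Gremlin, and not a claim about how the engine executes the traversal). `shrinking_lists` and `head_pair_diffs` are hypothetical names, the numbers stand in for epoch seconds, and dropping a leftover single element is an assumption of this sketch. In the model, chopping two elements per iteration pairs up (A,B) and (C,D), while chopping one element per iteration produces every adjacent pair.

```python
def shrinking_lists(values, step):
    """Model of fold().emit().until(not(unfold())).repeat(skip(local, step))
    followed by filter(unfold()): yield each non-empty remaining list."""
    current = list(values)
    while current:
        yield current
        current = current[step:]  # drop `step` elements from the head


def head_pair_diffs(values, step):
    """For each remaining list, diff its first two elements (later - earlier)."""
    diffs = []
    for remaining in shrinking_lists(values, step):
        pair = remaining[:2]  # models limit(local, 2)
        if len(pair) == 2:    # assumption: a lone trailing element is ignored
            diffs.append(pair[1] - pair[0])
    return diffs


stamps = [0, 30, 120, 180]  # A, B, C, D as illustrative epoch seconds

print(head_pair_diffs(stamps, 2))  # [30, 60]      i.e. B-A, D-C
print(head_pair_diffs(stamps, 1))  # [30, 90, 60]  i.e. B-A, C-B, D-C
```

Each iteration of this model copies a shorter list, which makes the cost of the pattern easy to see: the work grows with the number of remaining elements at every step.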
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
I do have this sense that "local" is like some magical token that can be dropped into more formal settings to solve challenges that would otherwise take substantially more method calls. I might have chosen "doIntendedVsDocumented" as the string for that keyword. I have to go study your query to really grok it.
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
Whether or not it works, I love the opportunity to study the pattern. To try to understand fold().emit().until(__.not(unfold())).repeat(skip(local,2)). Also, I have to work with older TinkerPop. The date methods would be nice.
26 replies
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
If there is a way to have Gremlin traverse a collection and to operate on adjacent pairs (to get the delta between creations) without having to fall into Groovy, then I could stay in Gremlin and use mean() at the end of that process, directly. I think I could "slide" a sideEffect "cursor" as the traversal gets collected to perform the map() in-line - but it wasn't as pretty as Groovy collate.
26 replies
Created by cdegroc on 1/17/2024 in #questions
LazyBarrierStrategy/NoOpBarrierStep incompatible with path-tracking
No! Say it ain't so: as conceptually beautiful as is Gremlin on the outside, it cannot be just like real code on the inside. I am crushed.
12 replies
Created by Lonnie VanZandt on 1/9/2024 in #questions
I met a man with seven wives, each of which had seven sacks.
This form, I think, is even better. Rather than decorate the collection during the grouping, the filter operation can also accumulate the cost. Ideally, there would be a filterUntil (a "takeWhile") step.
g.E().
  has('Relation', 'container', {{thread_key}}).
  has('modifiedDate', lte({{B_timestamp}})).
  group().by('sKey').
  select(values).
  unfold().
  filter( count(local).aggregate('costs').
          cap('costs').sum( local ).is(lt(50)) )
25 replies
Created by Lonnie VanZandt on 1/9/2024 in #questions
I met a man with seven wives, each of which had seven sacks.
The "aggregate" builds a queue of all per-key costs seen so far and so the accumulated cost at the encounter of the current key is the sum of the queue of costs plus the current cost. Then, after the grouping is finished, we filter to select all entries whose accumulated cost is less than some budget amount. This is cleaner because there's no preamble to set up a sideEffect nor a sack and there's no need to pick the tail off the growing queue. We just re-sum the queue for each key.
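The running-cost idea can be sketched in plain Python (a model of the logic only, with illustrative data and hypothetical function names). Each group's cost is its member count, the running total includes the current group, and a group is kept while that total is below the budget. Because counts are never negative, the running total never decreases, so in this model a plain filter and a "takeWhile" select the same groups.

```python
from itertools import accumulate, takewhile


def within_budget(groups, budget):
    """Keep keys whose accumulated cost (including their own) is below budget.

    groups maps key -> list of members; dict order is the encounter order.
    """
    keys = list(groups)
    costs = [len(groups[k]) for k in keys]  # per-key cost, like count(local)
    running = accumulate(costs)             # sum of all costs seen so far
    return [k for k, total in zip(keys, running) if total < budget]


def within_budget_takewhile(groups, budget):
    """Same selection, but stop at the first key that breaks the budget."""
    keys = list(groups)
    running = accumulate(len(groups[k]) for k in keys)
    kept = takewhile(lambda kt: kt[1] < budget, zip(keys, running))
    return [k for k, _ in kept]


# Illustrative groups: costs 3, 2, 4, 1 give running totals 3, 5, 9, 10.
groups = {"a": [1, 2, 3], "b": [4, 5], "c": [6, 7, 8, 9], "d": [10]}

print(within_budget(groups, 6))            # ['a', 'b']
print(within_budget_takewhile(groups, 6))  # ['a', 'b']
```

The takeWhile form has the advantage of stopping early, whereas the filter form still visits every remaining group after the budget is exhausted.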
25 replies
Created by Lonnie VanZandt on 1/9/2024 in #questions
I met a man with seven wives, each of which had seven sacks.
Here's a more expressive variant:
g.E().has( 'Relation', 'container', {{thread_key}} ).
  has('modifiedDate',lte( {{B_timestamp}} ) ).
  group().
    by( 'sKey' ).
    by( fold().as( 'versions', 'accumulated' ).
        select( 'versions', 'accumulated' ).
          by( identity() ).
          by( unfold().count(). // current cost, the number of Relations for the current sKey
              aggregate( 'costs' ).cap( 'costs' ).unfold().sum() ) ). // the sum of all costs so far
  select( values ).unfold().filter( select( 'accumulated' ).is( lte( 50 ) ) )
25 replies