Lonnie VanZandt
Apache TinkerPop
Created by Lonnie VanZandt on 2/27/2024 in #questions
Gremlin Injection Attacks?
Is anyone talking about or looking into attacks and mitigations for Gremlin Injection Attacks? That is, just as there is plenty of commentary on how to design your PHP-based web frontend with a Postgres backend so that it isn't a sucker for an easy SQL Injection Attack, is anyone looking at how to handle users of your Gremlin Server when those users hand you Groovy lambdas that are rich in aggressive behavior?
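One mitigation carried over from the SQL world is to never splice user input into submitted script text, and to pass values as bindings instead (or to avoid script submission entirely in favor of bytecode-based traversals). A hedged sketch against the gremlin-driver Client API; `client`, `userInput`, and the binding name `userName` are illustrative, not from the question:

// Vulnerable: user input is concatenated into the Groovy script text.
// A hostile value can close the quote and append arbitrary Groovy.
def results = client.submit(
    "g.V().has('person','name','" + userInput + "')").all().get()

// Safer: the value travels as a binding and is never parsed as script.
def safeResults = client.submit(
    "g.V().has('person','name', userName)",
    [userName: userInput]).all().get()

Beyond bindings, the TinkerPop reference documentation also describes server-side script sandboxing and disabling lambda acceptance altogether, which is the stronger defense when untrusted parties can submit closures.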
7 replies
Apache TinkerPop
Created by Lonnie VanZandt on 2/14/2024 in #questions
SideEffect a variable, Use it later after BarrierStep?
I seek a query that builds a list and then needs to both sum the list's mapped values and divide the resulting sum by the count of the original list. This would be the mean() step - if the mapped list were still a Gremlin traversal object that offered mean(). However, the mapped list is, by that time, a Groovy list and mean() is no longer available. A Groovy list can be averaged by reducing it to its sum and then dividing by the count of the list - but to get the count of the list, a separate reference to the list is needed, or the count has to have been cached earlier. Therefore, is there a way to begin a query, sideEffect a value into a Groovy variable, complete the query, pass through a barrier step, and then divide by the saved count?
g.E().
has('LABEL', 'PROP', 'MYPROPVAL').
values('createdDate').
order().by(Order.asc).
// would like to sideEffect cache the count of this set here
map{ ( LocalDateTime.parse( it.get(),
java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() ) }.
toList().
collate(2, true).
collect{ it -> ( it[1]-it[0] ) / 1000 }.
inject( 0 ) { accum,it -> accum + it }
// would like to divide the BigDecimal here by half the count
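One way around the "count is gone by the time Groovy has the list" problem is to leave Gremlin before the mapping step and keep everything afterward in plain Groovy, where the list can be referenced twice. A sketch under the same schema assumptions as the query above (`g` is a connected TraversalSource); note it uses collate(2, false) so that an odd-length list does not produce a one-element trailing pair that would break `it[1]`:

def millis = g.E().
    has('LABEL', 'PROP', 'MYPROPVAL').
    values('createdDate').
    order().by(Order.asc).
    toList().
    collect { LocalDateTime.parse( it,
        java.time.format.DateTimeFormatter.ISO_LOCAL_DATE_TIME ).
        atZone( ZoneId.systemDefault() ).toInstant().toEpochMilli() }

def pairCount = millis.size().intdiv(2)       // the count, cached before reduction
def totalSeconds = millis.collate(2, false).  // drop any unpaired remainder
    collect { ( it[1] - it[0] ) / 1000 }.
    inject( 0 ) { accum, it -> accum + it }

def meanSeconds = pairCount ? totalSeconds / pairCount : 0

The design trade-off: toList() materializes the whole result client-side, which is exactly what makes the count available twice; a pure-traversal alternative would cache the count with a sideEffect into a collection and cap() it after the barrier.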
26 replies
Apache TinkerPop
Created by Lonnie VanZandt on 1/9/2024 in #questions
I met a man with seven wives, each of whom had seven sacks.
I met a man with seven wives, each of whom had seven sacks. Now, suppose I have a shipping container that can hold up to 500 items, and I need to inform a number of men that they and their families can board my ship because I know that all the items in their families' sacks will fit in the container. There may be a few empty spaces, but I can't tell a man that he and his family can board if any of their items would overflow the container. How do I construct a query which selects men as long as all the items in the 7 sacks of their 7 wives will fit?

Here's the challenge: I don't know how many items are in each sack until the family is considered. I ask the men and their families to line up, and then I board men until the container is nearly full or exactly full and any of the items of the next family would assuredly not fit.

Let's say we have Vertices for Item, Sack, Wife, and Man and Relations marriedTo, hasSack, and hasItem. Let's say we rank-order Man by Lastname. Breaking the St Ives rhyme, let's say that the number of wives per Man is variable, as is the number of Sacks per Wife and Items per Sack - but I want to select Men from the queue until the count of Items from M+ -> W+ -> S+ -> I+ would exceed 500. Got an idea?
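Because the cutoff depends on a running total across traversers, one pragmatic sketch ranks the men in Gremlin but accumulates the item counts client-side in Groovy, stopping before the container overflows. Labels and property names (Man, marriedTo, hasSack, hasItem, Lastname) follow the question; this is one illustration, not the only approach (a sack()-based running total inside the traversal is another avenue):

def capacity = 500
def boarded = []
def total = 0L
for (man in g.V().hasLabel('Man').order().by('Lastname').toList()) {
    // count every Item reachable through this man's wives' sacks
    def items = g.V(man).out('marriedTo').out('hasSack').out('hasItem').count().next()
    if (total + items > capacity) break   // the next family would overflow
    total += items
    boarded << man
}
// 'boarded' now holds the admissible men; 'total' never exceeds capacity

The per-man count().next() round trips are the cost of this sketch; batching the counts with a single group().by() over all men before the loop would trade memory for latency.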
25 replies
Apache TinkerPop
Created by Lonnie VanZandt on 1/7/2024 in #questions
May I suggest a new topic-channel for us? Like "really-big-data" or "pagination"?
Related to https://discord.com/channels/838910279550238720/1100527694342520963/1100853192922759244 and having read the recommended links on how to paginate the end of a query, I am wondering about how to manage large sets of traversals and large side-effect-collected collections which a query might be encountering or constructing as the graph is visited when the paths offer relatively large datasets after having been wisely filtered. For example, what is advised if one really does need to group by first-name all the followers of Taylor Swift (i.e. some exemplary uber-set) and wants to bag that for a later phase of a query which isn't the final collection that will be consumed by some external REST client? Yes, the final collect step can be easily paginated as advised - but what about all that earlier processing? What should we be thinking when we anticipate having 500,000, or 10x this, traversals heading into a group by - by - bye! barrier / collecting stage? Other than, "Punt" or "Run away!"?
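For contrast with the intermediate-barrier problem being asked about, the already-recommended final-stage pagination is straightforward range()-paging over the unfolded grouping; a sketch with illustrative labels (the 'Follower' label and 'firstName' key are assumptions, not from a real schema):

// page the grouped result 100 entries at a time
g.V().hasLabel('Follower').
    group().by('firstName').
    unfold().
    range(0, 100)

The open question above is precisely that this does nothing for the cost of the group() barrier itself, which must still materialize the full grouping before the first page can be emitted.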
8 replies