Gremlin query has strange behavior with range() and limit()

Hey everyone, I have a Neptune database and use Gremlin to query it. I have user vertices that can be connected by edges such as "friends", "follows", "blocks", and "reports". I want a query that gives "suggestions" to a user by presenting them with users followed by the people they follow. The suggested users shouldn't already be followed, blocked, or reported by the current user, and there should be at most 3 suggestions per user they follow. I have constructed the following query:

    g.V(userId)
        .Out("follows")
        .As(ConnectingFollowerLabel)
        .Local<Vertex>(__
            .Out("follows")
            .HasLabel("User")
            .Not(__.Both("blocks").HasId(userId))
            .Not(__.In("reported").HasId(userId))
            .Not(__.In("follows").HasId(userId))
            .Not(__.HasId(userId))
            .Limit<Vertex>(maxProfileSuggestionsPerCommonUser))
        .Dedup();

As it is, it returns the results it should. But for users who follow many users it gets kind of slow, so I added a limit() on the users fetched and added pagination with range(), like this:

    g.V(userId)
        .Out("follows")
        .As(ConnectingFollowerLabel)
        .Local<Vertex>(__
            .Out("follows")
            .HasLabel("User")
            .Not(__.Both("blocks").HasId(userId))
            .Not(__.In("reported").HasId(userId))
            .Not(__.In("follows").HasId(userId))
            .Not(__.HasId(userId))
            .Limit<Vertex>(maxProfileSuggestionsPerCommonUser))
        .Limit<Vertex>(maxProfileSuggestions)
        .Dedup()
        .Range<Vertex>(Scope.Global, offset, offsetLimit);

As soon as I add the limit and range steps, the results change and I get suggestions for people I already follow. I tried ordering the results before the range step, but that doesn't seem to work either. Any help would be greatly appreciated.
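To make the expected behavior of the paginated query concrete, here is a minimal plain-Python model of its steps. It is only a sketch: the toy graph, the `out`, `is_candidate`, and `suggestions` helpers, and the user ids are all hypothetical, and it says nothing about how Neptune actually evaluates the traversal. It only shows that, whatever the offset, a user the current user already follows should never come back.

```python
from itertools import islice

# Hypothetical toy graph: edge label -> {source user: set of target users}.
edges = {
    "follows": {"me": {"a", "b"}, "a": {"b", "c", "d", "me"}, "b": {"c", "e", "f"}},
    "blocks": {"me": {"d"}},
    "reported": {"me": {"e"}},
}

def out(label, v):
    """Targets of v's outgoing edges with this label, in a stable order."""
    return sorted(edges.get(label, {}).get(v, set()))

def is_candidate(user_id, v):
    """Mirror of the four Not(...) filters in the query."""
    blocked_either_way = v in out("blocks", user_id) or user_id in out("blocks", v)
    return not (
        blocked_either_way                 # Not(Both("blocks").HasId(userId))
        or v in out("reported", user_id)   # Not(In("reported").HasId(userId))
        or v in out("follows", user_id)    # Not(In("follows").HasId(userId))
        or v == user_id                    # Not(HasId(userId))
    )

def suggestions(user_id, per_followee, max_total, offset, offset_limit):
    picked = []
    for followee in out("follows", user_id):
        local = (v for v in out("follows", followee) if is_candidate(user_id, v))
        picked.extend(islice(local, per_followee))  # Local(... .Limit(n))
    limited = picked[:max_total]                    # Limit(maxProfileSuggestions)
    deduped = list(dict.fromkeys(limited))          # Dedup(), order-preserving
    return deduped[offset:offset_limit]             # Range(Global, low, high)
```

On this toy graph, `suggestions("me", 3, 10, 0, 10)` gives `["c", "f"]`, and no page contains `"a"` or `"b"`. One side effect of the step order is worth noting: because the global limit runs before the dedup, duplicates count against the cap, so `suggestions("me", 3, 2, 0, 10)` gives only `["c"]`.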
10 Replies
spmallette (16mo ago)
as a reference for folks answering, this question also appears on StackOverflow: https://stackoverflow.com/q/76808977/1831717 - there's also some sample data there if anyone is investigating this one currently here on discord
kelvinl2816 (16mo ago)
Just FYI, we are investigating this and @bechbd was able to reproduce it. The discussion is ongoing over at StackOverflow.
Dave (16mo ago)
@bulletlegend I was able to reproduce this only if I used range(global, 0, 1). Any other combination of offset and limit did not seem to reproduce this error. Is this what you were able to observe?
bulletlegend (OP, 16mo ago)
Hello, and thank you for the interest @bechbd. Yes, this is the behaviour I observed. I have found that the offset and limit needed to reproduce this vary based on the size of the graph and the total number of results.
Dave (16mo ago)
"I have found that the offset and limit needed to reproduce this vary based on the size of the graph and the total number of results." Can you provide a few examples here? I tried this on the sample graph you provided, and I was only able to reproduce it for that single case.
bulletlegend (OP, 16mo ago)
The easiest way to reproduce this was on my production database, where there are 300K. I will try to make the sample I gave bigger, with more edges, to hopefully make it easier to reproduce.
thanoskatiras (16mo ago)
@bechbd Thank you for your time on this. The query written in the SO thread has a wrong label "reported" instead of "report_citizen" (I hope you noticed already). As we are unable to reproduce it with a small sample graph, I am curious if you used the one provided in the SO thread.
redtree1112 (15mo ago)
Thanks for reporting. This can be reproduced on the Modern graph with this simple query:

    g.V('3').local(not(__.in("created").hasId('6')).limit(1)).limit(1)

I believe the chunk size that Neptune uses for optimization does something wrong. I am investigating the issue.

Yeah, this is indeed caused by the chunkSize optimization in Neptune. However, the fix won't be straightforward, so let me discuss it internally with the Neptune team.

As a workaround, can you try the query hint g.withSideEffect("Neptune#chunkSize", 1000) and see if the issue is resolved? 1000 is the default chunkSize in Neptune, so it won't do more harm than the normal behavior. 1000 may not be enough if the number of items returned is big, so you may need to set a number large enough to test all solutions under not(). I know this is not ideal; Neptune needs a better solution in the long term.
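For anyone checking the Modern-graph repro above, it may help to know what a correct engine should return. The sketch below is a plain-Python model, not Gremlin, and the `in_created` and `expected_result` helpers are hypothetical. It assumes the standard TinkerPop Modern toy graph, where vertex 3 (lop) was created by vertices 1, 4, and 6, and vertex 5 (ripple) by vertex 4.

```python
# "created" edges of the TinkerPop Modern toy graph: creator id -> software ids.
created = {"1": ["3"], "4": ["5", "3"], "6": ["3"]}

def in_created(v):
    """Ids of the vertices with a "created" edge pointing at v."""
    return [src for src, targets in created.items() if v in targets]

def expected_result(start, excluded_creator):
    # local(not(__.in("created").hasId(excluded_creator)).limit(1))
    local = [v for v in [start] if excluded_creator not in in_created(v)][:1]
    # trailing .limit(1)
    return local[:1]
```

Under that assumption, `expected_result("3", "6")` is an empty list, because vertex 6 is one of the creators of vertex 3 and the not() filter should drop it. If the repro query returns v[3], the filter was not applied correctly.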
thanoskatiras (15mo ago)
@redtree1112 Thank you for your input. I'll get back to you as soon as we try this workaround.

@redtree1112 This did not work. I am unable to find any information on this Neptune hint ("Neptune#chunkSize") in the docs. Where can I learn more about this?
redtree1112 (15mo ago)
Hmm, let me see. Unfortunately, Neptune#chunkSize is publicly available, but we did not publish any information about it because this query hint generally does no good and is intended only for debugging. Can you run profile with chunkSize added? I want to see if the chunkSize is actually changed in the plan.