Apache TinkerPop
•Created by RuS2m on 6/1/2024 in #questions
Analyzing samples of Gremlin Queries in Neptune Notebook
Hey everyone,
I’m working on a project where we give internal customers access to our Neptune graph through Neptune Notebook. There are already quite a few users, and we want to analyze the queries they run to see which parts of our ontology are used heavily and which are underutilized. This is not as straightforward as extracting all labels from a query: our edge labels are not unique, so if people use `.in` or `.out` steps without specifying the entity name, it is almost impossible to tell which part of the ontology was visited. We also want to identify common query patterns to understand what people usually query for and which connections in our ontology are used most frequently. For that we would filter out the parts common to all queries, like `g.V()`, and instead look at combinations of multiple steps that were called.
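To make the "combinations of steps" idea concrete, here is a minimal sketch in Python. It treats each query as plain text, pulls out step names with a regular expression, drops boilerplate steps, and counts adjacent step pairs. The names `extract_steps`, `step_ngrams`, and `COMMON_STEPS` are made up for illustration, and the regex is a simplification: it does not parse Gremlin properly, flattens nested traversals such as `__.out()` into the same sequence, and does not handle escaped quotes inside string literals.

```python
import re
from collections import Counter

# Captures the name in ".stepName(" (a rough approximation of a Gremlin step).
STEP_RE = re.compile(r"\.([A-Za-z_]\w*)\s*\(")
# Matches single- or double-quoted string literals (no escaped-quote support).
STRING_RE = re.compile(r"'[^']*'|\"[^\"]*\"")

# Assumption: steps that appear in nearly every query and carry no signal.
COMMON_STEPS = {"V", "E"}


def extract_steps(query):
    """Return the step names of a Gremlin query string, in order."""
    # Blank out string literals so text inside them is not mistaken for a step.
    without_strings = STRING_RE.sub("''", query)
    return [s for s in STEP_RE.findall(without_strings) if s not in COMMON_STEPS]


def step_ngrams(queries, n=2):
    """Count sequences of n consecutive steps across all queries."""
    counts = Counter()
    for query in queries:
        steps = extract_steps(query)
        counts.update(tuple(steps[i:i + n]) for i in range(len(steps) - n + 1))
    return counts


if __name__ == "__main__":
    sample = [
        "g.V().hasLabel('person').out('knows').values('name')",
        "g.V().hasLabel('person').out('knows').count()",
    ]
    for pattern, count in step_ngrams(sample).most_common():
        print(count, " -> ".join(pattern))
```

Counting step names alone still does not resolve the non-unique edge label problem. It only shows which step shapes are common, so the label arguments would need to be kept alongside the step names for ontology-level analysis.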
We’ve figured out how to override the Gremlin magic in Neptune Notebook to add our custom logic to handle each query. And for my problem I’m considering two approaches:
- Running the Gremlin profiler on each query to get detailed info on the nodes visited and then applying custom language analysis algorithms.
- Collecting this data and feeding it into an LLM to summarize the queries and answer questions I'm interested in.
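Both approaches need the raw queries collected somewhere first. A minimal sketch of that collection step, assuming the overridden magic can call a plain Python function for each query: `record_query`, `load_queries`, and the JSON Lines file layout are hypothetical choices for illustration, not part of any Neptune Notebook API.

```python
import json
import time
from pathlib import Path


def record_query(log_path, user, query, extra=None):
    """Append one executed query to a JSON Lines log file."""
    entry = {"ts": time.time(), "user": user, "query": query}
    if extra is not None:
        # Optional payload, e.g. profiler output if it is captured as well.
        entry["extra"] = extra
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def load_queries(log_path):
    """Read the log back as a list of dicts, skipping blank lines."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

An append-only log like this keeps the collection step independent of the analysis, so the same data could feed either the profiler-based analysis or an LLM summary later.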
Has anyone here done something similar or have any insights on this? Would love to hear about your experiences or any advice you might have!