store edges of a node in a sorted manner
Is there any way for me to store the edges of a specific node in a sorted manner?
Is there any Gremlin based database that implements indices perhaps?
Solution:Jump to solution
Is there any way for me to store the edges of a specific node in a sorted manner?to my knowledge none of the common ones will allow that. i dont think most even preserve insertion order for edges.
Is there any Gremlin based database that implements indices perhaps?i assume you are still referring to edges here. Titan-like graphs such as JanusGraph, HugeGraph, etc. and DataStax Graph allow you to explicitly define indices on edges...
16 Replies
Solution
Is there any way for me to store the edges of a specific node in a sorted manner?to my knowledge none of the common ones will allow that. i dont think most even preserve insertion order for edges.
Is there any Gremlin based database that implements indices perhaps?i assume you are still referring to edges here. Titan-like graphs such as JanusGraph, HugeGraph, etc. and DataStax Graph allow you to explicitly define indices on edges
are you familiar with anything that will allow me to do that?
so I'm assuming also i wouldn't be able to perform binary search on a sorted index correct?
all i think you could do is add a property to your edge that indicates the order your want (e.g. some numeric value) and then use that to
order()
your edges. whether that ordering is done in-memory or in some optimized manner will be entirely up to the graph you are using.@kyano what would it be sorted by? Like time it was created? Or do you mean to analyze which edges are most commonly traversed or something like that?
it will be sorted by a numeric property shared by all
In general I think it is best to not depend on specific DB behavior for edge retrieval order, and to always order explicitly as needed. How many edges will you typically be needing to sort/order by a property? I suppose it might be possible to also use the edge ID for this, IFF the DB allows user defined edge IDs (not all do).
I'll give you an example and maybe you could suggest the best solution (if there is one):
I have an a node
V
which holds the property value: x
.
All of the incoming edges hold desiredValue: y
for some y
.
I define a certain operation for all nodes V1
with incident edges with V
that have desiredValue < value
, and for all node V2
with incident edges with V
that have desiredValue > value
.
Therefore in this instance it would be very efficient to keep all of those edges sorted, because with a single binary search of value
I can identify which nodes need what operation.
@KelvinL @edolivares @spmallette tagging for visibilitySince there aren't any graphs (that i'm aware of) that can store edges in some specified order, I think you would have to use
order()
step (whether by id
as kelvin suggested or by property) and rely on the underlying graph to do its best to return that sort in whatever way it optimizes to do that.i see
but if i have that property indexed, is it safe to assume that the order step will work faster than a naive order?
whether adding
order()
outperforms a naïve order will be dependent on your graph database. you will need to profile your queries to determine how the query is being optimized. that said, an "index" will typically help with filtering actions and not ordering actions. for example, a vertex centric index in JanusGraph will improve your ability to find certain edges with has()
but won't do anything to help with order()
. in other words doing outE().order().by('value').has('value', gt(desiredValue))
doesn't offer any improvement to the has()
. i'd again call this general advice to consider.
i think @KelvinL had a good question that i don't see answered:
How many edges will you typically be needing to sort/order by a property?
variable, it could be thousands it could be a few
hmmm i see. so essentially, I would have improvement in the
has
step correct? Meaning, that by using a graph like janus graph, filtering with has('value', gt(desiredValue))
would be more efficient since value
is indexed as opposed to if it wasn't?yes. in your case though, i wonder if you need to worry too much about this issue at all yet. i don't know the full context of your query or data but a few thousand edges should be pretty fast to filter through in-memory.
correct but this is per node. I expect to have a couple 10s if not 100s of thousands of node, so it can get quite large
i see. what kind of performance are you seeing now? i think i remember that you're using Neptune correct?
Actually, this is for a different project. Haven't reached development yet, just doing research atm for the architecture. I can share it with you privately if you're interested.
ok, feel free to share with me privately if you like. have i answered your question here at this point? or is there still something that isn't clear?