JanusGraph · 2mo ago
NeO

custom vertex id (String) feature to avoid duplicate vertex

I have a graph (backed by Cassandra, with read/write consistency QUORUM) whose vertices have the properties "recordId" and "type". With consistency locking disabled on the "recordId" property, I see duplicate vertices being created for the same "recordId" due to concurrent writes. Note: in that case we were not providing a custom vertex id, but relying on JanusGraph to generate the vertex id. Now I'm using the custom vertex id (String) feature to keep vertices consistent in my graph, with recordId = custom vertex id. Question: if I use "recordId" as the custom vertex id, I'm hoping that no duplicate vertices will be created in the graph with consistency locking on the "recordId" property. Is this a fair understanding?
6 Replies
NeO (OP) · 2mo ago
@Boxuan Li @Oleksandr Porunov your thoughts on this ?
criminosis · 2mo ago
Fwiw I asked a similar question a few months ago. Based on the answer I got (https://discord.com/channels/981533699378135051/1204932121408442478/1205063257430302730), it sounded like at least the vertex itself shouldn't be duplicated, but I'm not sure if that extends to vertex properties.
porunov · 2mo ago
As for the vertex itself, yes: by using the same custom vertex id you will always map to the same partition, so it's impossible to create two duplicate vertices in that case. However, that doesn't mean it's impossible to create duplicate vertex properties. In fact, there is a chance you will create some duplicate vertex properties in your partition. For example, if you have two processes that each write a vertex with id "myId123" and a LIST property like "myProperty=foo,bar", there is a chance you will end up with "myProperty=foo,bar,foo,bar", because you pushed 4 properties with different ids to your partition. In other words, it's not safe to rely on the internal vertex id to overcome potential concurrency issues. If you need to ensure uniqueness, I would suggest relying on locking mechanisms (either pessimistic or optimistic locking could work in your case). I would use pessimistic locking if it's critical not to have a duplicate at any moment in time. Otherwise, if an eventually correct state is OK, I would use optimistic locking.
NeO (OP) · 2mo ago
In our case, we have optimistic locking enabled on recordId, for instance:

JanusGraphManagement mgmt = graph.openManagement();
// Reuse the property key if it already exists, otherwise create it.
PropertyKey recordId = mgmt.containsPropertyKey("recordId") ? mgmt.getPropertyKey("recordId") : mgmt.makePropertyKey("recordId").dataType(String.class).make();
// Unique composite index on recordId.
JanusGraphIndex recordIdIndex = mgmt.buildIndex("recordIdIndex", Vertex.class).addKey(recordId).unique().buildCompositeIndex();
// Enable locking on both the property key and the index.
mgmt.setConsistency(recordId, ConsistencyModifier.LOCK);
mgmt.setConsistency(recordIdIndex, ConsistencyModifier.LOCK);

But because of this we are facing a ton of PermanentLockingException (Expected value mismatch for X: expected=Y vs actual=Z). We have a transaction retry (5 attempts) in place with exponential backoff (5 sec), but we still face this issue a lot. What else can be done here?

In our case, we currently do not use any LIST properties, only single string properties, for instance property(Constants.RECORD_ID, recordId) and property(Constants.TYPE, type). So is the issue you explained above only for LIST?
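For readers following along, the retry scheme mentioned above (a fixed number of attempts with exponentially growing waits) can be sketched roughly as below. This is only an illustration in plain Java, not the poster's actual code: `BackoffRetry`, `Sleeper`, and the parameter names are hypothetical, and no JanusGraph API is used. In a real setup each attempt would presumably open a fresh transaction and roll back on failure.

```java
import java.util.concurrent.Callable;

public class BackoffRetry {
    /** Hypothetical hook so the wait can be replaced in tests; production code could pass Thread::sleep. */
    public interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Runs the attempt up to maxAttempts times, doubling the wait after each failure.
     * Rethrows the last failure if every attempt fails.
     */
    public static <T> T retry(Callable<T> attempt, int maxAttempts,
                              long baseDelayMillis, Sleeper sleeper) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        long delay = baseDelayMillis;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                last = e;
                if (i == maxAttempts) {
                    break; // no wait after the final failed attempt
                }
                sleeper.sleep(delay);
                delay *= 2; // exponential growth: base, 2*base, 4*base, ...
            }
        }
        throw last;
    }
}
```

A call might look like `BackoffRetry.retry(() -> writeRecord(id), 5, 5000, Thread::sleep)`, where `writeRecord` is a placeholder for the transactional write. The sketch retries on any exception; narrowing that to locking failures is left to the caller.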
porunov · 2mo ago
The issue I explained above is related to anything where the generated column key is unique per element. Meaning: Any edge and any property with cardinality LIST or SET. Properties with cardinality SINGLE will be replaced as they have key equal to PropertyKey id.
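To make the column-key point concrete, here is a toy model in plain Java. It is not JanusGraph code and does not reflect its real storage encoding: `ColumnKeyModel`, the `#` separator, and the counter standing in for relation ids are all assumptions made for illustration. It only shows why two writers repeating the same write leave one SINGLE value but several LIST values in the same row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ColumnKeyModel {
    // Toy stand-in for one vertex's row: column key -> value.
    private final Map<String, String> row = new ConcurrentHashMap<>();
    // Toy stand-in for the unique id each newly written relation receives.
    private final AtomicLong relationIds = new AtomicLong();

    /** SINGLE cardinality: the column key depends only on the property key, so a repeat write overwrites. */
    public void writeSingle(String propertyKey, String value) {
        row.put(propertyKey, value);
    }

    /** LIST cardinality: the column key also carries a fresh id, so every write adds a new column. */
    public void writeList(String propertyKey, String value) {
        row.put(propertyKey + "#" + relationIds.incrementAndGet(), value);
    }

    /** Returns all values stored under the given property key, in no particular order. */
    public List<String> values(String propertyKey) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : row.entrySet()) {
            String k = e.getKey();
            if (k.equals(propertyKey) || k.startsWith(propertyKey + "#")) {
                out.add(e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnKeyModel vertex = new ColumnKeyModel();
        // Two writers each write the same SINGLE property and the same LIST values.
        for (int writer = 0; writer < 2; writer++) {
            vertex.writeSingle("recordId", "myId123");
            vertex.writeList("myProperty", "foo");
            vertex.writeList("myProperty", "bar");
        }
        System.out.println("recordId values: " + vertex.values("recordId").size());
        System.out.println("myProperty values: " + vertex.values("myProperty").size());
    }
}
```

In this model the second writer's `recordId` write lands on the same column as the first, while each `myProperty` write gets its own column, which matches the "foo,bar,foo,bar" outcome described earlier in the thread.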
NeO (OP) · 2mo ago
Thanks for the quick response regarding LIST, @porunov. Can you suggest any workaround for the issue I mentioned:
But because of this we are facing a ton of PermanentLockingException (Expected value mismatch for X: expected=Y vs actual=Z). We have a transaction retry (5 attempts) in place with exponential backoff (5 sec), but we still face this issue a lot. What else can be done here?