Custom vertex ID (String) feature to avoid duplicate vertices
I have a graph (backed by Cassandra, with read/write consistency QUORUM) whose vertices have the properties "recordId" and "type".
With consistency locking disabled on a property such as "recordId", I see duplicate vertices being created for the same "recordId" due to concurrent writes.
Note: in the case above we were not providing any custom vertex ID, but relying on JanusGraph to generate the vertex ID.
Now I'm using the custom vertex ID (String) feature to keep vertices consistent in my graph, setting the custom vertex ID to the recordId (recordId = custom vertex ID).
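To make the difference between the two setups concrete, here is a toy in-memory model in plain Java (not JanusGraph code; class and method names are made up). It assumes the duplicates come from a check-then-insert race: both writers look up recordId "r-1", neither sees a vertex yet, and each commits one. With generated IDs the two commits land in two different rows; with recordId as the vertex ID they address the same row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only: a Map stands in for backend rows keyed by vertex id.
class DuplicateVertexDemo {

    static int vertexCountWithGeneratedIds() {
        Map<Long, String> rows = new HashMap<>();     // generated vertex id -> recordId
        AtomicLong idGenerator = new AtomicLong();     // stands in for automatic id allocation
        String recordId = "r-1";

        // Both writers check for an existing vertex before either has committed.
        boolean writerAFound = rows.containsValue(recordId);
        boolean writerBFound = rows.containsValue(recordId);

        // Each writer then creates "its" vertex under a fresh id, so the rows differ.
        if (!writerAFound) rows.put(idGenerator.incrementAndGet(), recordId);
        if (!writerBFound) rows.put(idGenerator.incrementAndGet(), recordId);
        return rows.size();
    }

    static int vertexCountWithCustomIds() {
        Map<String, String> rows = new HashMap<>();   // vertex id (= recordId) -> type
        String recordId = "r-1";

        // Same interleaving, but both writers derive the row key from the data itself.
        boolean writerAFound = rows.containsKey(recordId);
        boolean writerBFound = rows.containsKey(recordId);
        if (!writerAFound) rows.put(recordId, "someType");
        if (!writerBFound) rows.put(recordId, "someType");
        return rows.size();
    }

    public static void main(String[] args) {
        System.out.println("generated ids: " + vertexCountWithGeneratedIds() + " vertices"); // 2
        System.out.println("custom ids:    " + vertexCountWithCustomIds() + " vertex");      // 1
    }
}
```

The model only covers the vertex row itself, not the properties written into that row.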
Question:
If I use "recordId" as the custom vertex ID, I'm hoping that no duplicate vertices will be created in the graph, even without consistency locking on "recordId". Is this a fair understanding?
@Boxuan Li @Oleksandr Porunov, your thoughts on this?
FWIW, I asked a similar question a few months ago. Based on the answer I got (https://discord.com/channels/981533699378135051/1204932121408442478/1205063257430302730), it sounded like at least the vertex shouldn't be duplicated, but I'm not sure whether that extends to vertex properties.
As for the vertex itself: yes, by using the same custom vertex ID you will always map to the same partition, so it's impossible to create two duplicate vertices in that case. However, that doesn't mean it's impossible to create duplicate vertex properties. In fact, there is a chance you will create some duplicate vertex properties in that partition. For example, if two processes both write vertex "myId123" with a LIST property "myProperty=foo,bar", you may end up with "myProperty=foo,bar,foo,bar", because you pushed 4 properties with different IDs to the same partition. In other words, it's not safe to rely on the internal vertex ID to overcome potential concurrency issues.

If you need to ensure uniqueness, I would suggest relying on locking mechanisms (either pessimistic or optimistic locking could work in your case). I would use pessimistic locking if it's critical never to have a duplicate at any moment in time. Otherwise, if it's OK to reach the correct state eventually, I would use optimistic locking.
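To illustrate the two strategies, here is a generic Java sketch (Java 16+ for the record) that uses a shared counter instead of a graph. It is not JanusGraph's locking implementation, and all names are hypothetical. The pessimistic version takes a lock before touching the value, so writers never conflict but may wait. The optimistic version reads without locking and commits only if the value is still the one it read (`compareAndSet`), retrying otherwise; a failed `compareAndSet` is roughly analogous to an "expected value mismatch" detected at commit time.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

// Generic illustration of pessimistic vs optimistic concurrency control on one shared value.
class LockingStrategies {

    // Snapshot of the stored value plus a version that changes on every successful write.
    record Versioned(long value, long version) {}

    // --- Pessimistic: acquire the lock first, so no other writer can interleave. ---
    private static final ReentrantLock LOCK = new ReentrantLock();
    private static long pessimisticValue = 0;

    static void pessimisticIncrement() {
        LOCK.lock();               // other writers block here until we unlock
        try {
            pessimisticValue++;
        } finally {
            LOCK.unlock();
        }
    }

    static long pessimisticValue() {
        LOCK.lock();
        try {
            return pessimisticValue;
        } finally {
            LOCK.unlock();
        }
    }

    // --- Optimistic: read freely, then commit only if nobody changed the value meanwhile. ---
    private static final AtomicReference<Versioned> CELL =
            new AtomicReference<>(new Versioned(0, 0));

    /** Returns how many attempts the write needed; each failed attempt is a detected conflict. */
    static int optimisticIncrement() {
        int attempts = 0;
        while (true) {
            attempts++;
            Versioned seen = CELL.get();                                   // read phase
            Versioned next = new Versioned(seen.value() + 1, seen.version() + 1);
            if (CELL.compareAndSet(seen, next)) {                          // validate + commit
                return attempts;
            }
            // Conflict: another writer committed first, so retry with fresh data.
        }
    }

    static long optimisticValue() {
        return CELL.get().value();
    }

    /** Runs body `iterations` times on each of `threads` threads and waits for all of them. */
    static void runInThreads(int threads, int iterations, Runnable body) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) body.run();
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Pessimistic locking trades throughput for never seeing a conflict; optimistic locking lets every writer proceed and pays for conflicts with retries, which is why it fits the "eventually correct" case.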
In our case, we have optimistic locking enabled on recordId, for instance:
// Create the key if it doesn't exist yet, otherwise reuse the existing one
JanusGraphManagement mgmt = graph.openManagement();
PropertyKey recordId = !mgmt.containsPropertyKey("recordId")
        ? mgmt.makePropertyKey("recordId").dataType(String.class).make()
        : mgmt.getPropertyKey("recordId");
JanusGraphIndex recordIdIndex = mgmt.buildIndex("recordIdIndex", Vertex.class).addKey(recordId).unique().buildCompositeIndex();
mgmt.setConsistency(recordId, ConsistencyModifier.LOCK);
mgmt.setConsistency(recordIdIndex, ConsistencyModifier.LOCK);
mgmt.commit(); // persist the schema changes
But because of this we are getting a large number of PermanentLockingException errors (Expected value mismatch for X: expected=Y vs actual=Z).
We retry the transaction up to 5 times with exponential backoff (5 sec), but we still hit this issue a lot.
What else can be done here?
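For reference, here is a rough sketch of a retry loop with exponential backoff plus random "full jitter", which spreads out writers that collided on the same lock so they are less likely to collide again on the next attempt. `RetryWithJitter.retry` is a hypothetical helper, not a JanusGraph API, and jitter is one commonly used tweak rather than a guaranteed fix for heavy contention on a single recordId. Each attempt is assumed to run in a fresh transaction (the failed one rolled back). Which failures count as retryable is left to the `retryable` predicate; whether PermanentLockingException reaches it directly or wrapped in another exception depends on your code path, so check the cause chain.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Predicate;

// Hypothetical helper: retry a unit of work with exponential backoff and full jitter.
class RetryWithJitter {

    static <T> T retry(Callable<T> work, Predicate<Exception> retryable,
                       int maxAttempts, long baseDelayMs, long maxDelayMs) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();   // e.g. open tx, upsert, commit
            } catch (Exception e) {
                // Give up on non-retryable errors or once the attempt budget is spent.
                if (attempt >= maxAttempts || !retryable.test(e)) throw e;
                // Exponential ceiling: base * 2^(attempt-1), bounded by maxDelayMs.
                long cap = Math.min(maxDelayMs, baseDelayMs << Math.min(attempt - 1, 30));
                // Full jitter: sleep a random time in [0, cap] so colliding writers drift apart.
                long sleepMs = ThreadLocalRandom.current().nextLong(cap + 1);
                Thread.sleep(sleepMs);
            }
        }
    }
}
```

A call might look like `RetryWithJitter.retry(() -> upsertRecord(recordId), e -> isLockConflict(e), 5, 100, 5_000)`, where `upsertRecord` and `isLockConflict` are your own (hypothetical) functions.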
In our case we currently don't use any LIST properties, only single-valued String properties, for instance property(Constants.RECORD_ID, recordId) and property(Constants.TYPE, type).
So does the issue you explained above apply only to LIST properties?
The issue I explained above applies to anything whose generated column key is unique per element, meaning any edge and any property with cardinality LIST or SET. Properties with cardinality SINGLE are replaced instead, because their column key is equal to the PropertyKey id.
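A toy model of the column-key behaviour described above: one vertex row is represented as a map from column key to value. For a SINGLE property the key depends only on the property key, so a second write overwrites the first; for LIST/SET the key also carries a freshly generated id, so every write adds a new column. The `#<id>` key format and the class name are made-up simplifications, not JanusGraph's real encoding. Replaying the "myId123" example with two writers gives one recordId value but four myProperty values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy model of one vertex row: column key -> value (simplified, hypothetical encoding).
class VertexRowModel {
    private final Map<String, String> columns = new LinkedHashMap<>();
    private final AtomicLong relationIds = new AtomicLong(); // stands in for generated property ids

    /** SINGLE: column key depends only on the property key, so a later write replaces the earlier one. */
    void writeSingle(String propertyKey, String value) {
        columns.put(propertyKey, value);
    }

    /** LIST/SET: column key also carries a fresh id, so every write adds a new column. */
    void writeMulti(String propertyKey, String value) {
        columns.put(propertyKey + "#" + relationIds.incrementAndGet(), value);
    }

    /** Collects all values stored for a property key, in insertion order. */
    List<String> valuesOf(String propertyKey) {
        List<String> out = new ArrayList<>();
        columns.forEach((k, v) -> {
            if (k.equals(propertyKey) || k.startsWith(propertyKey + "#")) out.add(v);
        });
        return out;
    }

    public static void main(String[] args) {
        VertexRowModel row = new VertexRowModel(); // both writers target vertex "myId123"
        for (int writer = 0; writer < 2; writer++) {
            row.writeSingle("recordId", "myId123");
            row.writeMulti("myProperty", "foo");
            row.writeMulti("myProperty", "bar");
        }
        System.out.println(row.valuesOf("recordId"));   // [myId123]
        System.out.println(row.valuesOf("myProperty")); // [foo, bar, foo, bar]
    }
}
```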
Thanks for the quick response regarding LIST, @porunov.
Can you suggest any workaround for the issue I mentioned earlier:
But because of this we are getting a large number of PermanentLockingException errors (Expected value mismatch for X: expected=Y vs actual=Z). We retry the transaction up to 5 times with exponential backoff (5 sec), but we still hit this issue a lot. What else can be done here?