Coalesce steps causing concurrency issues
I’ve been having some issues with concurrent modification errors on AWS Neptune. I’ve tried lowering the reserved concurrency, but the issue persists. Based on https://stackoverflow.com/questions/69932798/concurrentmodificationexception-in-amazon-neptune-using-gremlin-javascript-langu I was wondering if I could split my use case into two queries: first a g.V().has({my unique property})… check, and only if that comes back empty do I run the mutation step (roughly sketched after the snippet below). I’m kind of confused why this would make a difference, though.
This is my use case (gremlinStatistics is how I imported __ for anonymous traversals, I’ll be changing that haha):
// Get-or-create: if a 'collection' vertex with this conceptId already exists,
// unfold() returns it; otherwise the addV traversal (addVCommand) creates it
collection = await gremlinConnection
  .V()
  .hasLabel('collection')
  .has('id', conceptId)
  .fold()
  .coalesce(
    gremlinStatistics.unfold(),
    addVCommand
  )
  .next()
https://github.com/nasa/Common-Metadata-Repository/blob/master/graph-db/serverless/src/utils/cmr/indexCmrCollection.js
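For reference, the split check-then-insert version I’m asking about would look roughly like this (just a sketch, not our actual code; it reuses conceptId and the gremlinConnection source from the snippet above, and the addV properties are illustrative):

// Sketch of the two-query alternative: a read-only existence check first,
// and only run the mutation when nothing came back
const existing = await gremlinConnection
  .V()
  .hasLabel('collection')
  .has('id', conceptId)
  .next()

if (existing.value === null) {
  // Vertex not found, so issue the addV as a separate, second query
  collection = await gremlinConnection
    .addV('collection')
    .property('id', conceptId)
    .next()
}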
I should mention that this function is being executed by a lambda reading from an SQS queue event that contains the information for what to index into the graph
Also I have the latest version of the Neptune engine (upgraded in the last few weeks)
Do you have a high number of CMEs? Perhaps you've already tried this, but if the count is low then maybe a retry system would help when you encounter a CME.
I’m still kind of working this out. Yeah, I did have a large number of CMEs, and one solution has been a combination of throttling the Lambda and adding a retry policy, which did ensure that all my nodes at least made it into the graph. I was more trying to see if there was a source-code fix I could make. In this use case throttling isn’t a big deal, though (it’s looking like I’m going to have to throttle due to a slow response downstream anyway).
@KelvinL an answer you provided on SO was referenced here... do you happen to have any thoughts on this issue?
Solution
In general the CME is a retryable exception, and in a highly concurrent environment where multiple client threads are mutating the graph at more or less the same time they are likely to happen and should be expected / coded for. That said, as much as possible, having each client thread touch different parts of the graph can reduce the likelihood of lock contention.
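As an illustration, in gremlin-javascript the retry could be as simple as a small wrapper like the sketch below (the attempt count, backoff values, and the string match on the error message are all illustrative, not Neptune-specific guidance):

// Sketch: retry an operation when Neptune throws a ConcurrentModificationException.
// The attempt count and backoff values are illustrative only.
const isCme = (err) =>
  err && err.message && err.message.includes('ConcurrentModificationException')

const withCmeRetry = async (operation, maxAttempts = 5) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await operation()
    } catch (err) {
      if (!isCme(err) || attempt === maxAttempts) throw err
      // Exponential backoff with a little jitter before the next attempt
      const delayMs = (2 ** attempt) * 100 + Math.random() * 100
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
}

// Usage: wrap the coalesce mutation from the original snippet
// collection = await withCmeRetry(() => gremlinConnection
//   .V().hasLabel('collection').has('id', conceptId)
//   .fold()
//   .coalesce(gremlinStatistics.unfold(), addVCommand)
//   .next())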
Thank you both for the reply, I really appreciate it; the Gremlin and graph community is excellent. There are a few ways I think I could optimize that, but I don’t think in our use case we could really group the worker threads that way. Basically we are copying data from our API into our graph DB. It’s an SQS -> Lambda -> Neptune workflow. I only get the issues when I am bootstrapping, in this case running a different Lambda to index most of the data from our main API holdings into the graph DB, as opposed to our usual use case, which is that someone sends a request to the API and it ends up in our indexing queue, typically at a far, far slower rate.