Add multiple addV() steps with one Iterate()
Hello guys, I have a question that I need some expert help with. I am using C# to add many nodes (20k) to AWS Neptune.
Adding them one by one takes a very long time, so I need some kind of bulk addV() code. Here is my code, but it is not working as I want.
- Some concerns here: what is the maximum size of a single Iterate() request?
- Can I add the 20k nodes in one go, or do I need to divide them into smaller batches?
- Is there a better way to bulk add/update the graph?
7 Replies
You should not send them all at once. You should batch your requests. The right batch size depends on the complexity of your load, so you might need to experiment to find what works best. If you do not batch, you will likely hit one of several errors: (1) a timeout if the query takes too long, (2) a query construction problem if your query hits JVM stack limits, or (3) a memory issue should the transaction grow beyond allowable limits. Regarding (1) and timeouts, it's also worth noting that long queries hold locks on the data, so if you're doing anything in parallel to this load you could run into transaction failures. For Neptune those can also occur on reads. Your example seems pretty simple, so perhaps a hundred at a time might be a good place to start.
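To make the batching advice concrete, here is a minimal Python sketch (the original C# code is not shown in the thread, so this is not a fix for it). It assumes `g` is a gremlinpython-style traversal source exposing `add_v()`, `property()` and `iterate()` (older gremlinpython releases spell the first one `addV()`), and that each node is a hypothetical `(label, properties)` pair. The helper names `batched` and `add_vertices_in_batches` are made up for illustration, and the default of 100 is just the starting point suggested above.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def add_vertices_in_batches(g, nodes, batch_size=100):
    """Send one chained addV() traversal per batch; return the batch count.

    `nodes` is an iterable of (label, properties_dict) pairs. That shape is
    an assumption for this sketch, not something from the original post.
    """
    batches_sent = 0
    for chunk in batched(nodes, batch_size):
        t = None
        for label, properties in chunk:
            # The first addV() starts a traversal from the source `g`;
            # every later addV() extends that same traversal.
            t = g.add_v(label) if t is None else t.add_v(label)
            for key, value in properties.items():
                t = t.property(key, value)
        t.iterate()  # one request per batch
        batches_sent += 1
    return batches_sent
```

With 20k nodes and a batch size of 100 this would issue 200 requests instead of 20,000. If batches time out or fail, lowering `batch_size` is the first thing to try.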
Thanks a lot for the answer.
Should that traversal variable not be just g?
By convention, g is the GraphTraversalSource. You get that from anonymously calling traversal(). When you do g.V() it produces a GraphTraversal object, so if you had a variable to refer to that and you already used the g convention, you wouldn't assign that value to g. Typically, that variable is traversal or just t, if the variable is defined at all. You usually don't see it because you just write your Gremlin from g all the way to termination (e.g. iterate()), but sometimes, like in this example, you're building a GraphTraversal in some dynamic way or just passing it around to different functions for some reason, in which case a variable is needed.
So, in Go I have been using: t := g.Traversal(). Then adding to t in loops for, say, batches of 20. Rather than t := g.V() as in the above Python code. I presume then that either is OK?
I'm not sure. What does the "g" in g.Traversal() refer to in your case? The documentation for Go has it as:
In that case g is a GraphTraversalSource, and it is from that (by convention) you would do:
Yes - that's what I meant. I get the traversal source and then use that
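For anyone skimming, the naming convention discussed above could be sketched in Python like this. It is only an illustration under assumptions: `g` stands for a gremlinpython-style traversal source with an `add_v()` start step, and `build_add_vertices` is a made-up helper name, not part of any driver.

```python
def build_add_vertices(g, labels):
    """Return an unterminated traversal that adds one vertex per label."""
    if not labels:
        raise ValueError("labels must not be empty")
    # `g` is the traversal source: calling a start step on it spawns a
    # new traversal and leaves `g` itself reusable.
    t = g.add_v(labels[0])
    for label in labels[1:]:
        # `t` is the traversal being assembled; each step extends it.
        t = t.add_v(label)
    # Nothing is sent until the caller terminates it, e.g. t.iterate().
    return t
```

The point is that `g` is never reassigned, so it can spawn the next batch, while `t` is the single-use traversal that gets passed around and finally terminated with iterate().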