How to Work with Transactions with Gremlin Python
I'm trying to implement transactions, but I'm running into two scenarios. If I start a transaction and call iterate() on every add_v, it saves to my Gremlin Server before the commit. If I take out the .iterate() and run commit(), it doesn't save to the Gremlin Server at all. What am I doing wrong?
Hi, for remote Gremlin queries (transactional or not), a terminal step has to be added in order for the traversal to be submitted to the server. So yes, iterate() has to be included on each line; otherwise the driver only constructs the queries without sending them to the server. Another terminal step you can use is next(). For more information and a list of all terminal steps, see the docs here.

Thanks for the response @Yang Xia. Sorry for my lack of knowledge, but what is the purpose of opening a transaction and committing it if iterate() already sends everything to the server? Also, .rollback() doesn't work if I try to use it after iterate() on Gremlin Server. I received a suggestion to use transactions to try to get better performance than using a query string; that's why I'm trying to implement them and compare the difference in performance.
Have to say I'm not the expert on transactions, but all transactional sessions are handled on the server side, not the client side, so all traversals still have to be submitted.

To discard your changes, are you calling tx.rollback() before or after tx.commit()? The definition of rollback is that the server will roll back all the changes since the last commit. So what happens is: when you iterate a traversal, it is sent to the server and processed. If you decide to keep the changes, you call tx.commit(); if you want to discard them, you call tx.rollback(), which discards everything up to the last tx.commit() call.
Also to confirm, I see NEPTUNE in your debug output. Are you using a local Gremlin Server or a Neptune database?

"I received a suggestion to use transactions to try to get better performance than using a query string; that's why I'm trying to implement them and compare the difference in performance."

Unsure where this is coming from. What sort of performance gain are you looking for?
If you're using Gremlin Server, what backing store are you using? TinkerGraph? If so, ensure you're using TinkerTransactionGraph.
There's more on how to use TinkerTransactionGraph for unit testing of transactions here: https://aws.amazon.com/blogs/database/unit-testing-apache-tinkerpop-transactions-from-tinkergraph-to-amazon-neptune/
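For reference, switching a Gremlin Server's TinkerGraph to the transactional variant comes down to the `gremlin.graph` setting in the graph's properties file (the file path below is illustrative; TinkerTransactionGraph requires TinkerPop 3.7+):

```properties
# conf/tinkertransactiongraph.properties (illustrative path)
gremlin.graph=org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerTransactionGraph
gremlin.tinkergraph.vertexIdManager=LONG
```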
I tried to call tx.rollback() after I did the mergeV inserts just to check if it works. It didn't roll back the inserted data. But I discarded the option of using transactions because the performance was similar to using the chained mergeV() command.
I'm spending ~150 ms to insert some vertices and edges, so I was looking for a better way to do that than using the submit method from client.Client with chained mergeV and mergeE (two requests, to avoid conflicts between edges). But I didn't see any difference using transactions. Those tests were made on AWS Neptune, with an API written in Python using the FastAPI framework. Testing performance locally is difficult because the response time is really different from the server.
If you're looking to optimize for write throughput on Neptune, you want to consider the following:
- For each write request, batch 100-200 "objects" into a single query. An "object" is any combination of a vertex, edge, or subsequent vertex/edge properties (a vertex with 4 properties == 5 "objects").
- Use parallel write requests. In Python, consider using multiprocessing to create separate processes. They can share a connection pool to Neptune if you so choose. The number of parallel processes should equal the number of query execution threads available on your Neptune writer instance (which is 2x the number of vCPUs on whatever instance size you're using).
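Those two guidelines can be sketched roughly as follows. The chunking helper is plain Python; `submit_batch()` assumes a gremlin_python `client.Client` pointed at a hypothetical Neptune endpoint, so it is only defined here, not executed.

```python
from itertools import islice
from multiprocessing import Pool

def chunked(items, size=150):
    """Split items into batches of at most `size` (aim for 100-200 objects/query)."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def submit_batch(names):
    # Import inside the function so the sketch is importable without a server.
    from gremlin_python.driver import client
    c = client.Client("wss://your-neptune:8182/gremlin", "g")  # hypothetical endpoint
    # One request per batch: chained mergeV() upserts, as discussed above.
    query = "g" + "".join(f".mergeV([(T.id):'{n}'])" for n in names) + ".iterate()"
    try:
        c.submit(query).all().result()
    finally:
        c.close()

def parallel_load(names, processes=4):
    # processes should match 2x the vCPUs of the Neptune writer instance.
    with Pool(processes) as pool:
        pool.map(submit_batch, list(chunked(names)))
```

Writing vertices and edges in separate passes (as you're already doing) keeps the parallel batches from conflicting on edge endpoints.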
If you follow those guidelines, you should get similar performance to what you would see with Neptune's bulk loader. Note that conditional writes carry overhead: if you're using mergeV(), you're unlikely to see the same write throughput as the bulk loader, since the bulk loader is not doing conditional writes.
Neptune's "top speed" for write throughput is going to be about 120,000 "objects" per second when writing vertices and vertex properties, and about half that when writing edges (due to vertex reference checks when creating an edge). These numbers can only be attained on an x.12xlarge writer instance or larger; smaller instances scale linearly in throughput.
You may see write throughput exceed 120,000 in some cases. There are a number of dependencies that drive that. But that's the safe number to use when estimating load speed/rates.
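As a back-of-envelope sketch of those numbers (the 120k/60k rates and the linear-scaling assumption come from above; the function and its parameters are illustrative):

```python
def estimate_load_seconds(num_vertices, props_per_vertex, num_edges,
                          vcpus=48, baseline_vcpus=48):
    """Rough load-time estimate. Baseline rates assume an x.12xlarge writer
    (~48 vCPUs); smaller instances scale linearly."""
    scale = vcpus / baseline_vcpus
    vertex_objects = num_vertices * (1 + props_per_vertex)  # vertex + its properties
    vertex_secs = vertex_objects / (120_000 * scale)        # ~120k objects/s
    edge_secs = num_edges / (60_000 * scale)                # ~half that for edges
    return vertex_secs + edge_secs

# Example: 1M vertices with 4 properties each (5M objects) plus 2M edges:
# 5_000_000/120_000 + 2_000_000/60_000 = 41.7 + 33.3 ≈ 75 seconds.
```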