[Bug?] gremlinpython hangs or does not recover connections after a connection error occurs
Hello, TinkerPop team.
I am struggling to recover after a connection error occurs, and I now suspect the problem might be caused by a bug in gremlinpython.
Are these bugs, or am I just using the library incorrectly? Please let me know.
Best Regards,
Environments
- wsl2 on Windows11 (Ubuntu)
- Python 3.12.4
- gremlinpython 3.7.2
- TinkerPop server: JanusGraph 1.0.0
JanusGraph is launched via Docker Compose.
Case 1: Script hangs when all pooled connections are consumed?
When I specify a wrong URL to simulate a network error, gremlinpython seems to consume connections without returning them to the pool, so the script below hangs once all pooled connections have been consumed.
Python script: see case1.py
The output: see case1-output.txt
The result changes when I pass a different value for the pool_size argument.
My expectation is that the error message is shown nine times and then the script ends. A rough reconstruction of the scenario is sketched below.
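Since case1.py is not shown in this post, here is a simplified sketch of the scenario as described above; the URL, pool_size value, and loop count are assumptions, not the actual script:

```python
# Simplified sketch of the case1.py scenario (not the actual script).
# The port below is deliberately wrong to simulate a network error.
from gremlin_python.driver.client import Client

client = Client('ws://localhost:9999/gremlin', 'g', pool_size=4)

for i in range(9):
    try:
        print(i, client.submit('g.V().count()').all().result())
    except Exception as e:
        # Expected: this branch runs nine times and the script exits.
        # Observed: after pool_size failures, submit() blocks forever,
        # as if the failed connections were never returned to the pool.
        print(i, type(e).__name__, e)

client.close()
```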
Case 2: Manual transaction is never rolled back (closed)
As in case 1, a manually opened transaction is never ended, so I cannot recover from the error.
Python script: see case2.py
The output: see case2-output.txt
My expectation is that the script ends after nine trials, all of which fail. A rough sketch of the scenario follows.
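Again, a simplified sketch rather than the actual case2.py; the URL, query, and trial count are assumptions:

```python
# Simplified sketch of the case2.py scenario (not the actual script).
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('ws://localhost:9999/gremlin', 'g')  # wrong port
g = traversal().with_remote(conn)

for i in range(9):
    tx = g.tx()
    gtx = tx.begin()
    try:
        gtx.add_v('person').property('name', 'alice').iterate()
        tx.commit()
    except Exception as e:
        print(i, type(e).__name__, e)
        try:
            # Expected: rollback() ends the transaction so the next trial
            # starts cleanly. Observed: the transaction is never closed.
            tx.rollback()
        except Exception as e2:
            print('rollback also failed:', type(e2).__name__, e2)

conn.close()
```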
Case 3: Once a connection error has occurred, pooled connections stay broken
After I temporarily stop the TinkerPop server (JanusGraph), some pooled connections are broken and are never recovered.
Python script: see case3.py
The output: see case3-output.txt
My expectation is that connections are refreshed when they are found to be unavailable on checkout from the pool. A rough sketch of the scenario follows.
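A simplified sketch rather than the actual case3.py (URL, pool_size, loop count, and sleep interval are assumptions); I stop and restart the JanusGraph container while the loop is sleeping:

```python
# Simplified sketch of the case3.py scenario (not the actual script).
import time
from gremlin_python.driver.client import Client

client = Client('ws://localhost:8182/gremlin', 'g', pool_size=4)

for i in range(20):
    try:
        print(i, client.submit('g.V().count()').all().result())
    except Exception as e:
        # Errors are expected while the server is stopped, but the broken
        # connections stay in the pool, so some requests keep failing
        # even after the server is up again.
        print(i, type(e).__name__, e)
    time.sleep(5)  # stop and restart JanusGraph during some of these sleeps

client.close()
```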
Sorry it's taking a bit to reply here, but I think these cases might need a bit of investigation to get some answers. cc/ @Yang Xia
Yes, we'll take a look this week, thanks for the details!
Solution
What you're noticing here kind of boils down to how connection pooling works in gremlin-python. The pool is really just a queue that each connection adds itself back to after either an error or a success, but it's missing some handling for the scenarios you pointed out. One of the main issues is that the pool itself can't determine whether a connection is healthy or unhealthy and should be removed from the pool.
I think you should go ahead and make a Jira for this. If it's easier for you, I can help you make one that references this post. I think the only workaround right now is to occasionally open a new Client to create a new pool of connections when you notice some of those exceptions.
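A minimal sketch of that workaround might look like the following; the URL and the blanket except clauses are assumptions, and a real version would only catch connection-level errors:

```python
# Sketch of the workaround: rebuild the Client (and with it the whole
# connection pool) when a connection-level error is observed.
from gremlin_python.driver.client import Client

URL = 'ws://localhost:8182/gremlin'
client = Client(URL, 'g')

def submit_with_pool_refresh(query):
    global client
    try:
        return client.submit(query).all().result()
    except Exception:
        # Discard the possibly poisoned pool and build a fresh one,
        # then retry the query once.
        try:
            client.close()
        except Exception:
            pass
        client = Client(URL, 'g')
        return client.submit(query).all().result()

print(submit_with_pool_refresh('g.V().count()'))
```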
Thank you for replying.
> I think you should go ahead and make a Jira for this. If it's easier for you, I can help you make one that references this post.

I am sorry, but I am not familiar with the 'Jira' you mentioned or how to create a ticket there, so I would appreciate it if you could help me create one, or create it on my behalf.
> I think the only workaround right now is to occasionally open a new Client to create a new pool of connections when you notice some of those exceptions.

Yes, I have already applied such a workaround. The main reason I posted this thread was to confirm the expected behavior and whether my usage is wrong.
Jira ticket is: https://issues.apache.org/jira/browse/TINKERPOP-3114
Thank you so much for creating the ticket, @Kennh
I appreciate your cooperation; it is very helpful 🙇