AWS Neptune: Pong fails and close event not emitted

Hey guys, long time no see. We have an issue that has occurred a few times in the last couple of weeks, and we've been investigating it for a while; posting here in case it's a known issue. We use gremlin-aws-sigv4 in a Node.js project. We periodically send a ping to the server and wait for a pong, with a 3-second timeout. If the pong times out, we decide the connection is unhealthy and close it by grabbing the underlying WebSocket and calling terminate() on it. Internally, this kicks off the close() function, which closes the connection to the server and then waits for the close event. However, the close event sometimes seems never to be emitted. Either the connection is not being closed properly, or it is closed but never emits the event; either way, the connection stays open and held in memory indefinitely, which leaks memory over time. Any ideas how we could handle this?
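One way to keep a missing close event from pinning a connection forever is to bound the wait. The sketch below assumes a `ws`-style socket (an EventEmitter with `terminate()` that emits `'close'`); the helper name `terminateAndAwaitClose` and the 5-second default are hypothetical and not part of gremlin or gremlin-aws-sigv4.

```javascript
// Hypothetical helper, not a gremlin / gremlin-aws-sigv4 API.
// Terminates a ws-style socket and waits for its 'close' event, but never
// longer than timeoutMs. Resolves true if 'close' was seen, false on timeout.
function terminateAndAwaitClose(socket, timeoutMs = 5000) {
  return new Promise((resolve) => {
    let settled = false;

    const finish = (closed) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      // Drop our listener so a late (or never-arriving) event holds no reference.
      socket.removeListener('close', onClose);
      resolve(closed);
    };

    const onClose = () => finish(true);
    const timer = setTimeout(() => finish(false), timeoutMs);

    socket.once('close', onClose);
    socket.terminate();
  });
}
```

If it resolves `false`, the caller can stop tracking that connection and open a fresh one instead of waiting on an event that may never come.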
triggan (accepted solution), 14mo ago
Interesting approach. Our typical guidance is not to worry about whether the connection is live and to assume it is always available. Then build in exception handling and reconnect logic for the case where a query is sent to a closed connection. Neptune closes connections on the server side once they have been idle for more than 20-25 minutes.
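The "assume it's available, reconnect on failure" guidance could look roughly like the sketch below. `createConnection`, `runQuery`, `maxRetries`, and `isRetryable` are hypothetical, caller-supplied pieces, not gremlin or Neptune APIs; in a real client you would pass an `isRetryable` that only matches connection-level errors.

```javascript
// Hypothetical wrapper: use the cached connection optimistically, and only
// on failure discard it, reconnect, and retry the query.
function createResilientExecutor(createConnection, { maxRetries = 1, isRetryable = () => true } = {}) {
  let connection = null;

  return async function execute(runQuery) {
    let lastError;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      if (!connection) connection = await createConnection();
      const conn = connection;
      try {
        return await runQuery(conn);
      } catch (err) {
        lastError = err;
        // Errors such as a bad query are not fixed by reconnecting.
        if (!isRetryable(err)) throw err;
        // Assume the socket may be dead (e.g. closed by the server after the
        // idle timeout): forget it and close it best-effort, ignoring errors.
        connection = null;
        if (typeof conn.close === 'function') {
          await Promise.resolve().then(() => conn.close()).catch(() => {});
        }
      }
    }
    throw lastError;
  };
}
```

Retrying is only safe for idempotent queries; a write that failed mid-flight may already have been applied.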
Shush (OP), 14mo ago
Hi, thanks for replying. We eventually found a fix, and it has to do with how the library for Neptune is set up. It uses a ping interval of 1 second and a pong timeout of 2 seconds, and it doesn't seem to close the connection properly when the pong times out. So we patched the library to terminate the WebSocket and then the client from within, and increased the ping interval to 6 seconds and the pong timeout to 3 seconds, so that each pong timeout expires before the next ping is sent. Those changes fixed the issue for good.
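The timing change above can be illustrated with a small heartbeat that refuses a pong timeout at or above the ping interval, so at most one pong deadline is ever pending. This is a sketch assuming a `ws`-style socket that is already open (`ping()`, a `'pong'` event, `terminate()`); `startHeartbeat` and `onDead` are hypothetical names, not the library's internal code, and the defaults simply mirror the 6 s / 3 s values from this thread.

```javascript
// Hypothetical heartbeat for an open ws-style socket.
// Returns a stop() function that clears all timers and listeners.
function startHeartbeat(socket, { pingIntervalMs = 6000, pongTimeoutMs = 3000, onDead = () => {} } = {}) {
  // With 1 s / 2 s, a new ping goes out while the previous deadline is still
  // pending; requiring timeout < interval keeps one deadline at a time.
  if (pongTimeoutMs >= pingIntervalMs) {
    throw new RangeError('pongTimeoutMs must be shorter than pingIntervalMs');
  }
  let pingTimer = null;
  let pongTimer = null;

  const onPong = () => {
    // Pong arrived in time: cancel the pending deadline.
    clearTimeout(pongTimer);
    pongTimer = null;
  };

  const stop = () => {
    clearInterval(pingTimer);
    clearTimeout(pongTimer);
    socket.removeListener('pong', onPong);
  };

  socket.on('pong', onPong);
  pingTimer = setInterval(() => {
    socket.ping();
    pongTimer = setTimeout(() => {
      // No pong before the deadline: stop pinging, kill the socket, report it.
      stop();
      socket.terminate();
      onDead();
    }, pongTimeoutMs);
  }, pingIntervalMs);

  return stop;
}
```

`onDead` is where a caller would also discard the client object, so nothing keeps the dead connection reachable.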
triggan, 14mo ago
Which "library for neptune" are you referring to?
