Gremlin Python: connecting to the Neptune WebSocket endpoint when the database is down
I'm trying to catch and handle the exception raised when the database is down with Gremlin, but instead the queries or writes wait until the maximum time limit (in this case the Lambda timeout) before reporting a problem. Does anyone know how to set a time limit for this case, so the connection is closed without using up the whole Lambda timeout?

23 Replies
hello @masterhugo what version of the python driver are you using?
Gremlin python 3.7.1
AWS Neptune 1.3.1.0
Looking at the reference documentation for python driver configuration you could try to customize the transport_factory, which has some timeout settings. I'm not sure if the read_timeout would apply to connection timeout, which is what I believe you are looking for specifically
It seems the Java driver has a connection timeout setting, but I do not see one for the Python driver
I tried read_timeout but it doesn't work; it still waits until the Lambda's maximum time
And I tried passing other attributes, like the timeout from aiohttp, but the behavior was the same
Are you referring to manually customizing the aiohttp.ClientSession's timeout configuration? Curious how you set this value. I see examples documented here
I hoped that would change it. I passed a timeout value to DriverRemoteConnection as a parameter, but it didn't work 🥹
this is my connection code
From what I can tell I don't see a way to set this via DriverRemoteConnection and it would require a python driver code change
This configuration makes it respond after 15 seconds, but the problem continues because the Lambda still waits even after the connection returns an error.


i'd like to be sure i understand the problem here. is the problem that the driver times out at a particular point but then somehow the lambda continues to wait doing nothing until it hits its timeout period?
That's right, and it only happens when I call the Gremlin queries with the database disabled
Just for clarity, when you say "disabled" is the cluster in a Stopped state? or did you delete all instances from the cluster? Or is the cluster completely deleted? instance being rebooted?
Stopped state
In this case it is a Neptune Serverless cluster with one instance
And to further clarify, you mean you actually stopped the cluster using the start/stop API (https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-stop-start.html) not that the Serverless instance had scaled to 1? (Just trying to get a clearer picture of what might be going on here).
I stopped the cluster in the console just to validate this case. In production the behavior is the same, but it only happens when I run multiple upserts and the database freezes under high demand. Stopping the cluster was my way of reproducing that behavior.
curious what your lambda code looks like? is it possible there is some retry mechanism happening to attempt to reconnect when a connection failure is detected? The documented python driver example has such a mechanism.
AWS Lambda function examples for Amazon Neptune - Amazon Neptune
The following example AWS Lambda functions, written in Java, JavaScript and Python, illustrate upserting a single vertex with a randomly generated ID using the fold().coalesce().unfold() idiom.
I removed all the backoff I had, but the problem persists
here is my code
https://github.com/masterhugo/privateCodes/blob/main/NeptuneAdapter.py
i made public the repo 🙈
Regarding the driver timeout config I think I was wrong about not being able to configure the connection timeout specifically - this kind of config might be possible:
Regarding the lambda still waiting for full timeout after connection error I am not very familiar with lambda retry/timeout logic but would it help to raise an error if a connection error is detected? something like:
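One way to read this suggestion, as a minimal sketch and not the snippet originally posted: run the blocking Gremlin call in a worker thread under a hard deadline and raise as soon as the deadline passes, so the handler fails fast instead of waiting for the Lambda timeout. The names run_with_deadline and QueryDeadlineExceeded are hypothetical helpers, not part of gremlin-python, and only the standard library is used.

```python
import concurrent.futures
import time


class QueryDeadlineExceeded(RuntimeError):
    """Raised when the wrapped call does not finish within the deadline."""


def run_with_deadline(fn, timeout_seconds, *args, **kwargs):
    """Run fn(*args, **kwargs) in a worker thread; give up after timeout_seconds.

    Hypothetical helper: the worker thread is abandoned on timeout, not killed.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        # Blocks for at most timeout_seconds, then raises a timeout error.
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError as exc:
        raise QueryDeadlineExceeded(
            f"call did not finish within {timeout_seconds}s"
        ) from exc
    finally:
        # Do not wait for a worker that may still be stuck on the network.
        executor.shutdown(wait=False)


def slow_stand_in(seconds):
    """Stand-in for a Gremlin call that hangs (assumption for illustration)."""
    time.sleep(seconds)
    return "done"
```

In a handler this could wrap the traversal, for example run_with_deadline(lambda: g.V().limit(1).toList(), 5.0), with the budget derived from context.get_remaining_time_in_millis(). Because the worker is only abandoned, the underlying connection should still be closed or discarded afterwards.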
Also can try configuring the lambda to reduce the number of retries or max event age
Configuring error handling settings for Lambda asynchronous invocat...
You can use the AWS CLI or the Lambda console to configure how Lambda handles errors and retries for your function when you invoke it asynchronously.
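A sketch of the retry and event-age settings mentioned above, assuming the function is invoked asynchronously and boto3 is available in the environment. put_function_event_invoke_config accepts MaximumRetryAttempts (0 to 2) and MaximumEventAgeInSeconds (60 to 21600); the helper names and the function name "neptune-writer" are placeholders, not taken from the thread.

```python
def build_invoke_config(function_name, max_retries=0, max_event_age_seconds=60):
    """Build keyword arguments for Lambda's put_function_event_invoke_config."""
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    if not 60 <= max_event_age_seconds <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    return {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_seconds,
    }


def apply_invoke_config(params):
    """Send the settings to Lambda (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("lambda")
    return client.put_function_event_invoke_config(**params)


# Placeholder function name; no retries, events dropped after 60 seconds.
params = build_invoke_config("neptune-writer", max_retries=0, max_event_age_seconds=60)
```

These settings only affect asynchronous invocations; they limit how often a failed event is retried, not how long a single invocation waits.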
Hmm I'm gonna try, but the problem appears when I call the query rather than when I create the connection; I suppose that is where the connection actually starts
My problem persists: the connection timeout still waits about 15 seconds even when I set ClientTimeout to low values, like 1 or 2
and looking at the gremlin-python library, if I pass the ClientTimeout, it isn't forwarded to ws_connect as a kwarg

🥹
The top line should still pass the different args into the aiohttp client though? The subsequent ifs are just mapping the driver specific names to aiohttp specific ones, but shouldn't affect the rest.
But yea not entirely sure if there's much else to do in the driver that can help, might just need to add some manual checks to throw errors in the lambda itself?
Also, @Lyndon would you have any idea around the transport code since you've worked on it?
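A minimal sketch of such a manual check, assuming a plain TCP probe against the Neptune endpoint is an acceptable signal: try to open a socket with a short timeout before running any Gremlin query and raise immediately if that fails. assert_endpoint_reachable and EndpointUnreachable are hypothetical names; 8182 is Neptune's default port. This only tests TCP reachability, not the WebSocket handshake, so a cluster that accepts connections but is overloaded would still pass.

```python
import socket


class EndpointUnreachable(ConnectionError):
    """Raised when the database endpoint does not accept a TCP connection."""


def assert_endpoint_reachable(host, port=8182, timeout_seconds=2.0):
    """Open and close a TCP connection, raising quickly if it cannot be made."""
    try:
        # create_connection applies the timeout to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return
    except OSError as exc:
        # Covers DNS failures, refused connections and connect timeouts.
        raise EndpointUnreachable(
            f"{host}:{port} not reachable within {timeout_seconds}s"
        ) from exc
```

Called at the top of the handler, this turns a stopped or unreachable cluster into an error after roughly timeout_seconds.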