Gremlin Python trying to connect to Neptune WS when it is down

I'm trying to catch and handle the exception raised when the database is down, using Gremlin. Instead, the queries and writes wait until the maximum time limit (in this case the Lambda timeout) before reporting a problem. Does anyone know how to set a time limit for this case, so that closing the connection does not use up the whole Lambda timeout?
23 Replies
Andrea
Andrea2w ago
hello @masterhugo what version of the python driver are you using?
masterhugo
masterhugoOP2w ago
Gremlin Python 3.7.1, AWS Neptune 1.3.1.0
Andrea
Andrea2w ago
Looking at the reference documentation for Python driver configuration, you could try customizing the transport_factory, which has some timeout settings. I'm not sure whether read_timeout would apply to the connection timeout, which is what I believe you are looking for specifically. The Java driver seems to have a connection timeout setting, but I do not see one for the Python driver.
masterhugo
masterhugoOP2w ago
I tried read_timeout but it doesn't work; it still waits until the Lambda's maximum time. I also tried sending other attributes, like timeout from aiohttp, but the behavior was the same.
Andrea
Andrea2w ago
Are you referring to manually customizing the aiohttp.ClientSession's timeout configuration? Curious how you set this value. I see examples documented here
masterhugo
masterhugoOP2w ago
I was hoping that would change it. I passed a timeout value to DriverRemoteConnection as a parameter, but it didn't work 🥹 This is my connection code:
driver_remote_connection.DriverRemoteConnection(
    pool_size=1,
    call_from_event_loop=True,
    url=base_url,
    message_serializer=serializer.GraphSONSerializersV3d0(),
    headers=aws_auth_request,
    read_timeout=1,
    timeout=1,
)
Andrea
Andrea2w ago
From what I can tell, there is no way to set this via DriverRemoteConnection, and it would require a Python driver code change.
masterhugo
masterhugoOP2w ago
With this configuration it responds after 15 seconds, but the problem continues, because the Lambda still waits even after the connection returns an error.
driver_remote_connection.DriverRemoteConnection(
    pool_size=1,
    url=base_url,
    message_serializer=serializer.GraphSONSerializersV3d0(),
    headers=aws_auth_request,
    transport_factory=lambda: transport.AiohttpTransport(
        read_timeout=5,
        heartbeat=2,
        call_from_event_loop=True,
    ),
)
spmallette
spmallette2w ago
i'd like to be sure i understand the problem here. is the problem that the driver times out at a particular point but then somehow the lambda continues to wait doing nothing until it hits its timeout period?
masterhugo
masterhugoOP2w ago
That's right, and it only happens when I call the Gremlin queries with the database disabled.
triggan
triggan2w ago
Just for clarity, when you say "disabled" is the cluster in a Stopped state? or did you delete all instances from the cluster? Or is the cluster completely deleted? instance being rebooted?
masterhugo
masterhugoOP2w ago
Stopped state. In this case it is a Neptune Serverless cluster with one instance.
triggan
triggan2w ago
And to further clarify, you mean you actually stopped the cluster using the start/stop API (https://docs.aws.amazon.com/neptune/latest/userguide/manage-console-stop-start.html) not that the Serverless instance had scaled to 1? (Just trying to get a clearer picture of what might be going on here).
masterhugo
masterhugoOP2w ago
I stopped the cluster in the console just to validate this case. In production the behavior is the same, but it only happens when I run multiple upserts and the database freezes under the high demand, so I reproduced the behavior by stopping the cluster.
Andrea
Andrea2w ago
curious what your lambda code looks like? is it possible there is some retry mechanism happening to attempt to reconnect when a connection failure is detected? The documented python driver example has such a mechanism.
AWS Lambda function examples for Amazon Neptune - Amazon Neptune
The following example AWS Lambda functions, written in Java, JavaScript and Python, illustrate upserting a single vertex with a randomly generated ID using the fold().coalesce().unfold() idiom.
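If a retry mechanism is in play, one way to keep it from consuming the whole Lambda timeout is to cap retries by a time budget rather than only by attempt count. The following is a minimal sketch under that assumption; `retry_within_budget` is a hypothetical helper, not the mechanism used in the linked AWS example and not part of gremlin-python.

```python
import time


def retry_within_budget(fn, budget_s, base_delay_s=0.1, retry_on=(OSError,)):
    """Call fn() with exponential backoff, giving up once the time budget is spent.

    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + budget_s
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on as exc:
            attempt += 1
            delay = base_delay_s * (2 ** (attempt - 1))
            # Stop if sleeping for the next backoff step would overrun the budget.
            if time.monotonic() + delay >= deadline:
                raise RuntimeError(
                    f"gave up after {attempt} attempt(s): {exc}"
                ) from exc
            time.sleep(delay)
```

In a Lambda handler the budget would normally be chosen well below the configured function timeout, so that the handler can still report the failure itself.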
masterhugo
masterhugoOP2w ago
I removed all the backoff I had, but the problem persists. Here is my code: https://github.com/masterhugo/privateCodes/blob/main/NeptuneAdapter.py I made the repo public 🙈
Andrea
Andrea2w ago
Regarding the driver timeout config I think I was wrong about not being able to configure the connection timeout specifically - this kind of config might be possible:
from aiohttp import ClientTimeout

timeout = ClientTimeout(
    total=5,  # Total timeout for the whole operation (connect + read)
    connect=2,  # Timeout for establishing the connection
    sock_read=5,  # Timeout for waiting for data after the connection
)
return driver_remote_connection.DriverRemoteConnection(
    pool_size=1,
    url=base_url,
    message_serializer=serializer.GraphSONSerializersV3d0(),
    headers=aws_auth_request,
    transport_factory=lambda: transport.AiohttpTransport(
        timeout=timeout,
        heartbeat=2,
        call_from_event_loop=True,
    ),
)
Andrea
Andrea2w ago
Regarding the Lambda still waiting for the full timeout after a connection error: I am not very familiar with Lambda retry/timeout logic, but would it help to raise an error if a connection error is detected? Something like:
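from aiohttp import client_exceptions

try:
    NeptuneConnectionManager.create_remote_connection(host, port)
except (client_exceptions.ClientConnectorError, OSError) as e:
    # Fail fast so the invocation ends instead of idling until the Lambda timeout
    raise RuntimeError(f"Connection to Neptune failed: {e}") from e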
You can also try configuring the Lambda to reduce the number of retries or the maximum event age.
Configuring error handling settings for Lambda asynchronous invocat...
You can use the AWS CLI or the Lambda console to configure how Lambda handles errors and retries for your function when you invoke it asynchronously.
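A minimal sketch of the fail-fast idea above, assuming a plain TCP reachability probe is an acceptable pre-check before any Gremlin call. `assert_endpoint_reachable` is a hypothetical helper, not part of gremlin-python, and 8182 in the usage comment is Neptune's default port (adjust it if your cluster differs). The probe only shows that the port accepts connections, so a cluster that is frozen under load may still pass it.

```python
import socket


def assert_endpoint_reachable(host: str, port: int, timeout_s: float = 2.0) -> None:
    """Raise ConnectionError quickly if a TCP connection cannot be opened.

    Hypothetical pre-flight check; it does not speak WebSocket or Gremlin.
    """
    try:
        # create_connection applies timeout_s to the connect attempt itself.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError as exc:  # covers refused connections, DNS failures and timeouts
        raise ConnectionError(
            f"Endpoint {host}:{port} unreachable within {timeout_s}s: {exc}"
        ) from exc


# Possible use at the top of a handler, before building the remote connection:
# assert_endpoint_reachable(host, 8182, timeout_s=2.0)
```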
masterhugo
masterhugoOP6d ago
Hmm, I'm going to try it, but the problem appears when I call the query rather than when I create the connection; I suppose that is where the connection actually starts. My problem persists: the connection still waits about 15 seconds before timing out, even when I set ClientTimeout with low values, like 1 or 2.
masterhugo
masterhugoOP6d ago
And looking at the gremlin-python library, if I pass a ClientTimeout, it doesn't get sent to ws_connect as one of the kwargs.
masterhugo
masterhugoOP4d ago
🥹
Yang Xia
Yang Xia4d ago
The top line should still pass the different args into the aiohttp client though? The subsequent ifs are just mapping the driver-specific names to aiohttp-specific ones, but shouldn't affect the rest. But yeah, not entirely sure if there's much else to do in the driver that can help; you might just need to add some manual checks to throw errors in the Lambda itself? Also, @Lyndon would you have any idea around the transport code since you've worked on it?
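One possible shape for such a manual check is a hard deadline around the blocking driver call, sketched below. `run_with_deadline` and `DeadlineExceeded` are hypothetical names, and it is an unverified assumption that the gremlin-python call in question can safely run in a worker thread in this setup. The sketch does not cancel the underlying call; it only stops waiting for it, so the handler can raise before the Lambda timeout.

```python
import threading


class DeadlineExceeded(RuntimeError):
    """Raised when the wrapped call does not finish in time (hypothetical name)."""


def run_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread and wait at most timeout_s seconds."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        # The call is still blocked; give up waiting and report it.
        raise DeadlineExceeded(f"call did not finish within {timeout_s}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Inside a Python Lambda handler, the deadline could be derived from `context.get_remaining_time_in_millis()`, leaving a margin so the handler has time to log and raise.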