Neptune Cluster Balancing Configuration
I'm trying to reach 40 RPS in the registration flow, but I'm currently reaching only 20 RPS. I have 6 instances of a Python application in FastAPI, and I notice that each interaction with Neptune takes around 200-400 ms; for query flows there is a bottleneck where some queries take around 200 ms-1 s. As for the cluster and the application, both are in the same region and in the same VPC. When analyzing the cluster, I notice that some instances have higher CPU consumption than others. What could I do to improve this?
Solution
What are you using to send queries to Neptune? Are you using the gremlin-python client and connecting via websockets? If so, each websocket connection is going to act like a "sticky session". It will connect to the same instance for the life of the connection.
The reader endpoint is a DNS endpoint that is configured to resolve to a different read replica approximately every 5 seconds. So depending on when you establish your websocket connections or if you're just sending http requests, those could all go to the same instance if sent in quick succession.
Customers have solved this in a number of ways. Some will create load balancers in front of Neptune read replicas that can more precisely "load balance" requests across the instances.
We also created a version of the Gremlin Java client that establishes connection pools across multiple reader instances: https://github.com/aws/neptune-gremlin-client
Doing this in Python with the Gremlin Python client is not as straightforward as with the Java client. The Java client has the concept of a "cluster", whereas the Python client does not. So you may need to build a routing mechanism that creates connections to each reader instance directly (each instance has its own instance endpoint) and iterates across those connections in a round-robin fashion if you want to get even distribution.
Totally understand this is a pain and we've been discussing ways to address this.
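The round-robin routing described above can be sketched generically; the instance endpoint hostnames below are hypothetical placeholders, and in practice you would create one gremlin-python client.Client per endpoint and rotate across those clients:

```python
# Sketch only: round-robin rotation over Neptune reader *instance* endpoints.
# Hostnames are hypothetical placeholders; each endpoint would normally be
# handed to its own gremlin-python client.Client (with its own websocket pool).
from itertools import cycle

READER_INSTANCE_ENDPOINTS = [
    "wss://reader-1.abc123.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "wss://reader-2.abc123.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "wss://reader-3.abc123.us-east-1.neptune.amazonaws.com:8182/gremlin",
]

# cycle() yields the endpoints in order, indefinitely.
_rotation = cycle(READER_INSTANCE_ENDPOINTS)

def next_reader_endpoint() -> str:
    """Return the next reader instance endpoint in round-robin order."""
    return next(_rotation)

# Usage sketch: client.Client(next_reader_endpoint(), "g", ...)
```

Because the websocket connection is sticky, rotating at connection-creation time (not per query) is what actually spreads load across replicas.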
I'm using this implementation in Python:

from gremlin_python.driver import client, serializer
from gremlin_python.driver.aiohttp.transport import AiohttpTransport

client_read = client.Client(
    "ws://url_neptune_read:8182/gremlin",
    "g",
    transport_factory=lambda: AiohttpTransport(call_from_event_loop=True),
    message_serializer=serializer.GraphSONMessageSerializer(),
)
client_write = client.Client(
    "ws://url_neptune_write:8182/gremlin",
    "g",
    transport_factory=lambda: AiohttpTransport(call_from_event_loop=True),
    message_serializer=serializer.GraphSONMessageSerializer(),
)

client_read.submit("query_read").all().result()
client_write.submit("query_write").all().result()

And thanks, I'm going to look at this doc.
Yes, so that is using websockets (although, if you're using Neptune, the connection string should start with wss://, as Neptune is SSL/TLS only).
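One simple way to guard against that (a sketch; this helper is ours, not part of gremlin-python) is to normalize the scheme before creating the client:

```python
# Sketch: force a Gremlin endpoint to wss://, since Neptune is SSL/TLS only.
# This helper is illustrative and not part of gremlin-python.
def to_secure_endpoint(url: str) -> str:
    """Rewrite a ws:// endpoint to wss://; leave already-secure URLs untouched."""
    if url.startswith("ws://"):
        return "wss://" + url[len("ws://"):]
    return url

print(to_secure_endpoint("ws://url_neptune_read:8182/gremlin"))
# prints wss://url_neptune_read:8182/gremlin
```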