Error: `Failed to authenticate`, when connection pool size is >1 for GremlinServer with ArcadeDB
Hi Friends,
I am exploring and evaluating ArcadeDB.
The DB is set up with GremlinServerPlugin to expose Gremlin Server on port 8182.
I am using Spring Boot to create the client app:
// Field so the @PreDestroy hook below can close the same instance.
private Cluster cluster;

@Bean
Cluster cluster() {
    cluster = Cluster.build().port(arcadedbServerPort).addContactPoint(arcadedbServerHost)
            .credentials(arcadedbServerUserName, arcadedbServerPassword).create();
    return cluster;
}

@Bean
GraphTraversalSource g(Cluster cluster) {
    GraphTraversalSource g = traversal().withRemote(DriverRemoteConnection.using(cluster));
    logger.debug("Graph Features=>: {}", g.getGraph());
    // Run one query at startup so the connection pool is created now rather than on the first real request.
    logger.debug("Graph Total Vertex=>: {}", g.V().count().toList());
    return g;
}

@PreDestroy
public void cleanArcadedbGraphCluster() {
    if (cluster != null) {
        cluster.close();
    }
}
The GraphTraversalSource object is created once and used across the application.
When I load test it with the default Gremlin driver Cluster settings for the connection pool (min 2, max 8), starting at 500 req/sec, I get the error below and around 80% of requests fail.
Error: org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Failed to authenticate
The only credentials configured are the ones set at ArcadeDB's first startup, and with these credentials I am able to initialize the Cluster.
Each request runs a simple query: g.V().hasLabel("Category").count()
The only way I could solve this was by increasing the connection pool size to nearly the number of concurrent requests, but that may not be the ideal approach for a target throughput of 1 million req/sec (optimistic).
If the connection pool is set to (min 1, max 1), all requests succeed; each request's latency reflects the load, but there is no authentication issue. With a connection pool size > 1, requests fail to authenticate.
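To make "pool size close to concurrent requests" a bit more concrete, here is a rough sizing sketch based on Little's law (in-flight requests = arrival rate x average latency). It uses only the JDK. The class name, the 200 ms latency, and the 4 requests per connection are hypothetical values chosen for illustration, not measurements from this setup or driver defaults I am asserting. It only estimates concurrency and does not address the authentication error itself.

```java
// Hypothetical sizing helper; all numbers passed in main() are illustrative assumptions.
final class PoolSizing {

    /** Little's law: average in-flight requests = requests/sec * average latency (rounded up). */
    static long requiredInFlight(long requestsPerSecond, long avgLatencyMillis) {
        return (requestsPerSecond * avgLatencyMillis + 999) / 1000;
    }

    /** Upper bound on concurrent requests a pool can carry. */
    static long poolCapacity(int maxConnections, int maxRequestsPerConnection) {
        return (long) maxConnections * maxRequestsPerConnection;
    }

    public static void main(String[] args) {
        long needed = requiredInFlight(500, 200); // 500 req/s at an assumed 200 ms average latency
        long capacity = poolCapacity(8, 4);       // max pool of 8, assumed 4 in-flight requests each
        System.out.println("needed=" + needed + " capacity=" + capacity);
    }
}
```

If the estimated demand is well above the pool capacity, requests queue for a connection, which would be consistent with the latency growth seen with (min 1, max 1).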
Requesting your insights.
I'm not sure how authentication works for ArcadeDB. I assume the "Failed to authenticate" is the client side error. Is there any server-side output that might give some hints as to what is happening? Perhaps some stack traces printed to the console or server logs (not sure how ArcadeDB outputs things)? Perhaps someone from @arcadedb can offer insight on this one?
Adding more details of the issue:
Cluster settings for connection pool = (min 2, max 8)
Under high load, e.g. 1000 requests/sec, the following log is written:
INFO 116840 --- [gremlin-driver-conn-scheduler-1] o.a.tinkerpop.gremlin.driver.Connection : Created new connection for ws://ec2-**.ap-southeast-2.compute.amazonaws.com:8182/gremlin
INFO 116840 --- [gremlin-driver-conn-scheduler-2] o.a.tinkerpop.gremlin.driver.Connection : Created new connection for ws://ec2-**.ap-southeast-2.compute.amazonaws.com:8182/gremlin
INFO 116840 --- [gremlin-driver-host-scheduler-1] o.a.t.gremlin.driver.ConnectionPool : Opening connection pool on Host{address=ec2-**.ap-southeast-2.compute.amazonaws.com/**:8182, hostUri=ws://ec2-**.ap-southeast-2.compute.amazonaws.com:8182/gremlin} with core size of 2
INFO 33720 --- [gremlin-driver-worker-19] o.a.t.gremlin.driver.ConnectionPool : Replace Connection{host=Host{address=ec2-**.ap-southeast-2.compute.amazonaws.com/**:8182, hostUri=ws://ec2-**.ap-southeast-2.compute.amazonaws.com:8182/gremlin}}, {channel=254bbe33}
Shortly after this, the following error occurs:
org.apache.tinkerpop.gremlin.driver.exception.ResponseException: Failed to authenticate
at org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler.channelRead0(Handler.java:246) ~[gremlin-driver-3.7.0.jar:3.7.0]
at org.apache.tinkerpop.gremlin.driver.Handler$GremlinResponseHandler.channelRead0(Handler.java:201) ~[gremlin-driver-3.7.0.jar:3.7.0]
Checking the Gremlin source code, I can see that the error originates from the Gremlin Server class SaslAuthenticationHandler when the condition below is false:
if (requestMessage.getOp().equals(Tokens.OPS_AUTHENTICATION) && requestMessage.getArgs().containsKey(Tokens.ARGS_SASL))
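To illustrate how that condition could produce the error under load, here is a simplified, JDK-only model. It is not the Gremlin Server source. It assumes that the first request on an unauthenticated channel triggers an authentication challenge, and that any request other than a SASL response arriving while the handshake is pending is rejected. The class name, the returned strings, and the constant values standing in for Tokens.OPS_AUTHENTICATION and Tokens.ARGS_SASL are assumptions.

```java
import java.util.Map;

// Simplified, hypothetical model of the server-side decision; not the actual handler code.
final class AuthGateModel {
    // Assumed stand-ins for Tokens.OPS_AUTHENTICATION and Tokens.ARGS_SASL.
    static final String OP_AUTHENTICATION = "authentication";
    static final String ARG_SASL = "sasl";

    private boolean handshakeStarted = false;

    /** Returns what the modelled server replies to a request on a not-yet-authenticated channel. */
    String onRequest(String op, Map<String, Object> args) {
        if (!handshakeStarted) {
            handshakeStarted = true;      // first request: server sends a challenge (assumption)
            return "AUTHENTICATE";
        }
        if (op.equals(OP_AUTHENTICATION) && args.containsKey(ARG_SASL)) {
            return "SASL_STEP";           // the quoted condition is true
        }
        return "Failed to authenticate";  // the quoted condition is false
    }
}
```

Under this model, a second ordinary query sent on a freshly created connection before the SASL exchange completes would be rejected, which would fit the error appearing right after new connections are created.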
Note: Is there a recommended Cluster configuration (rule of thumb) for highly concurrent loads, in case that has an effect on the above issue?
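Independent of any driver setting, one client-side option is to cap how many requests are in flight at once, so that bursts do not exceed what the pool can carry. The sketch below uses only the JDK. InFlightLimiter is a hypothetical helper, not part of the Gremlin driver, and it is not a confirmed fix for the authentication error.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: limits how many calls run concurrently, whatever the number of caller threads.
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Waits for a free permit, runs the work, and always releases the permit afterwards. */
    <T> T call(Supplier<T> work) {
        permits.acquireUninterruptibly();
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }

    /** Number of permits currently free. */
    int availablePermits() {
        return permits.availablePermits();
    }
}
```

In the application above, a request handler could wrap the traversal, for example limiter.call(() -> g.V().hasLabel("Category").count().next()), with maxInFlight chosen to match the pool capacity.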
@arcadedb completely relies on the Gremlin Server for authentication and transport. That error seems related to the Gremlin Server. Maybe a concurrency issue with high load?
Hi @spmallette, is there a workaround for the above issue? I get the same issue with other TinkerPop-enabled DBs as well, so it seems to be tied to Gremlin.
I found similar issue threads: https://issues.apache.org/jira/browse/TINKERPOP-2132, https://issues.apache.org/jira/browse/TINKERPOP-2205
If it is the same problem as TINKERPOP-2132, then judging from my comments, there's not much that can be done for a workaround beyond what was described there. cc/ @Kennh