Is NoHostAvailableException losing/not including relevant error context (3.7.0 and above)?

Hey folks, I've recently noticed that G.V() is "not as good" as it used to be at reporting some specific connectivity issues, and upon further investigation I managed to attribute this to a change in behaviour, with NoHostAvailableException seemingly losing some relevant context. In my case, I'm testing the scenario of trying to connect to Azure Cosmos DB from an IP that is not allowed through their firewall, which usually results in a bespoke Azure error message as follows:
Invalid handshake response getStatus: 403 Request originated from IP 213.31.211.185 through public internet. This is blocked by your Cosmos DB account firewall settings. More info: https://aka.ms/cosmosdb-tsg-forbidden
Attempting to submit a query to a Cosmos DB endpoint protected by a firewall results in the expected NoHostAvailableException, but the error's detailedMessage only states:
"All hosts are considered unavailable due to previous exceptions. Check the error log to find the actual reason."
The rootCause for the thrown exception is itself a NoHostAvailableException and does not contain any reference to the actual error message returned when attempting the connection, which is instead output separately by the driver before the NHA exception is thrown (I'm assuming this is what the "Check the error log to find the actual reason" message refers to). Is there a way to get this downstream failure from the NHA exception? I've seen cases where it seemingly works (e.g. SecurityException from invalid credentials, serialization issues, etc.) and others where the root cause seems to just get lost. For reference, I've attached what's output by the driver prior to throwing the exception (lengthy stack trace alert).
Solution
spmallette · 8mo ago
I don't think we've changed any behavior for NoHostAvailableException since 3.5.5: https://tinkerpop.apache.org/docs/current/upgrade/#_gremlin_driver_host_availability Since that time there is really only one way that an NHA is thrown: if the connection pool cannot initialize a connection to any host. We are selective in which exceptions are raised within the NHA because there are cases where the exception can be more confusing than helpful. In this case, we weren't including handshake exceptions, which was the cause in your log output. That seems like a sensible thing to include, so I quickly added that: https://github.com/apache/tinkerpop/commit/a37e93f3c3b3c1b404c34ecf77ac05a0d959e046 Thanks for bringing that up. cc/ @Kennh
GitHub
Improved error messaging for NHA · apache/tinkerpop@a37e93f
Added another exception type to those than can be raised as a cause of NHA CTR
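For anyone wanting to surface the underlying failure on the client side once the driver attaches it as a cause, here is a minimal sketch of walking an exception's cause chain. It uses plain `java.lang` exceptions as stand-ins rather than the real gremlin-driver types, and `CauseChain`, `chain` and `rootCause` are hypothetical helper names, not part of TinkerPop. It also assumes the handshake failure is reachable through `getCause()`, which, per the reply above, only holds for driver versions that include the linked commit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of gremlin-driver.
class CauseChain {

    /** Returns the exception followed by each of its causes, outermost first. */
    static List<Throwable> chain(Throwable t) {
        List<Throwable> result = new ArrayList<>();
        // Identity-based set guards against cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            result.add(cur);
        }
        return result;
    }

    /** Returns the innermost cause, or the exception itself if it has none. */
    static Throwable rootCause(Throwable t) {
        List<Throwable> c = chain(t);
        return c.isEmpty() ? null : c.get(c.size() - 1);
    }

    public static void main(String[] args) {
        // Stand-ins: a generic exception plays the role of the handshake failure,
        // and a RuntimeException plays the role of NoHostAvailableException.
        Throwable handshake = new IllegalStateException(
                "Invalid handshake response getStatus: 403");
        Throwable nha = new RuntimeException(
                "All hosts are considered unavailable due to previous exceptions.",
                handshake);

        for (Throwable t : chain(nha)) {
            System.out.println(t.getClass().getSimpleName() + ": " + t.getMessage());
        }
    }
}
```

If the root cause returned is still the NHA itself, as described in the question, the underlying error was not attached and is only available in the driver's log output.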
gdotv (OP) · 8mo ago
I think that's exactly what I need here, awesome! I'll close this out and wait for the next TinkerPop release.