TraversalInterruptionTest taking a very long time to complete

There's a try & sleep step in these tests: https://github.com/tien/tinkerpop/blob/a919b90ebe31434e306609a2d2a37702eead02d7/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalInterruptionTest.java#L86 https://github.com/tien/tinkerpop/blob/a919b90ebe31434e306609a2d2a37702eead02d7/gremlin-test/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TraversalInterruptionComputerTest.java#L78 Which caused the last test case outlined below to take a very long time to complete (several hours to complete on CI) Testcase:
{"g_E_properties", (Function<GraphTraversalSource, GraphTraversal<?,?>>) g -> g.E().properties()},
{"g_E_properties", (Function<GraphTraversalSource, GraphTraversal<?,?>>) g -> g.E().properties()},
The test appear to also work when I remove the try->sleep & run it locally. Can this be removed? It's right now causing Janusgraph CI to take several hours to complete tests for every DB adapter. cc @spmallette
20 Replies
spmallette
spmallette9mo ago
i think it would be better to understand why that's taking so long for JanusGraph as that seems to imply to me that interruption isn't working properly there.
Painguin
PainguinOP9mo ago
The behaviour is very odd 2: There are 10 test cases. If I enable all 10 cases then that last case will always take ages to complete. But if I only enable the last case, sometime it complete almost instantly, while other time it will also take ages. But anyway I think this could be very valuable to take a look at. Could massively skim off development time required for Janusgraph. The test still succeed after removing the sleep&catch step though, at least for Janusgraph. Can we remove it? Or are there cases where that is required?
spmallette
spmallette9mo ago
some graphs will execute those test queries so fast that if there isn't an artificial pause it will just blow through the test data before interruption has a shot of succeeding. i think janusgraph should root cause why g.E().properties() in this context takes hours to complete before modifying the test itself. in this case, it's almost like, janusgraph has the complete opposite situation. if i were debugging it, i'd try to see if Thread.sleep(3000) is even getting called or if these hours of processing are being taken somewhere else. I mean, the Thread.sleep() should only get called once...tha'ts 3 seconds. nothing in this test should equate to hours
Painguin
PainguinOP9mo ago
I did debug it n the Thread.sleep call is getting called multiple times for some reason. btw I tried removing sleep with tinkergraph & it was still successful.
Painguin
PainguinOP9mo ago
I'm using these 3 breakpoint as an example of the sleep being invoked multiple time, you can see that the traverser is different for each time.
No description
No description
No description
spmallette
spmallette9mo ago
it's being hit multiple times in the same test? that's a parameterized test so it will be hit multiple times, but i don't see how it could be hit more than once for a single test execution
Painguin
PainguinOP9mo ago
it's being hit more than one for a single test g.E() -> t.properties() I have no idea why but that's the only affected case look like it will go on to iterate through the entire graph edges -> properties
spmallette
spmallette9mo ago
...i'd have to investigate. i don't know how that is possible
Painguin
PainguinOP9mo ago
Thanks, would massively benefit janusgraph CI
spmallette
spmallette9mo ago
for sure
Painguin
PainguinOP9mo ago
I'm testing this with the berkeleydb adapter btw if that help
spmallette
spmallette9mo ago
does it hit it multiple times for TinkerGraph?
Painguin
PainguinOP9mo ago
nah it doesn't when I test
spmallette
spmallette9mo ago
well, at least that make sense. i don't know why JanusGraph would hit multiple times then, particularly BerkeleyDB my guess is that BerkeleyDB is not rethrowing the interrupt so each iterated property is getting a sleep(3000) i don't think the interrupt is working right for JanusGraph somewhere. if it's not working right, then that will mean JanusServer query timeouts aren't working right
Painguin
PainguinOP9mo ago
Oh wait this might be because of the changes I'm working on I'm adding interupttion support How do you mean by this?
spmallette
spmallette9mo ago
i just mean, it might not be handling threadinterruption properly (which is what this test is designed to catch). it may catch an InterruptionException but then not do the right thing with it. sometimes it needs to rethrow or even reset the interrupt() state:
Each thread has a boolean property that represents its interrupted status. Invoking Thread.interrupt() sets this flag. When a thread checks for an interrupt by invoking the static method Thread.interrupted(), the interrupt status is cleared.
https://www.baeldung.com/java-interrupted-exception
Baeldung
How to Handle InterruptedException in Java | Baeldung
Learn about Java's InterruptedException, what causes it, and how to handle it.
Painguin
PainguinOP9mo ago
Well it look like each traversal was completed without being interrupted? Hence it's going through the whole graph. sorry my knowledge around this is very limited so might be asking dumb question lol Bizarre thing is the test still pass, cuz a TraversalInterruptedException was thrown. It's just that the sideEffect went through the entire graph. So is that behaviour still correct, since it technically did pass the test
spmallette
spmallette9mo ago
cuz a TraversalInterruptedException was thrown
strange. i would think that if it tossed that exception the traversal would have stopped running. not sure how it could keep running, unless, JanusGraph throws it but then doesn't treat that as an error to stop producing traversers
Painguin
PainguinOP9mo ago
If I check the current thread interruption status, it's false? How is this possible?
No description
Painguin
PainguinOP9mo ago
Oh I think I understand how thread & interruption work in Java now. Was too used to the single threaded model from Js :))

Did you find this page helpful?