Preventing a JanusGraph crash on timeout
According to this Stack Overflow question, https://stackoverflow.com/questions/61985018/janusgraph-image-stop-responding-after-query-timeout-how-to-prevent-it, and my own findings, JanusGraph hard crashes on Gremlin timeouts because of the BerkeleyDB storage backend it uses by default.
To circumvent this, I'm trying to end my query before the time limit is reached, which seems to work pretty well.
But how can I access the evaluationTimeout value from a Strategy or Step? As far as I know, it's only available in the Context, which doesn't get passed down to strategies and steps.
Greetings
Volker
I'm not sure I understand the context of your question, but elements of the Gremlin language like strategies and steps don't have knowledge of the server execution environment (a potentially interesting idea though).
Situation:
1. Janusgraph with berkeley storage implementation
2. Long running query
3. Timeout value reached (user set or default)
Expected:
Gremlin shuts down, and the resulting errors can be caught.
Actual:
Gremlin shuts down, causing a critical BerkeleyDB error that can't be caught and leaves the database completely broken.
Solution:
Get access to the timeout in the running gremlins and cancel them automatically before the timeout is reached, so the query always "finishes" on time.
Proposal: give steps and strategies access to their current execution Context, which contains the timeout value.
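The proposal amounts to cooperative cancellation: a step that knows the timeout checks the remaining budget and stops early instead of being killed by the server. The following is a minimal Python sketch of that idea only, not TinkerPop code. `ExecutionContext`, `deadline_guard` and the 1000 ms `margin_ms` default are hypothetical names chosen for illustration; in TinkerPop the equivalent would be a custom Java step that somehow receives the timeout, which is exactly what steps lack today.

```python
import time
from dataclasses import dataclass, field
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass
class ExecutionContext:
    """Hypothetical per-request context carrying the server's evaluation timeout."""
    evaluation_timeout_ms: int
    started_at: float = field(default_factory=time.monotonic)

    def remaining_ms(self) -> float:
        # Milliseconds left before the server would abort the request.
        elapsed_ms = (time.monotonic() - self.started_at) * 1000.0
        return self.evaluation_timeout_ms - elapsed_ms


def deadline_guard(ctx: ExecutionContext, items: Iterable[T],
                   margin_ms: int = 1000) -> Iterator[T]:
    """Pass items through until fewer than margin_ms remain before the timeout.

    The budget is checked before each item is passed downstream, so the
    query ends on its own instead of being interrupted by the server.
    """
    for item in items:
        if ctx.remaining_ms() <= margin_ms:
            return  # stop cooperatively, leaving time for a clean shutdown
        yield item
```

The margin is the design choice that matters: it has to be large enough for the remaining work (committing, releasing storage resources) to finish before the hard timeout fires.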
Yes, I think I understood all that, but I'm not clear on where you intend to do that. I don't see how strategies or steps can help you do what you want to do from a user's perspective. Are you saying you are modifying JanusGraph somehow to do this?
Yes, I already have custom strategies and steps loaded, so this would be the easiest solution for me. A more general solution would probably be either to fix the JanusGraph BerkeleyDB problem or to handle a timeout more gracefully on the Gremlin side (e.g. "killing" all gremlins).
You wrote custom steps too? Is that where you intend to access the execution timeout?
Yes, or in the strategy, but as far as I know the evaluationTimeout is stored in the Context, which does not get passed down to them.
Couldn't you just use a combination of a client-side evaluationTimeout and the timeLimit() step? Set the evaluationTimeout on the client side to X seconds, then add a final timeLimit() step of X - 1 seconds (both values are given in milliseconds):
g.with('evaluationTimeout', 10000).V().timeLimit(9000)
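The suggestion reduces to simple arithmetic: timeLimit() must be strictly smaller than evaluationTimeout so that the traversal stops on its own first. A small Python sketch that derives the limit from the timeout and renders the traversal from the reply as text, e.g. for logging; the function name, the 1000 ms default margin and the `body` parameter are illustrative assumptions, and with a language variant you would pass the same two numbers to its equivalent steps rather than build a string.

```python
def time_limited_query(evaluation_timeout_ms: int, margin_ms: int = 1000,
                       body: str = "V()") -> str:
    """Render a traversal whose timeLimit() fires margin_ms before the server timeout."""
    if margin_ms <= 0 or margin_ms >= evaluation_timeout_ms:
        raise ValueError("margin_ms must be positive and smaller than evaluation_timeout_ms")
    limit_ms = evaluation_timeout_ms - margin_ms  # both values in milliseconds
    return f"g.with('evaluationTimeout', {evaluation_timeout_ms}).{body}.timeLimit({limit_ms})"


print(time_limited_query(10000))
# g.with('evaluationTimeout', 10000).V().timeLimit(9000)
```

Rejecting a margin that is zero or larger than the timeout guards against the one configuration where the server timeout would still win.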
Thank you for this suggestion. I have to confess that I seem to have missed the timeLimit() step completely while searching for a solution, and it's pretty close to perfect. The only problem I have with it (and this is specific to my use case) is that I need to set a few database entries on timeout, and these should not be controlled by the query. I'm currently using this solution:
g.with('evaluationTimeout',5000).V(4320).repeat(out("script", "evaluationTimeout=5000")).until(has(id, lt(0))).sack()
But I would still find it pretty handy to have access to the current execution Context from a Step or Strategy. What's your opinion on that?
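For the requirement of setting a few entries when the limit is hit, one option that keeps this logic outside the traversal is to let the client decide after the fact. A minimal Python sketch, assuming the client receives no explicit signal that timeLimit() fired and therefore uses an elapsed-time heuristic; `run_with_timeout_hook`, `run_query` and `on_timeout` are hypothetical names, and a query that legitimately finishes right at the limit would also trigger the hook.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def run_with_timeout_hook(run_query: Callable[[], List[T]], limit_ms: int,
                          on_timeout: Callable[[List[T]], None]) -> List[T]:
    """Run a time-limited query and call on_timeout if it appears to have been cut off.

    Heuristic (assumption): if the call took at least limit_ms, the traversal's
    timeLimit() most likely fired, so the results may be partial.
    """
    start = time.monotonic()
    results = run_query()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    if elapsed_ms >= limit_ms:
        # Follow-up work outside the query, e.g. writing status entries in a new request.
        on_timeout(results)
    return results
```

Because the follow-up runs in a separate request, it is not subject to the original query's time budget.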
I tend to think that such a feature doesn't serve many people. Very few people write custom Gremlin strategies (let alone custom steps) that could take advantage of it. Providers might have some use for it, but they also control the whole server and can modify it as needed, with complete access to the Context. We probably wouldn't even be having this discussion if there weren't a problem you were trying to work around that is highly specific to JanusGraph, and not even to JanusGraph in general but to just one of its storage backends, BerkeleyDB. You've already written custom steps, so perhaps it's best to just modify Gremlin Server (i.e. the JanusGraph server) to do what you need, since there you have full and total control of the entire request/execution/response context.
You are probably right, thanks for taking the time.