Pipeline Error: Pipeline Failed to Terminate within Timeout (5000ms)
This is a bit of a head-scratcher for me. I'm in the process of writing a new element for my pipeline that uses the Silero VAD model for speech detection (rather than the built-in WebRTC Engine VAD extension). I've got it working, but I'm hitting a weird bug. Now when my engine terminates (when a peer leaves), I get this error:
** (Membrane.PipelineError) Pipeline #PID<0.1499.0> hasn't terminated within given timeout (5000 ms).
The only thing that's changed is my new element in the pipeline (it's set up as a Membrane.Filter). If I remove the element from the pipeline, the error goes away.
I could obviously bump the timeout, but before I do that I thought I'd ask for advice. Why would adding an element increase the engine shutdown time? What's the right way to dig into this?
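For reference, the timeout in that error is the one given to `Membrane.Pipeline.terminate/2`. If you control that call yourself (with the WebRTC Engine it may be made internally), bumping it would look roughly like this; the 15_000 value is just an example:

```elixir
# Raise the shutdown timeout from the default 5_000 ms.
# Only applicable if this terminate call is under your control.
Membrane.Pipeline.terminate(pipeline_pid, timeout: 15_000)
```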
I'm loading the Ortex model in the element's init and storing it in the state:
model = Ortex.load(@model_path)
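For context, that line sits in the element's `handle_init` along these lines (a sketch; the module name, model path, and option handling are mine, not the gist's, and pads/flow-control callbacks are omitted):

```elixir
defmodule MyApp.SileroVAD do
  use Membrane.Filter

  # Path to the Silero VAD ONNX model (assumed compile-time attribute).
  @model_path "priv/silero_vad.onnx"

  @impl true
  def handle_init(_ctx, _opts) do
    # Load the ONNX model once and keep the handle in element state.
    model = Ortex.load(@model_path)
    {[], %{model: model}}
  end
end
```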
But beyond that, everything else is pretty straightforward logic / state management (if prob > threshold, etc.).

Termination of the whole pipeline is postponed until all bins and elements have terminated, so if your custom VAD element doesn't terminate when it should, it can delay termination of the whole pipeline.
How should I go about determining why my element is taking too long to terminate? I don’t have a handle_terminate callback defined.
You can just spawn a new `Task` in `handle_init` in your custom element. This task should `Process.monitor` the element and `dbg(some message)` after the element dies. Something like this should tell us if termination of this element is the problem or not.
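A minimal sketch of that suggestion, assuming Membrane Core 1.x's `handle_init/2` (the timing and message contents are my own additions):

```elixir
@impl true
def handle_init(_ctx, _opts) do
  element_pid = self()

  # Watch this element process from an unlinked task and report when it
  # actually dies, independently of any terminate callbacks.
  Task.start(fn ->
    ref = Process.monitor(element_pid)
    started_at = System.monotonic_time(:millisecond)

    receive do
      {:DOWN, ^ref, :process, ^element_pid, reason} ->
        alive_for = System.monotonic_time(:millisecond) - started_at
        dbg({:vad_element_down, reason, alive_for_ms: alive_for})
    end
  end)

  {[], %{model: Ortex.load(@model_path)}}
end
```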
I didn't do that, but I did just try timing it: I get a VAD shutdown time of 0 ms, and then 5000 ms later, I get the error message. Unless this test isn't reliable.
Ahh. Interesting, I did your method and got an entirely different result.
`handle_terminate_request` is executed while the element process is still alive, and an element might live for a while after this callback ends.
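That would explain the 0 ms reading: logging in the callback only captures the moment the request arrives. A sketch of what such a measurement looks like (to my knowledge, returning the `terminate: :normal` action mirrors the default behaviour):

```elixir
@impl true
def handle_terminate_request(_ctx, state) do
  # Runs while the element process is still alive; returning :terminate
  # only requests shutdown, so a timestamp taken here can read ~0 ms
  # even if the process lingers afterwards.
  dbg(:terminate_requested)
  {[terminate: :normal], state}
end
```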
> Ahh. Interesting, I did your method and got an entirely different result.

It means that probably the new element is responsible for the pipeline termination delay. It is hard for me to say anything more without the code of the whole element.
Here's a gist with the module. I was thinking of open sourcing it, if I could get it stable.
https://gist.github.com/Tonyhaenn/66c9148b2ae73a5009894250a0b6f6d7
It's stable insofar as it works. The delayed shutdown is just puzzling. I'm guessing it's Ortex + the use of Rustler, though I've looked through those docs and the code, and I didn't see an "unload" or cleanup-type function.
Though tellingly, if I replace my `do_predict/3` function with a dummy function like so:
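(The actual snippet isn't preserved in this transcript; a plausible stand-in that skips `Ortex.run/2` entirely, with a made-up return shape, would be:)

```elixir
# Hypothetical dummy: skip inference and always report "no speech".
# The real function's arguments and return shape may differ.
defp do_predict(_model, _audio_chunk, state) do
  {0.0, state}
end
```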
I don't get the same timeout failure. So it's definitely that `Ortex.run/2` call.

It seems reasonable that your element enters `Ortex.run` and stays there long enough that the default timeout in `Pipeline.terminate` is exceeded.

The inference time is ~1 ms according to my logging.