Developing an advanced Jellyfish use case
Thread created by Jdyn on 3/11/2024 in #membrane-help. The replies below are from Radosław (Software Mansion).

Radosław:
Hmm, I don't know if there is a plugin for that, but maybe @Łukasz Kita will know more or can suggest a solution for your problem.
Radosław:
No problem
Radosław:
Pretty much, you should implement the handle_pad_removed callback, like in the SIP endpoint, for example: https://github.com/jellyfish-dev/membrane_rtc_engine/blob/63848ed2b271fa59c1f42a796997f5cfc7958502/sip/lib/sip_endpoint.ex#L277-L290. In this callback you remove all Membrane elements that were linked to the removed pad, because Membrane doesn't support dangling elements.
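For illustration only, a sketch of what such a callback could look like inside an endpoint bin; the child names are made up for this example and depend on what you actually linked to that pad:
```elixir
@impl true
def handle_pad_removed(Pad.ref(:input, track_id), _ctx, state) do
  # Remove every child that was linked to the removed pad, so no dangling
  # elements are left in the bin. These child names are placeholders.
  children = [
    {:track_receiver, track_id},
    {:depayloader, track_id},
    {:decoder, track_id}
  ]

  {[remove_children: children], state}
end
```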
Radosław:
I think the easiest way to do that would be to use the FileEndpoint, which would imitate two peers.
Example tests for the File Endpoint: https://github.com/jellyfish-dev/membrane_rtc_engine/blob/e1d5c4d09f8dd1755017283d104ec84167ed2ca1/file/test/file_endpoint_test.exs#L54
Radosław:
Yup, you will get data from the point that you last demanded, but remember that you have to demand at the proper speed. If you fall too far behind, you will get a ToiletOverflow error. It simply means that the buffers stored while waiting for your demand exceeded the limit.
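To make the demand mechanism concrete, here is a minimal sketch of a sink with manual flow control; it pulls one buffer at a time, so upstream only delivers as fast as it is asked to (the module and the one-buffer pacing are illustrative only):
```elixir
defmodule MyApp.SlowConsumer do
  use Membrane.Sink

  # Manual flow control: buffers arrive only in response to our :demand actions.
  def_input_pad :input, accepted_format: _any, flow_control: :manual, demand_unit: :buffers

  @impl true
  def handle_init(_ctx, _opts), do: {[], %{}}

  @impl true
  def handle_playing(_ctx, state) do
    # Ask for the first buffer once the element starts playing.
    {[demand: {:input, 1}], state}
  end

  @impl true
  def handle_buffer(:input, _buffer, _ctx, state) do
    # Process the buffer at our own pace, then demand the next one.
    # If we stop demanding while upstream keeps producing, the queue in front
    # of this pad can exceed its limit and the pipeline fails with ToiletOverflow.
    {[demand: {:input, 1}], state}
  end
end
```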
Radosław:
To get timestamps, you can try adding membrane_raw_audio_parser before the scribe_endpoint; it should add timestamps. https://github.com/membraneframework/membrane_raw_audio_parser_plugin/tree/master
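In spec terms, the idea is just to insert the parser right before the transcription element, roughly like this (the child names and the upstream element are placeholders for whatever your endpoint already has):
```elixir
spec =
  get_child(:decoder)                                   # whatever currently feeds raw audio
  |> child(:raw_audio_parser, Membrane.RawAudioParser)  # adds timestamps to the buffers
  |> child(:scribe, Scribe.Filter)                      # placeholder for your transcription filter
```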
Radosław:
I think you could start by moving these two operations to handle_setup:
https://github.com/Jdyn/membrane_scribe/blob/380a114ad14832c658448ae5a5c3a142f954c4ac/lib/scribe/filter.ex#L36-L37
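The general pattern is to keep handle_init cheap and defer the heavy work to handle_setup, roughly like this (load_serving/1 is a hypothetical helper standing in for those two lines):
```elixir
@impl true
def handle_init(_ctx, opts) do
  # handle_init runs in the parent's process while the pipeline is being
  # built, so it should stay cheap.
  {[], %{opts: opts, serving: nil}}
end

@impl true
def handle_setup(_ctx, state) do
  # Heavy, blocking work (loading the model, building the serving, etc.)
  # runs here, in the element's own process.
  serving = load_serving(state.opts)
  {[], %{state | serving: serving}}
end
```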
Radosław:
You can try to find the operations that take a very long time, and if they happen in handle_init, move them to handle_setup.
Radosław:
It looks to me like some callback is taking too much time for some reason.
Radosław:
You can try to use it. Unfortunately, I am not up to date with the current state of Bumblebee 😅.
Radosław:
Yup, the linked element should work. We use it in the SIP Endpoint and the HLS Endpoint, and it works in real time.
Radosław:
Yup, I thought this would be a good idea, but maybe it won't be in your use case. The main pro of this approach is that you have one instance of the ML model, so you don't have to handle contention for that model's resources. It is also simpler to implement, as you have fewer elements.
Radosław:
In the case of one endpoint, it would look like I mentioned before:
multiple_audio_inputs -> audio_mixer -> speech_to_text -> gpt -> text_to_speech -> bin_output_pad
This may have to be separated a little bit, because of latency (you would wait too long for the response from GPT):
multiple_audio_inputs -> audio_mixer -> speech_to_text -> gpt_endpoint
gpt_endpoint -> text_to_speech -> bin_output_pad
But this would still be in the same endpoint.
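As a rough sketch of that split inside one endpoint's spec (all module names are placeholders, and the :audio_mixer child is assumed to be created where the input pads are linked):
```elixir
# Two branches in one bin: the audio branch ends at :gpt, and the response
# branch starts from it, so waiting for GPT doesn't block the mixer.
spec = [
  get_child(:audio_mixer)
  |> child(:speech_to_text, MyApp.SpeechToText)
  |> child(:gpt, MyApp.GPT),
  get_child(:gpt)
  |> child(:text_to_speech, MyApp.TextToSpeech)
  |> bin_output(:output)
]
```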
Radosław:
This visualization is a simplified view and lacks some elements, e.g. TrackSender, TrackReceiver, etc. But its goal is to give you a slightly better understanding.
Radosław:
To visualize it when you have two endpoints, the first will look like this:
input_pads_from_rtc_engine -> audio_mixer -> speech_to_text -> gpt_sink
(a GPT sink, because the end of a branch of the pipeline is in most cases some element which implements Membrane.Sink)
The second one will look like this:
gpt_source -> text_to_speech -> bin_output_pad
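A minimal, illustrative shape of such a sink; the module name, the idea of forwarding transcripts to a configured process, and the message format are all assumptions, not part of the thread:
```elixir
defmodule MyApp.GPTSink do
  use Membrane.Sink

  def_input_pad :input, accepted_format: _any

  def_options notify_pid: [spec: pid(), description: "Process to send transcripts to"]

  @impl true
  def handle_init(_ctx, opts), do: {[], %{notify_pid: opts.notify_pid}}

  @impl true
  def handle_buffer(:input, buffer, _ctx, state) do
    # End of the first endpoint's branch: hand the transcript chunk over to
    # whatever talks to GPT (here, simply another process).
    send(state.notify_pid, {:transcript_chunk, buffer.payload})
    {[], state}
  end
end
```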
Radosław:
You will use Membrane.Source in your second endpoint (the one which transforms textual input into voice audio), and the output pretty much depends on how you imagine that 😅. An example of a Membrane.Source is Membrane.File.Source (https://github.com/membraneframework/membrane_file_plugin/blob/master/lib/membrane_file/source.ex).
And you will need it in that case because you need a source of the stream to pass to the rtc_engine, and in most cases the source of the stream is some element which uses Membrane.Source. (There is also Membrane.Endpoint, which is both a Sink and a Source, but you shouldn't think about it for now.)
Radosław:
Also, if you create two endpoints, you will have to create another element which will be a Membrane.Source and which will create Membrane.Buffers from the responses, and maybe transform them a little bit.
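A minimal sketch of what such a source could look like; the module name, the push flow control, the bytestream format, and the {:gpt_response, text} message are all assumptions:
```elixir
defmodule MyApp.GPTSource do
  use Membrane.Source

  def_output_pad :output, accepted_format: _any, flow_control: :push

  @impl true
  def handle_init(_ctx, _opts), do: {[], %{}}

  @impl true
  def handle_playing(_ctx, state) do
    # Declare what flows on the pad before sending any buffers.
    {[stream_format: {:output, %Membrane.RemoteStream{type: :bytestream}}], state}
  end

  @impl true
  def handle_info({:gpt_response, text}, _ctx, state) do
    # Wrap each response in a Membrane.Buffer and push it downstream
    # (to text_to_speech). Assumes the pipeline is already playing.
    {[buffer: {:output, %Membrane.Buffer{payload: text}}], state}
  end
end
```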
Radosław:
Both approaches sound valid, and in the end this is an implementation detail. I think you can start with an endpoint that creates a timelined transcript and then decide whether it is easier to extend this endpoint or to create a separate one. The benefit of one endpoint, as I see it, is that you encapsulate the whole functionality in one process and you won't have to deal with synchronization problems, e.g.: how will the second endpoint (response_to_speech) know that the response is ready (polling doesn't sound optimal)? What happens when the first endpoint crashes for some reason, should the second one crash too? And so on.
Radosław:
For doing speech_to_text and text_to_speech with Bumblebee Whisper, you will probably have to create your own custom Membrane.Filters.
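As a very rough illustration of the speech_to_text side only, such a filter could wrap a Bumblebee Whisper serving like this; the model name, the f32/16 kHz input assumption, the result shape, and the lack of any real chunking logic are all simplifications:
```elixir
defmodule MyApp.SpeechToText do
  use Membrane.Filter

  def_input_pad :input, accepted_format: _any
  def_output_pad :output, accepted_format: _any

  @impl true
  def handle_init(_ctx, _opts), do: {[], %{serving: nil}}

  @impl true
  def handle_setup(_ctx, state) do
    # Load the model here, not in handle_init (see the handle_setup advice above).
    {:ok, model} = Bumblebee.load_model({:hf, "openai/whisper-tiny"})
    {:ok, featurizer} = Bumblebee.load_featurizer({:hf, "openai/whisper-tiny"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:hf, "openai/whisper-tiny"})
    {:ok, gen_config} = Bumblebee.load_generation_config({:hf, "openai/whisper-tiny"})

    serving = Bumblebee.Audio.speech_to_text_whisper(model, featurizer, tokenizer, gen_config)
    {[], %{state | serving: serving}}
  end

  @impl true
  def handle_buffer(:input, buffer, _ctx, state) do
    # A real element would accumulate enough raw audio (f32, mono, 16 kHz for
    # Whisper) before running the model on it.
    # Recent Bumblebee versions return %{chunks: [...]} from this serving.
    %{chunks: chunks} = Nx.Serving.run(state.serving, Nx.from_binary(buffer.payload, :f32))
    text = Enum.map_join(chunks, & &1.text)
    {[buffer: {:output, %Membrane.Buffer{payload: text}}], state}
  end
end
```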
Radosław:
Hi, to add this functionality to Jellyfish you will pretty much have to add a new component to Jellyfish. Under the hood, each component maps to a Membrane.RTC.Engine endpoint, so you will have to create one of those too. Here is the documentation for creating a custom endpoint in rtc_engine:
https://hexdocs.pm/membrane_rtc_engine/custom_endpoints.html
Each rtc_engine endpoint is a Membrane.Bin in which you will create a part of the pipeline. A simplified description of how your pipeline would look:
multiple_audio_inputs -> audio_mixer -> speech_to_text -> gpt -> text_to_speech -> bin_output_pad
While implementing all of that, you can look for inspiration/reference at other endpoints which both subscribe to tracks and publish tracks. At the moment these endpoints are: WebRTC and SIP. Similarly, when creating your custom component you should reference other components, e.g. SIP.
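For orientation only, here is a very rough skeleton of such an endpoint bin with the chain above built inside it. All MyApp.* modules are placeholders, the output pad is kept static for simplicity, and a real endpoint would also deal with track subscription, TrackReceiver/TrackSender, engine notifications, etc., as the guide and the WebRTC/SIP endpoints show:
```elixir
defmodule MyApp.Engine.AssistantEndpoint do
  use Membrane.Bin

  # Audio tracks subscribed from the engine arrive on dynamic input pads;
  # the generated speech leaves through the output pad.
  def_input_pad :input, accepted_format: _any, availability: :on_request
  def_output_pad :output, accepted_format: _any

  @impl true
  def handle_init(_ctx, _opts) do
    # The chain described above, ending at the bin's output pad.
    spec =
      child(:audio_mixer, MyApp.AudioMixer)  # e.g. a mixer with dynamic input pads
      |> child(:speech_to_text, MyApp.SpeechToText)
      |> child(:gpt, MyApp.GPT)
      |> child(:text_to_speech, MyApp.TextToSpeech)
      |> bin_output(:output)

    {[spec: spec], %{}}
  end

  @impl true
  def handle_pad_added(Pad.ref(:input, _track_id) = pad, _ctx, state) do
    # Each subscribed audio track gets plugged into the mixer.
    {[spec: bin_input(pad) |> get_child(:audio_mixer)], state}
  end
end
```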