High WebRTC CPU consumption

Hey! We are running some benchmarks with an ExWebRTC-based pipeline, and on a 4-CPU droplet (with dedicated cores) 10 incoming streams consume 100% of the CPU. Is that expected behaviour? Our pipeline essentially consumes H264 video and converts the AAC audio to Opus, nothing else. The second question is: how do we trace the CPU consumption of individual Membrane elements? I tried https://hexdocs.pm/membrane_core/Membrane.Pipeline.html#module-visualizing-the-supervision-tree, but I don't see the pipeline at all in :observer. The LiveDashboard, on the other hand, lists the processes and I can sort by "Number of Reductions", but there every element shows up as Membrane.Core.Element, which makes it pretty much impossible to tell which process causes the most CPU consumption. Please help 🙏
odingrail (OP) · 7d ago
Also, it is unclear how to consume the telemetry metrics even after reading the documentation. I'm trying the following and I get 0 events reported with a working pipeline; what is the issue here?
summary("membrane.http_adaptive_streaming_parser.handle_buffer.start", unit: {:native, :millisecond})
Also, my config.exs:
config :membrane_core,
  telemetry_flags: [
    tracked_callbacks: [
      bin: :all,
      element: :all,
      pipeline: :all
    ]
  ]

config :membrane_telemetry_metrics, enabled: true
The docs suggested the format:
[:membrane, :element | :bin | :pipeline, callback, :start | :stop | :exception]
varsill · 7d ago
Hello! Definitely something is suspicious with that high CPU consumption for 10 streams and it requires some inspection. I assume that you don't do any transcoding of the video track, do you?

Concerning the CPU usage monitoring, there are 2 places I would inspect:
1. Use :observer, find a process with a suspiciously high number of reductions, double-click on that process, visit the Dictionary tab and read the membrane_path entry. (Side note: we plan to set the label (https://hexdocs.pm/elixir/Process.html#set_label/1) of each element process to its membrane_path, but it's not available yet, so as of now I'm afraid you need to read membrane_path manually.)
2. Run top -H -p <pid> (where <pid> is the OS process of BEAM) and inspect whether some OS threads are using an extraordinarily high amount of CPU. There are two options here: either these are some "worker" threads spawned in NIFs (then you should see high CPU usage for some custom threads) or the NIFs themselves are using much CPU (then you should see that erts_dcpus_* or erts_sched_* threads are using much CPU).
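If clicking through :observer is tedious, the same information can be pulled from an attached IEx shell. A minimal sketch, assuming the process-dictionary key is :membrane_path as described above (adjust if your membrane_core version stores it differently):

# List the top n processes by reductions together with their membrane_path.
top_by_reductions = fn n ->
  Process.list()
  |> Enum.map(fn pid ->
    case Process.info(pid, [:reductions, :dictionary]) do
      [reductions: reds, dictionary: dict] ->
        path =
          case List.keyfind(dict, :membrane_path, 0) do
            {:membrane_path, path} -> path
            nil -> :not_a_membrane_process
          end

        {reds, pid, path}

      nil ->
        # The process may have exited between Process.list/0 and Process.info/2.
        {0, pid, :dead}
    end
  end)
  |> Enum.sort_by(fn {reds, _pid, _path} -> reds end, :desc)
  |> Enum.take(n)
end

top_by_reductions.(20)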
varsill · 7d ago
Concerning the telemetry events, I think it might have something to do with a recent bug report. It turns out that, by mistake, the second element of the "suggested format" you mentioned is the component's module instead of the "component type" (by "component type" I mean :element, :bin or :pipeline). We fixed it here: https://github.com/membraneframework/membrane_core/pull/958 and we will soon release v1.2.3 with that change. As of now, could you check whether telemetry works for you with membrane_core pinned to the 957-bugfix-component-type-v-component-module branch?
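A quick way to try that is to point the dependency at the branch in mix.exs (a sketch; adapt to your existing deps list), then run mix deps.get and recompile:

# mix.exs - temporarily use the bugfix branch instead of the Hex release
defp deps do
  [
    # ... your other deps ...
    {:membrane_core,
     github: "membraneframework/membrane_core",
     branch: "957-bugfix-component-type-v-component-module",
     override: true}
  ]
end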
Feliks · 4d ago
@odingrail have you tried recompiling all dependencies? Also, there is no :telemetry event named "membrane.http_adaptive_streaming_parser.handle_buffer.start". All Membrane :telemetry event names follow the convention [:membrane, :element | :bin | :pipeline, callback, :start | :stop | :exception]; more docs are here: https://hexdocs.pm/membrane_core/Membrane.Telemetry.html. To get events from a specific module, you can filter on the callback_context.module field of the telemetry event metadata; the metadata type is https://hexdocs.pm/membrane_core/Membrane.Telemetry.html#t:callback_span_metadata/0. What do you mean by an ExWebRTC-based pipeline? Do you use membrane_webrtc_plugin?
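Putting that together, a metric definition could look roughly like this. It is only a sketch: it assumes the :stop events follow the :telemetry.span/3 convention and carry a :duration measurement, that metadata.callback_context.module holds the element's module (per the type linked above), and MyApp.SomeParser is a placeholder for whichever element you want to measure:

# In the metrics/0 list of your Telemetry supervisor.
Telemetry.Metrics.summary(
  # Event [:membrane, :element, :handle_buffer, :stop], measurement :duration.
  "membrane.element.handle_buffer.stop.duration",
  unit: {:native, :millisecond},
  # Keep only events emitted by the element module you care about.
  keep: fn metadata -> metadata.callback_context.module == MyApp.SomeParser end
)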
odingrail (OP) · 4d ago
Hey! Thank you for the detailed answers @Feliks and @varsill! I was able to see which processes consume the most resources by using the config from https://github.com/membraneframework/membrane_core/pull/960; this way I see the labels both in the LiveDashboard and in :observer. But from this info I don't see any anomalies in resource consumption. We are using membrane_webrtc_plugin, we are converting the audio from AAC to Opus, and we don't do any video transcoding for these benchmarks.
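For reference, the option discussed here is enabled in config.exs roughly like this (a sketch based on the membrane_core docs; check the linked PR for the exact form):

# Label element/bin/pipeline processes so they are recognizable in
# :observer and LiveDashboard. Debugging only: as noted below, it
# works on a single node.
config :membrane_core,
  unsafely_name_processes_for_observer: [:components]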
odingrail (OP) · 4d ago
We are streaming RTSP with TCP transport, but similar resource consumption is also observed with RTMP streaming. We are trying to do distributed WebRTC streaming. The architecture is roughly the following: we have a node that consumes the stream (running the webrtc_sink and other elements), and then we send the packets to a "Room" process, which in turn distributes the packets across the peers; for that we are using Phoenix PubSub.
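Roughly, the fan-out looks like this. A simplified sketch only: the topic name, message shape and module names are made up for illustration, not our actual code:

# Consumer node: publish each packet to the room's topic.
defmodule MyApp.RoomPublisher do
  def publish(room_id, packet) do
    Phoenix.PubSub.broadcast(MyApp.PubSub, "room:" <> room_id, {:packet, packet})
  end
end

# Peer side: subscribe to the room and forward packets towards the WebRTC peer.
defmodule MyApp.PeerForwarder do
  use GenServer

  def start_link(room_id), do: GenServer.start_link(__MODULE__, room_id)

  @impl true
  def init(room_id) do
    :ok = Phoenix.PubSub.subscribe(MyApp.PubSub, "room:" <> room_id)
    {:ok, %{room_id: room_id}}
  end

  @impl true
  def handle_info({:packet, _packet}, state) do
    # Hand the packet over to this peer's pipeline/sink here.
    {:noreply, state}
  end
end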
odingrail (OP) · 4d ago
Regarding this debug technique you described:
top -H -p <pid> (where <pid> is the OS process of BEAM) and inspect if some OS threads are using extraordinarily high amount of CPU. There are two options here: either these are some "worker" threads spawned in NIFs (then you should see high CPU usage for some custom threads) or NIFs themselves are using much CPU (then you should see that erts_dcpus_* or erts_sched_* threads are using much CPU)
Hmm, this is what I see: (screenshot of top -H output attached, showing the main BEAM thread using most of the CPU)
varsill · 3d ago
Hello! Oh, I'd forgotten about that unsafely_name_processes_for_observer config option; its main drawback is that it works only on a single node, but it's great for debugging. Also thanks for spotting the bug in the documentation!

Concerning the results from top, it's definitely odd that the main thread is using that much CPU. At first sight I would say it might have something to do with BEAM spending too much time on thread synchronization (which used to be the case when the desired number of schedulers was improperly resolved in environments with cgroups limits assigned), but since you are using a droplet I don't think that's the issue. Could you try to gather microstate accounting statistics for, let's say, a minute of the system running? (https://www.erlang.org/doc/apps/erts/erlang.html#statistics_microstate_accounting) We should be able to see more precisely what BEAM is busy with. If it doesn't tell us much, I'm afraid we might need to use perf to see the particular calls that use most of the CPU.

Concerning the erts_sched_ CPU usage, you can try experimenting with disabling busy waiting for the schedulers with the +sbwt none option (https://www.erlang.org/doc/apps/erts/erl_cmd.html#+sbt) to see (more or less) how much CPU is really spent on code execution.
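From an IEx shell attached to the running node, those stats can be gathered with the :msacc helper from OTP's runtime_tools, for example:

# Collect microstate accounting for ~60 seconds, then print where the
# schedulers spend their time (emulator, GC, NIFs, sleep, ...).
:msacc.start(60_000)
:msacc.print()

And to try the +sbwt experiment, the flag can be passed when starting the VM, e.g. elixir --erl "+sbwt none" -S mix run --no-halt (adjust to however you start your app).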
odingrail (OP) · 3d ago
Thanks everyone, I think the issue was that in the source element we set the TCP socket to active mode. After switching to a bounded active mode (active: n) with manual flow control, the machine is able to sustain 28 WebRTC streams without significant growth of memory (the CPUs are at 100%).
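For anyone hitting the same thing, the pattern looks roughly like this (a sketch with made-up module and limit names, not the actual source element):

defmodule MyApp.TcpFlowControl do
  # With active: n, the kernel delivers at most @active_count messages,
  # then the socket turns passive and sends {:tcp_passive, socket},
  # letting us decide when we are ready to receive more.
  @active_count 100

  def activate(socket), do: :inet.setopts(socket, active: @active_count)

  def handle_info({:tcp, _socket, data}, state) do
    # Process or buffer the incoming data here.
    {:noreply, %{state | buffered: [data | state.buffered]}}
  end

  def handle_info({:tcp_passive, socket}, state) do
    # Re-activate only once we can keep up - this is the manual
    # flow-control part that keeps memory bounded.
    :ok = :inet.setopts(socket, active: @active_count)
    {:noreply, state}
  end
end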
varsill · 2d ago
Hello, great that you got it figured out! Am I right that it was somewhere in your custom source element that the TCP socket was operating in active mode?
odingrail (OP) · 2d ago
Hey! Yes, you are right. When we were prototyping the pipeline we just set the socket to active mode and forgot about it 👀.
