C#, 10mo ago
Noah;

Memory leak problem with System.Speech.Recognition && Microsoft.Speech.Recognition

Description: When running the speech recognition using Microsoft.Speech.Recognition, there appears to be a memory leak issue. The recognition engine constantly grows in memory during execution, even though it is disposed of correctly. The problem is specifically observed in the following code block:
while (!stoppingToken.IsCancellationRequested && !disposed)
{
    Thread.Sleep(100);
    try
    {
        int read = originStream.Read(buffer, 0, 48000);
        bufferedByteStream.Write(buffer, 0, read);
    }
    catch (IOException ex) when (ex.InnerException is SocketException { SocketErrorCode: SocketError.ConnectionReset })
    {
        Console.WriteLine("Connection was forcibly closed by the remote host.");
        bufferedByteStream.Close();
        disposed = true;
    }
}
Context:
- The recognition engine grows in memory while it runs in a separate thread.
- Disposing of the engine at the end of the execution does not resolve the issue.
- The memory growth is evident even with correct disposal practices.

Additional Information:
- The issue happens with Microsoft.Speech.Recognition and is also observed in System.Speech.Recognition.
- The code is part of a background service and runs continuously.

Code Snippet for Reference:
// ... (previous code)

using (SpeechRecognitionEngine engine = SREBuilder.Create(new[] { "1FriendlyDoge", "notanoob600m", "RoyalCrests" }))
{
    engine.SpeechRecognized += (sender, e) => HandleSpeechRecognized(sender, e, userId, guildId);
    engine.SetInputToAudioStream(bufferedByteStream, new SpeechAudioFormatInfo(EncodingFormat.Pcm, 48000, 16, 2, 192000, 4, null));
    engine.RecognizeAsync(RecognizeMode.Multiple);

    while (!stoppingToken.IsCancellationRequested && !disposed)
    {
        Thread.Sleep(100);
        try
        {
            int read = originStream.Read(buffer, 0, 48000);
            bufferedByteStream.Write(buffer, 0, read);
        }
        catch (IOException ex) when (ex.InnerException is SocketException { SocketErrorCode: SocketError.ConnectionReset })
        {
            Console.WriteLine("Connection was forcibly closed by the remote host.");
            bufferedByteStream.Close();
            disposed = true;
        }
    }
}
12 Replies
Noah; (OP), 10mo ago
(I have isolated this and profiled it, and have determined the issue does not appear to be in my code.) For some additional information: the buffered stream is just a list of bytes with a predefined size (48,000 bytes in this case, which is 0.25s of audio data), and its size never changes. When profiling with dotMemory, the issue seems to be a ton of Byte[] objects with a stack trace of literally just [AllThreadsRoot]. The exception is thrown when a socket connection closes (each socket connection has its own instance of SpeechRecognitionEngine) and only happens once per instance. After that, everything is disposed.
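As a quick sanity check on the 0.25s figure, the duration of a raw PCM buffer follows from the format passed to SpeechAudioFormatInfo in the snippet above (48,000 samples per second, 16 bits, 2 channels, so 192,000 bytes per second). A minimal sketch; the PcmMath class and SecondsOfAudio method are hypothetical names used only for illustration and are not part of the original code:

```csharp
public static class PcmMath
{
    // Hypothetical helper: how many seconds of audio a raw PCM buffer holds.
    public static double SecondsOfAudio(int byteCount, int samplesPerSecond, int bitsPerSample, int channels)
    {
        // Bytes per second = sample rate * bytes per sample * channel count.
        int bytesPerSecond = samplesPerSecond * (bitsPerSample / 8) * channels;
        return (double)byteCount / bytesPerSecond;
    }
}
```

With the values from the thread, PcmMath.SecondsOfAudio(48000, 48000, 16, 2) divides 48,000 bytes by 192,000 bytes per second, which agrees with the quarter second stated above.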
Omnissiah, 10mo ago
Probably it's not related to the problem, but why don't you use async? Also, I would check for read == 0 to exit the while loop.
1FriendlyDoge, 10mo ago
I removed all async stuff for debugging, but it didn't help. The client might not constantly send something, but it still needs to stay connected.
Omnissiah, 10mo ago
I don't know. I personally use GCS speech recognition and Azure speech recognition and don't have these issues, though I know it's not exactly the same thing. 0 means exit: if you receive a 0, you have to restart the connection.
1FriendlyDoge, 10mo ago
Why lol. Since when?
Omnissiah, 10mo ago
That's the standard behavior of a stream interface; you can look it up on MSDN.
1FriendlyDoge, 10mo ago
Just because a stream is empty doesn't mean I have to close it.
Omnissiah, 10mo ago
An empty stream would not return 0 bytes read. Again, this probably is not the problem; I'm just saying that if originStream inherits from Stream, then it would be better if you checked for a return of 0 (end of stream). If you don't, it will throw an error the next time you try to read anyway, but that wouldn't be a graceful way to exit.
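To make the suggestion concrete: Stream.Read returns 0 only when the end of the stream has been reached (for a NetworkStream, when the remote side has closed the connection), so a read loop can treat 0 as its exit condition. A minimal sketch, assuming originStream and bufferedByteStream are ordinary Stream instances; the StreamPump class and PumpUntilEnd method are hypothetical names and not the OP's code:

```csharp
using System.IO;
using System.Threading;

public static class StreamPump
{
    // Hypothetical helper: copies bytes from source to destination until the
    // source reports end of stream (Read returns 0) or cancellation is requested.
    // Returns the total number of bytes copied.
    public static long PumpUntilEnd(Stream source, Stream destination, CancellationToken token, int bufferSize = 48000)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;

        while (!token.IsCancellationRequested)
        {
            int read = source.Read(buffer, 0, buffer.Length);
            if (read == 0)
            {
                // End of stream: leave the loop instead of writing empty chunks.
                break;
            }

            destination.Write(buffer, 0, read);
            total += read;
        }

        return total;
    }
}
```

In the loop from the question this would correspond to something like StreamPump.PumpUntilEnd(originStream, bufferedByteStream, stoppingToken). Note that a blocking Read on a network stream waits for data rather than returning 0 while the client is merely idle, so an idle but connected client would not end the loop. This sketch only illustrates the exit check and is not claimed to fix the memory growth.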
1FriendlyDoge, 10mo ago
How would that error? I am pretty sure that I used delays between identification and actual data transfer on my socket client for testing and never got any exception because of that.
Noah; (OP), 10mo ago
Found some people doing the same thing as me online:
https://stackoverflow.com/questions/68106636/using-a-memorystream-with-nets-system-speech-speechrecognitionengine-class
https://stackoverflow.com/questions/1682902/streaming-input-to-system-speech-recognition-speechrecognitionengine?rq=4
They don't seem to be having any issues... but I can't find anything that they're doing that I am not.
Noah; (OP), 10mo ago
For what it's worth, the full code:
Noah; (OP), 10mo ago
Still very stumped on this. All help is appreciated :) Just hoping this isn't impossible to fix.
solved