Process API: Memory Leaks in Windows (?)
Hi all,
I'm developing an ASP.NET solution that allows registering CLI commands which are executed periodically. (for reference: GitHub)
When I run the service (as a Windows service) with relatively few CLI commands to execute (every 5 seconds), I see higher and higher memory consumption on the host, to the point where memory plus swap (16/32 GiB) are consumed and the PC becomes unusable.
That is, with two commands running: one that periodically checks whether a Windows service is running (sc.exe query) and one that pings Google (for more info, refer to the configuration here).
I tried both using Process (namespace: System.Diagnostics) and the NuGet package CliWrap, but the memory leak occurs with both.
Interestingly, the OpenTelemetry-reported memory consumption of the service - screenshot attached - remains relatively stable, and checking Task Manager directly, these numbers line up. The Task Manager itself cannot account for over half of the memory when the PC is at 100% memory usage.
Some points to note:
- The performance (execution duration) of the individual CLI commands degrades quite fast, getting worse the longer the service runs
- Restarting the Windows service (not rebooting the machine) does not free any memory
Questions:
- Is this a known problem with the Process class? I have tried nearly all combinations of sc.exe, tasklist, and ping - all run into memory leaks, so I'd be somewhat confident saying it is somewhere on the C# level (?)
- Can someone give me any pointers on how to diagnose where the memory is going? If my Grafana dashboard (which aligns with the Task Manager) is right, the application itself is consuming relatively little memory (big question mark here!)
I'd be happy about any kind of advice, cheers and thanks in advance
Olli
Additional Points to Note (that I could not add above because of a character limit, lol):
- The only way to free up the memory is a complete reboot of the PC
- The screenshot is a little bit older, so there are more commands in the dashboard showing up than mentioned above, but the behaviour is the same
- I already did some dotMemory analysis, and even in longer executions the managed memory seems fine (after some tweaking, especially resetting CancellationTokenSources)
Edit: To establish a different baseline I deployed the same code as a Linux service (Kali) with just a ping command. It has been running for only 2 hours, but there I see no increase in processing time or memory consumption
> When I'm running the service (as a windows service)
wait, do you mean memory increases only when run as a service, or even as a standalone app? does memory increase even when not running any periodic check? have you tried running just a check in a new empty project to see if it misbehaves?
Sorry for my bad communication there :^)
> wait, do you mean memory increases only when run as a service or even as a standalone app?
I only run it as a service; I never measured it (over a day or so) when running from the console/via IDE. I meant to say: "it is running as a Windows service" (and not that there is a difference in Windows service vs. usual CLI execution)
> does memory increase even when not running any periodic check?
No, it does not. It stays perfectly flat/constant, no memory issues
> have you tried running just a check in a new empty project to see if it misbehaves?
Can you clarify here? I don't know what you mean, sorry. Edit: Do you mean just running the Process.Start calls (as simple as possible) in an infinite loop?
are you disposing of the Process instance when you are done with it?
Yep.. Well, I did.. CliWrap's Process is not disposable.
As far as I can tell from going through the library's source code, the Process is disposed as expected.
In any case, wouldn't this, without being correctly disposed, show up as increasing managed memory (see dashboard above)?
Edit: Trying to find the last version that still used Process for reference
Edit2: Squash merge ftw :/
unless you did a git gc, you can still find old commits in the reflog
https://sethrobertson.github.io/GitFixUm/fixup.html choose your own adventure in git
I found an older state here and I'm fully aware that the using/dispose is missing in that state.
I did add it and let it run for a while (with the disposal), but had no luck.
However, maybe it's worth a shot retrying just plain Process with all the other noise (thousands of CTS recreations) gone
GitHub
wndw.ctl/Oma.WndwCtrl.Core/Executors/Commands/CliCommandExecutor.cs...
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.standardoutput?view=net-9.0&redirectedfrom=MSDN#remarks
reading both stdout/stderr is tricky
Process.StandardOutput Property (System.Diagnostics)
Ah - so that explains why sometimes the command would block forever.
I guess I'll give it a shot with proper disposal and (for now) only reading from StdOut
Thanks a ton!
last example shows how to do it. starting async read of stdout, reading stderr to end, waiting for exit
well, actually async stderr and reading stdout to end, but either works
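A minimal sketch of that pattern, in case it helps (the command and arguments are placeholders; the point is to drain one pipe asynchronously while reading the other to end, before WaitForExit, so neither pipe fills up and blocks the child):

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Placeholder command; swap in sc.exe query / ping as needed.
using var process = new Process
{
    StartInfo = new ProcessStartInfo
    {
        FileName = "ping",
        Arguments = "-n 1 google.com",      // Windows syntax; adjust per OS
        UseShellExecute = false,
        RedirectStandardOutput = true,
        RedirectStandardError = true,
        CreateNoWindow = true,
    },
};

var stderr = new StringBuilder();
process.ErrorDataReceived += (_, e) => { if (e.Data is not null) stderr.AppendLine(e.Data); };

process.Start();
process.BeginErrorReadLine();                        // async read of stderr...
string stdout = process.StandardOutput.ReadToEnd();  // ...while stdout is drained to end
process.WaitForExit();                               // safe now: both pipes are being drained
```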
I'm not too optimistic about it though.. But just to be on the safe side that it's not caused by either the library or by incorrect usage of Process.
> Edit: Do you mean just running the Process.Start calls (as simple as possible) in an infinite loop?
yes
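Such a stripped-down repro might look like this - nothing but Process.Start in a loop with disposal, no scheduler, no CTS churn (the command, service name, and 5-second cadence are assumptions mirroring the setup described above):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Isolated repro sketch: if memory still climbs here, the problem is in
// process spawning itself, not in the surrounding service code.
while (true)
{
    using (var process = Process.Start(new ProcessStartInfo
    {
        FileName = "sc.exe",
        Arguments = "query SomeService",   // placeholder service name
        UseShellExecute = false,
        RedirectStandardOutput = true,
        CreateNoWindow = true,
    }))
    {
        _ = process!.StandardOutput.ReadToEnd();
        process.WaitForExit();
    }

    Thread.Sleep(TimeSpan.FromSeconds(5)); // same cadence as the real scheduler
}
```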
Soo.. I just changed it back to using Process directly and deployed it as the Windows service.. let's see how that goes. Code for reference
A sad note: the Linux service started consuming more and more memory too, but significantly less than the Windows service (over the same time period). [Screenshot attached]
GitHub
wndw.ctl/Oma.WndwCtrl.Core/Executors/Commands/CliCommandExecutor.cs...
I will prepare that now and run it once I see the memory consumption go up in the windows service (above version)
Maybe (not too optimistic) the above change already fixes my leak..
But still my question: if the Process itself were incorrectly handled (not disposed), wouldn't I see that in the managed memory?
what does "cached" in the graph mean? the other bars seem to not change much in width
add available to the graph as well. cached is just that: it will be dropped when something else needs the memory
Hmm.. You are right, the stacking confused me..
It's one of the open-telemetry metrics, this is the full list [ref1]
[ref2] shows the graph only with used selected
Edit: Needless to say I don't actually know what it means and I couldn't find anything from the otelcol.
Edit2: Overread the part with "Available" - do you mean that there's a difference between free and available? I don't really have that information from otelcol.. :/

on linux, turn on the optional metric system.linux.memory.available
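Assuming this is the otelcol-contrib hostmetrics receiver, enabling that optional metric should look roughly like this in the collector config (structure sketched from the hostmetrics memory-scraper docs; the collection interval is just an example):

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      memory:
        metrics:
          system.linux.memory.available:
            enabled: true
```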
Oh lul, i just found another linux oddity.. my service took ages to start, apparently because the service runs in the root directory, see here
After fixing it, I now at least have a much more manageable memory set [picture]
Before it was between 550-600..
Alright, adding the metric now 🙂
Sebastian Solnica
Debug notes
Slow start of a systemd unit implemented in .NET
good to know about the working directory, i'll store that somewhere in the brain for later
Mh, opentel doesn't like the metric (same for max).. I took a look here and other places..
My best guess would be that my otelcol-contrib is not up-to-date enough (installed yesterday :/)
Edit: Tried multiple versions, current log is wrong.. one sec
Edit2: Fixed
Config:
Host metrics receiver — Splunk Observability Cloud documentation
Use this Splunk Observability Cloud integration for the host metrics monitor. See benefits, install, configuration, and metrics.
sort of yes but i don't know what would happen to it after gc gets hold of it
this is a question for #allow-unsafe-blocks
Short update.. I rolled back to using Process directly, and it seems the performance degradation got worse [Picture attached]. Currently running code is here.
I just thought of doing the opposite of calling Process.Start alone a bunch of times:
I will change the code of CliCommandExecutor to just look up what the CLI execution would return from a hardcoded list and see how that behaves.

So after ~2.5 hours there is at least no sign of execution duration increasing [pic attached]
However, when it comes to replication.. I'm unsure. I'm returning the same string(s) all the time, which are loaded from a Dictionary<string, string> without modification.
I considered appending something like a GUID to the returned string to have a new instance.
But then again, I never understood/cared to understand how strings work in C#..
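For what it's worth, the difference between the two variants can be shown directly (dictionary key and values are made up for illustration): returning a dictionary value hands out the same string instance every time, while concatenating a GUID allocates a fresh string per call.

```csharp
using System;
using System.Collections.Generic;

class StringIdentityDemo
{
    static void Main()
    {
        var responses = new Dictionary<string, string> { ["ping"] = "Reply from 8.8.8.8" };

        // Dictionary lookup returns the same string object each time - no allocation:
        string a = responses["ping"];
        string b = responses["ping"];
        Console.WriteLine(ReferenceEquals(a, b)); // True

        // Concatenation builds a brand-new string object on every call:
        string c = responses["ping"] + Guid.NewGuid();
        string d = responses["ping"] + Guid.NewGuid();
        Console.WriteLine(ReferenceEquals(c, d)); // False
    }
}
```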
Edit: Double-checked directly on the PC as well; Task Manager reports a steady 34% RAM usage.. let me run that for a while..