IO threads - why do we need them?
Does anybody have a blog post/article that explains in depth how IO threads work (a .NET Framework resource is fine)?
Can they mix with worker-threads in the execution of a single task that has both IO calls and also does compute-bound work?
$nothread perhaps? @tinmanjk
There Is No Thread
This is an essential truth of async in its purest form: There is no thread.
depends on the platform
I believe on linux for async io that works on top of epoll, there's a separate thread(s) that will do polling
otherwise there should be no .NET thread, yes
I've read it...not really satisfied with it.
Maybe somebody who knows how the internal implementation of the ThreadPool class actually works. A bit above my paygrade to get into the internal workings - tried a bit of DnSpy to get the flow of method calls but still not enough to understand what happens.
Another way to do it is to clone dotnet/runtime repo and browse the ThreadPool.cs and adjacent files.
If you have Rider, you can step through implementation in a quite detailed way too when you do F12 on things like File.ReadAllBytesAsync() and whatnot.
There you will see that on Unix the asynchronous read/write will submit an operation that will then be handled by IO thread(s)(?) working with epoll. On Windows, it will instead be dispatched with overlapped IO, which is the native async IO API on Windows (but Unix is way faster with NVMe haha).
In case of synchronous operations, most of the time, the p/invoke calls to kernel space will be done directly same as if you would do from C, in which case whether this is being executed by a threadpool worker thread or not does not really matter, since for all intents and purposes, for the threadpool, it just executes some code, whether it does IO or not.
I think that touches on my initial lack of understanding very well.
I tried simulating some IO from within a QueueUserWorkItem and it was executed on the worker thread.
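A minimal sketch of this kind of experiment, not the original snippet. The helper name `RunBlockingIoOnPool` and the temp-file round trip are assumptions made for illustration. It queues a work item that does synchronous file IO and reports whether the code ran on a thread pool thread:

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper (not from the thread): runs blocking file IO inside a
// thread pool work item and reports whether it executed on a pool thread.
static bool RunBlockingIoOnPool()
{
    bool onPoolThread = false;
    using var done = new ManualResetEventSlim();

    ThreadPool.QueueUserWorkItem(state =>
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "hello");        // synchronous write
            string text = File.ReadAllText(path);    // synchronous read, blocks this thread
            onPoolThread = Thread.CurrentThread.IsThreadPoolThread;
        }
        finally
        {
            File.Delete(path);
            done.Set();                              // signal the caller
        }
    });

    done.Wait();                                     // wait for the work item to finish
    return onPoolThread;
}

bool ranOnPool = RunBlockingIoOnPool();
Console.WriteLine($"Blocking IO ran on a thread pool thread: {ranOnPool}");
```

From the pool's point of view the callback is just code to run. Nothing in it marks the work item as "IO".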
I was wondering how the ThreadPool actually decides to employ IO threads at all given a callback such as this. A comment on my stackoverflow question about this (https://stackoverflow.com/questions/78066998/how-does-the-threadpool-decide-which-type-of-thread-to-use-for-a-work-item?noredirect=1#comment137626444_78066998) by Hans Passant indicated that it is always a worker thread with QueueUserWorkItem.
So, I don't know what criteria are used to classify work items as IO items since in my case IO work is done on the worker thread.
Who employs the IO thread in the first place to do what work is my actual question I guess.
The (platform-specific) implementation of IO. By IO here I mean a very specific thing: the File API, sockets and all kinds of networking APIs, possibly even IO done entirely inside a user-implemented third-party library.
yes but they do that by using the ThreadPool either directly or via TPL, so the ThreadPool has to make the final decision or am I mistaken?
There's no magic "this thread does IO" vs "this thread does not do IO" thing. It's an umbrella term for what a particular implementation ends up doing.
Thread pool does not know or do that. Threadpool is just a, well, pool of worker threads that execute work items. TPL works on top of that with the default scheduler and somewhat tangential to this.
Let's consider overlapped IO which is an API used on Windows. When you issue an asynchronous read of a particular file, it creates an overlapped IO request for a read that will be done asynchronously.
What this means is that Windows sees the exact parameters of your read request, including the flag that marks it as asynchronous, so it queues the read and immediately returns to you. Then the code of the method doing this IO just returns (yields back) to the threadpool, since it can no longer make progress.
Once Windows finishes the read, it will call the callback .NET has provided within the overlapped IO request which will submit a new work item on the threadpool to continue execution of your code that was awaiting the read.
Whether Windows has its own pool of IO threads or some other way of conducting the read - is an implementation detail.
In comparison, on Linux, you will have a separate thread(s) (disclaimer: I read the code briefly so it is best you double-check) that exist outside of threadpool worker threads that will perform the polling of the read operation.
So when you do an asynchronous read on Linux, your code that is doing an async read and being executed, let's say, by a .NET threadpool worker thread, will submit that to a separate IO thread(s) .NET runtime keeps around to work with epoll, and then immediately return back to threadpool since the execution of your task cannot proceed.
Then, once that IO thread(s) has finished reading, somewhat similar to how overlapped IO works on Windows, it will queue a new work item on the threadpool that has a reference to your task's state machine to continue the execution of your task.
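A rough way to observe the "yield back, then resume as a new work item" behaviour described above. This is a sketch, assuming a console app with no synchronization context. The helper name `ReadAndObserveAsync` and the 1 MiB temp file are made up for illustration. It records the managed thread id before and after awaiting the read:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: records which managed thread runs the code before
// and after an awaited file read.
static async Task<(int Before, int After, int Length)> ReadAndObserveAsync()
{
    string path = Path.GetTempFileName();
    try
    {
        await File.WriteAllBytesAsync(path, new byte[1024 * 1024]);

        int before = Environment.CurrentManagedThreadId;
        byte[] data = await File.ReadAllBytesAsync(path);   // the method may yield here
        int after = Environment.CurrentManagedThreadId;      // continuation resumes here

        return (before, after, data.Length);
    }
    finally
    {
        File.Delete(path);
    }
}

var result = await ReadAndObserveAsync();
Console.WriteLine($"read {result.Length} bytes; thread before await: {result.Before}, after: {result.After}");
```

The two ids may or may not differ from run to run. Either way, the continuation is whatever gets scheduled once the read completes, not a thread that sat waiting for it.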
thanks a lot for the explanation. I was expecting something like this to really happen.
worker threads offloading work to IO threads to wait/poll and then when triggered by the OS to give the work back to worker threads
this is just a (very rough) example of how File API works - networking may or may not use a completely different implementation.
after all, it is all object references, threads and method calls all the way down
I am not quite sure I understand really - where is this File API implementation? Or internal thread-IO work with files - the CLR source code?
yeah, can do this with File.ReadAllBytesAsync()
I prefer DnSpy for some reason, but might need to learn to do this properly with the runtime repo
I never used DnSpy but would expect it to be a bit difficult to read
because the RandomAccess IO implementation (the public RandomAccess class that takes file handles and byte buffers) uses callbacks and all kinds of object wrappers for IO operations, it is kinda hard to read
but I cannot stress this enough - there is no magic "IO" flag or one-API-to-rule-them-all that the threadpool uses - it is literally just executing opaque code 99% of the time
yeah it is, but you can set breakpoints if you know which methods are used, and there is a multi-thread debugger, so...it can be done
so when you have to do some quote-unquote IO work - it comes down to the implementation of that specific """IO""" thing doing that
I know...but if we dig into the File class's InternalReadAllBytesAsync implementation
https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/IO/File.cs,f266db58b0db4c05,references
it is indeed int n = await RandomAccess.ReadAtOffsetAsync
but I doubt that ReadAtOffsetAsync is using the Thread API directly
like submitting a request to some other separate IO threads that may or may not be implemented in C#
or giving a kernel or some unmanaged library a callback that will queue the execution of your task's continuation on threadpool when the operation is done
which may do anything internally, the kernel or some library
that's just a start of the operation
you have to use Rider (which has nice decompiler) or clone dotnet/runtime and browse through an implementation to get the idea how it works (the latter option is better)
which does very roughly what was said above (I may be wrong, because I did literally browse through the code to understand how it works more or less)
I am reading the source now at
https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/SafeHandles/SafeFileHandle.ThreadPoolValueTaskSource.cs,74b2b442e61a54df,references
and it comes down to
That's the IO ReadAllBytes going to SafeFileHandles' methods
QueueRead and QueueToThreadPool
So the further "IO vs worker" thread distinction needs to happen inside the ThreadPool's private code somehow.
File API -> SafeHandle API -> ThreadPool API
or am I mistaken?
clone dotnet/runtime
the implementation is different between Unix and Windows
okay, no way around that I guess :))
you can browse on github too, personally, source.dot.net annoys the shit out of me because you don't navigate files and can't see platform-specific code in a sane way
GitHub (dotnet/runtime): runtime/src/libraries/System.Private.CoreLib/src/Microsoft/Win32/Sa...
but you can still navigate...not sure I get the navigation right on github
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/IO also RandomAccess bits
I'm surprised this implementation of IThreadPoolWorkItem looks like it's performing a synchronous read on Unix
actually, you can navigate...thanks for the link.
(technically speaking, depending on the amount of concurrent long-waiting read operations this should not be an issue - hill-climbing algorithm will take care of that by scaling up the thread count)
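For anyone who wants to watch the pool's limits and current size while experimenting, here is a small sketch using the public ThreadPool API. It only reads the configured bounds and the current thread count. It does not demonstrate hill-climbing itself, and the numbers depend on the machine and runtime version:

```csharp
using System;
using System.Threading;

// The second out parameter of these calls is the "completion port" (IO) thread limit,
// which the API reports separately from worker threads.
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);

Console.WriteLine($"worker threads: min {minWorker}, max {maxWorker}");
Console.WriteLine($"IO completion threads: min {minIo}, max {maxIo}");

// ThreadPool.ThreadCount is available on .NET Core 3.0 and later.
Console.WriteLine($"threads currently in the pool: {ThreadPool.ThreadCount}");
```

Printing ThreadPool.ThreadCount periodically while many work items block on synchronous reads is one way to see whether the pool grows.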
(which means it does not use epoll for file api, my mistake, I was remembering it being mentioned in one of the devblogs, maybe it was just an ASP.NET Core one haha)
yeah, I remember the same blog maybe :))
I guess I'd need to read this:
https://github.com/dotnet/runtime/blob/6b8d34b9954fabf311594a0ac511a917947f1c92/src/libraries/System.Private.CoreLib/src/System/IO/RandomAccess.Windows.cs#L272
yes, this should be doing
"BeginInvoke" (anyone remembers?:)) create an overlapped IO request + assign a completion callback -> call the kernel API -> kernel API returns immediately after issuing a read -> yield back to threadpool
"EndInvoke" kernel calls the completion callback, which submits a new work item to the threadpool, which then resumes the execution of your task
the old APM meme hopefully helps with mental model lol
in which case, there will be no blocked or any thread waiting for IO or doing IO work - it is handled inside the kernel in an opaque way, the heap simply has your state machine object of a task that waits for something to call a callback that will continue its execution
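The mental model above can be sketched without any real kernel IO. In this illustration a one-shot System.Threading.Timer stands in for the kernel's completion callback and a TaskCompletionSource stands in for the pending operation. The helper name `FakeOverlappedReadAsync` and its parameters are hypothetical, and this is not how the runtime implements overlapped IO internally:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a kernel-backed async read: it "issues" the operation,
// returns immediately, and is completed later from a callback.
static Task<int> FakeOverlappedReadAsync(int bytesToReport, int delayMs)
{
    // RunContinuationsAsynchronously: awaiting code resumes as a separate
    // thread pool work item rather than inline inside the callback.
    var tcs = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

    // This Timer constructor passes the timer itself as the callback state.
    var timer = new Timer(state =>
    {
        (state as Timer)?.Dispose();      // one-shot: clean up the timer
        tcs.SetResult(bytesToReport);     // "completion callback" fires
    });
    timer.Change(delayMs, Timeout.Infinite);

    return tcs.Task;                      // caller gets control back right away
}

Task<int> pending = FakeOverlappedReadAsync(4096, 50);
Console.WriteLine($"returned to caller, completed yet: {pending.IsCompleted}");

int bytesRead = await pending;            // no thread is blocked while waiting
Console.WriteLine($"continuation resumed with {bytesRead} bytes");
```

While the operation is pending, the only things that exist are the heap objects (the task and its continuation) waiting for something to call the callback.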
(I assume overlapped IO interop creates a pinned handle for the callback, which then unpins it when it returns)
I realized I am 5% as knowledgeable as u
thanks a lot, but yeah...this should be what's going on
so in this case we circumvent IO threads altogether?
not really, the difference is probably only in this specific area and can be measured in around 3 hours :kekw:
most likely kernel has its own IO threadpool
so back to square 1 for me at least hah
or whatever, after all there is also a concern of a driver implementation and whatnot
I found some DequeueIO methods in another namespace...wonder how they are called
like, nothing stops you from having a magic userspace implementation that, let's say, copies data from NVMe to a GPU using some low level controller API that bypasses the CPU completely
and then just raises a hardware interrupt that will call an interrupt handler
on some core
which then may even directly schedule a continuation on the threadpool
(but interrupt handlers are very limited in what they can do so likely waking up some special handler thread first that will do the enqueueing)
(this is also only in the case of Windows, Unix/Linux has its own set of APIs to do reads (which work on file descriptors, regardless of whether it's a file or a socket or anything else), then epoll and now also io_uring, all of which can be made to work with async with varying degrees of efficiency)
yeah looking for the implementation of the OverlappedValueTaskSource, somehow can't find it in source
to see the difference ...but might need to wait a bit for more brain power
git clone https://github.com/dotnet/runtime dotnet-runtime
GitHub (dotnet/runtime): runtime/docs/workflow/editing-and-debugging.md
and then you just open in Visual Studio
I know, I know....just don't want to :))
but I might have to
why? it's easier, I keep a copy around, it's also useful to examine codegen for arm64 because disasmo needs a built checked variant of runtime for that
it's on the better end of a spectrum of "how easy it is to build a runtime for a particular language"
I've got an older laptop and Visual Studio is crazy
docs may take a bit to find but it's okay otherwise
but I think I'll do the setup on a Remote where I dont care
just strange that nobody has really done the digging and explaining in a blog post
I am not that familiar with .NET threading tbh, just picked up the "Pro Asynchronous Programming in .NET" and trying to follow closely
there probably is some information on devblogs
ChatGPT no help...too
blog posts exist for that it's just they were written some time ago
maybe I'll just leave the topic as a black box for the moment
since the implementation isn't super new
(there isn't that much to change unless a rewrite)
can u point me to some of those?
like I find a lot of useful information in older 2001 books for the .NET 1.0
nothing too too much has changed
wrt to core stuff
.NET Blog, Adam Sitnik: "File IO improvements in .NET 6" (high-performance file IO features in .NET 6, like concurrent reads and writes, scatter/gather IO and many more)
yes it is a better idea to look at code or make sure the material references at least .NET Core 3.1 or newer
obv, I always experiment
but even then - .NET 6+ has switched to PortableThreadPool implementation written in pure C#
only to later reintroduce WindowsThreadPool, which is opt-in except on NativeAOT targets built for Windows
so that's why the class is named like this...before it resided more in the CLR?
(but you probably shouldn't care too much - it's an internal abstraction that for all intents and purposes acts the same)
it may be useful to not bother with old terminology
runtime is the clr
like the CLR :))
well, coreclr
yeah, runtime it is
today people may refer to it as just runtime or clr/coreclr to emphasize the difference with mono flavours (be it the one that also lives in dotnet/runtime or the custom fork used by unity)
I really hope somebody updates CLR via C# for modern .NET ... BOTR (Book of the Runtime) is advertised as that but it isn't really
plus clr vs mono - these control the compiler being used and the GC being used
got it
because both would use the same PortableThreadPool implementation on a given platform
so you can't probably say that PortableThreadPool is a definitive part of "CLR"
either way you should let the old idea of CLR how it was envisioned during .NET Framework days die
because the only thing that matters is what we have today and the way it works
what I meant more that it was part of the runtime in native code, but now it's managed seems like
trying to make sense of the "Portable" in the name
maybe I am wrong
mostly that and also making implementation unified across windows and unix
there is no clear split of "managed bits are C# and strictly non-runtime" vs "unmanaged bits are strictly runtime"
some facilities that enable .NET to work as a VM are written in C++ like GC and JIT
some are split like type system facilities
where parts are in C++ and parts are in C#
or odd things like threadlocals, which are tied in to platform-specific parts written in C++ but also have C# implementation details
old books may provide misleading mental model
I know, but when I am looking at BCL code
there are a lot of calls into unmanaged code ...
most calls in CoreLib for things like reading files are just calling kernel api
is all unmanaged code in one file or is it split? must be split
depends on organization in a particular folder/project
I mean the "runtime"
personally I have a disdain towards superstition-like approach to things, it's not productive
there is no magic - it is all implementation details
i.e. C:\Program Files\dotnet\shared\Microsoft.NETCore.App\6.0.13
the bundling is completely orthogonal to organization in runtime
well, it is usually organized by assemblies
but you are trying to tackle too many unrelated topics at once
anyway I explained what I knew here, if you want to know more - browse code
yeah it's offtopic, just wondered
thanks for the help
been doing that a lot, need to continue
thanks again to @abyssptr for the great pointers
I think anybody interested in the original question needs to read this file from the source code (Windows case) and the IOCompletionPoller class in particular which is what the IO Thread really is.
"// Poller threads are typically expected to be few in number and have to compete for time slices with all
// other threads that are scheduled to run. They do only a small amount of work and don't run any user code."
(this behavior can be changed and they can run continuation code if UnsafeInlineIOCompletionCallbacks is set to true via an environment variable...but that is not typically the case)
https://github.com/dotnet/runtime/blob/0a5418d7e7ae7f7653a32004c808881af5275469/src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.IO.Windows.cs#L141