InsertPi Posts - Answer Overflow

InsertPi

•Created by InsertPi on 11/11/2023 in #help

✅ Use Roslyn to auto-generate overloads of method

I'm working on some code and the number of overloads I have to write is quickly getting out of hand. I have a bunch of related functions that are overloaded based on the number of parameters in a provided Action. For example, one of the longer methods looks like:

        internal void DispatchKernel<T, U, V, W, X, Y, Z, A>(int start, int end, Buffer<T> buf1, Buffer<U> buf2, Buffer<V> buf3, Buffer<W> buf4, Buffer<X> buf5, Buffer<Y> buf6, Buffer<Z> buf7, Buffer<A> buf8, Action<Index, GPUArray<T>, GPUArray<U>, GPUArray<V>, GPUArray<W>, GPUArray<X>, GPUArray<Y>, GPUArray<Z>, GPUArray<A>> action, string src)
            where T : unmanaged
            where U : unmanaged
            where V : unmanaged
            where W : unmanaged
            where X : unmanaged
            where Y : unmanaged
            where Z : unmanaged
            where A : unmanaged
        {
            var idx = new Index(start);

            var kernel = GetKernel(action, src);

            kernel(((end - start) / block_size, block_size), idx,
                new GPUArray<T>(buf1),
                new GPUArray<U>(buf2),
                new GPUArray<V>(buf3),
                new GPUArray<W>(buf4),
                new GPUArray<X>(buf5),
                new GPUArray<Y>(buf6),
                new GPUArray<Z>(buf7),
                new GPUArray<A>(buf8));

            Synchronize();
        }

I came across a blog post here about source generators in .NET 7, and was curious whether I could use similar techniques to eliminate the need to hand-write all these overloads and leave them to a source generator. However, I've never used source generators or Roslyn before and would like some guidance. Any help would be appreciated. (Tagging both intermediate and advanced since I'm not sure which this counts as; someone let me know.)
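
For reference, the part of a source generator that matters most here is producing the text of each overload. Below is a minimal sketch of that step only, written as plain C# with no Roslyn dependency. `OverloadWriter` and `WriteDispatchKernel` are hypothetical names, and the type parameters are renamed `T1..Tn` (an assumption, made so the names can be generated). In an actual generator the returned string would typically be handed to the compiler through an `AddSource` call.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: builds the source text of one DispatchKernel overload
// for a given number of buffers. It only produces a string; it compiles nothing.
public static class OverloadWriter
{
    public static string WriteDispatchKernel(int arity)
    {
        if (arity < 1) throw new ArgumentOutOfRangeException(nameof(arity));

        // T1..Tn instead of T, U, V, ... so the names can be generated.
        var types = Enumerable.Range(1, arity).Select(i => $"T{i}").ToArray();

        string typeParams = string.Join(", ", types);
        string buffers = string.Join(", ", types.Select((t, i) => $"Buffer<{t}> buf{i + 1}"));
        string arrays = string.Join(", ", types.Select(t => $"GPUArray<{t}>"));
        string wrapped = string.Join(", ", types.Select((t, i) => $"new GPUArray<{t}>(buf{i + 1})"));

        var sb = new StringBuilder();
        sb.AppendLine($"internal void DispatchKernel<{typeParams}>(int start, int end, {buffers}, Action<Index, {arrays}> action, string src)");
        foreach (var t in types)
            sb.AppendLine($"    where {t} : unmanaged");
        sb.AppendLine("{");
        sb.AppendLine("    var idx = new Index(start);");
        sb.AppendLine("    var kernel = GetKernel(action, src);");
        sb.AppendLine($"    kernel(((end - start) / block_size, block_size), idx, {wrapped});");
        sb.AppendLine("    Synchronize();");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Calling `WriteDispatchKernel` in a loop over the desired arities would yield one overload per call. Since `Action` tops out at 16 parameters and `Index` takes one of them, 15 buffers is the upper bound for this shape.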

7 replies

C#

•Created by InsertPi on 10/28/2023 in #help

✅ Replace enum with class name

I'm working on a parallel programming API that aims to implement OpenMP functionality as faithfully as possible. With that said, the current API has the following syntax for declaring a parallel-for loop:

DotMP.Parallel.ParallelFor(
    int start,
    int end,
    DotMP.Schedule schedule = DotMP.Schedule.Static, //enum
    uint? chunk_size = null, //defined by runtime if unset
    uint? num_threads = null, //defined by runtime if unset
    Action<int> action);

I am looking at the possibility of letting users define custom schedulers through some sort of consistent interface. I am not looking for API/ABI compatibility, but source code compatibility would be highly desirable. What I would like is for the DotMP.Schedule enum to be replaced with classes that implement the IScheduler interface. How would I incorporate this into the type signature so that it stays source-compatible? For instance:

DotMP.Parallel.ParallelFor(0, N, schedule: DotMP.Schedule.Dynamic /* this is a class name */, action: i =>
{
    y[i] += a * x[i];
});
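
One possible shape for this, sketched under assumptions: keep `Schedule` as a name, but make it a static class whose members are `IScheduler` instances, so `Schedule.Dynamic` still compiles at existing call sites. Everything below is hypothetical (the `DotMPSketch` namespace, the `Name` property, the scheduler classes, and the `string` return value, which exists only to make the sketch observable). A class instance cannot be a default parameter value, so `null` stands in for `Schedule.Static`, and because C# requires optional parameters to come last, `action` gets a null default that is checked at run time.

```csharp
using System;

namespace DotMPSketch
{
    // Hypothetical interface; a real IScheduler would carry the work-sharing logic.
    public interface IScheduler
    {
        string Name { get; }
    }

    public sealed class StaticScheduler : IScheduler { public string Name => "static"; }
    public sealed class DynamicScheduler : IScheduler { public string Name => "dynamic"; }

    // Same name as the old enum, so call sites written as Schedule.Dynamic still compile.
    public static class Schedule
    {
        public static IScheduler Static { get; } = new StaticScheduler();
        public static IScheduler Dynamic { get; } = new DynamicScheduler();
    }

    public static class Parallel
    {
        // null means "use Schedule.Static". The loop is a single-threaded
        // stand-in; the returned name only shows which scheduler was picked.
        public static string ParallelFor(int start, int end, IScheduler schedule = null,
            uint? chunk_size = null, uint? num_threads = null, Action<int> action = null)
        {
            if (action == null) throw new ArgumentNullException(nameof(action));
            schedule ??= Schedule.Static;
            for (int i = start; i < end; i++) action(i);
            return schedule.Name;
        }
    }
}
```

A user-defined scheduler would then be any class implementing `IScheduler`, passed as `schedule: new MyScheduler()`. Code that stores the value in a variable typed as the old enum or switches on it would still need changes.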

(Before you ask: yes, I know about the TPL. Yes, I have my reasons for not using it. No, I don't want to get into them in this thread; I want to focus on the issue at hand. The provided function looks extremely similar to System.Threading.Tasks.Parallel.For on the surface, but it is implemented very differently, and there's much more to my library besides this that the TPL is not flexible enough to do.)

10 replies

C#

•Created by InsertPi on 10/14/2023 in #help

✅ Modify delegate for ILGPU acceptance

This is a long one, strap in! So I’m incorporating ILGPU into a library I’m writing for very simple but powerful/expressive/flexible parallel programming. ILGPU is gonna help power the upcoming GPU API. With that said, ILGPU accepts either a delegate or a static method as a GPU kernel. The method called has to take an implementation of the ILGPU.IIndex interface as its first parameter. For the sake of my API, I would like to avoid passing ILGPU datatypes to the delegates I accept, because that exposes implementation details to users of my library.

With the background out of the way: ILGPU kernels, if they are delegates, do not support captures or objects. This poses a problem for me, since I need to pass a delegate with an IIndex to the ILGPU library but abstract that away in my API. This problem exists with other parameters, too. What I want to be able to do is have something like:

Action<Index1D, ArrayView<T>> kernel = (idx, av) =>
{
    int i = idx.X; // simple example
    var g = new GPUArray<T>(av);
    action_received_from_user(i, g); // captured from the enclosing scope
};

but the problem here is that the kernel delegate captures action_received_from_user, which ILGPU doesn’t allow. I tried passing in a struct as one of the kernel arguments that contains action_received_from_user as a member, but because delegates are managed types, they cannot be passed in as a kernel argument.

What I am thinking of is one of the following:

1. A way to on-the-fly inline (inline in the C/C++ sense) action_received_from_user into my kernel body.
2. A way to use Mono.Cecil or some other reflection API to manually inject CIL into the method body and change the argument types (I’m very familiar with CIL, less so with reflection).
3. Use Roslyn -> ??? -> Profit…?
4. Some other way to accomplish this.

Phew, that was long. I’ve been trying to think of solutions for days and I have tried just about everything under the sun, but none of them have worked. Any ideas?
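
Under option 4, one capture-free pattern is to pass the user's code as a struct type argument instead of a delegate instance, so there is no managed object to capture or hand over. The sketch below shows only the C# side of that idea on the CPU. All names (`IKernelBody`, `AddOne`, `KernelHost`) are hypothetical, plain arrays stand in for GPU buffers, and whether ILGPU accepts a kernel shaped like this is an assumption that would need checking. It also does not cover arbitrary user lambdas, since the body has to be written as a struct.

```csharp
using System;

// Hypothetical user-facing contract; no ILGPU types appear here.
public interface IKernelBody<T> where T : unmanaged
{
    void Invoke(int i, T[] data);
}

// The "action" is a struct type rather than a delegate instance, so there is
// nothing managed to capture or to pass as a kernel argument.
public readonly struct AddOne : IKernelBody<int>
{
    public void Invoke(int i, int[] data) => data[i] += 1;
}

public static class KernelHost
{
    // Static, capture-free wrapper: the body is selected by the type argument
    // and invoked on a default instance of the struct.
    public static void Kernel<TBody, T>(int index, T[] data)
        where TBody : struct, IKernelBody<T>
        where T : unmanaged
    {
        default(TBody).Invoke(index, data);
    }

    // CPU stand-in for launching the kernel over every index.
    public static void Run<TBody, T>(T[] data)
        where TBody : struct, IKernelBody<T>
        where T : unmanaged
    {
        for (int i = 0; i < data.Length; i++)
            Kernel<TBody, T>(i, data);
    }
}
```

Usage would look like `KernelHost.Run<AddOne, int>(values)`. The trade-off is an API where users write small structs rather than lambdas.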

22 replies

C#

•Created by InsertPi on 10/11/2023 in #help

✅ Collecting hotspot data on Linux in a threaded .NET environment

I'm working on optimizing some parallel code, and would like to be able to profile the code and collect hotspot data so I can determine how much time is spent doing useful work vs. how much time is spent as overhead managing work-sharing across the threadpool I create. I looked at BenchmarkDotNet, but the EtwProfiler is only available on Windows. I've been developing this whole project on Linux, and would prefer to benchmark on Linux as well due to the low OS overhead/idle CPU usage. What is the best way to do this?

6 replies
