C# · 16mo ago
DKMK100

✅ ComputeSharp and activation functions

I'm working on a neural net that runs on the GPU with the help of a library called ComputeSharp. I want to use only precompiled shaders, but I also want multiple activation function options, so each layer can use a different activation function. The thing is, I have no idea how to do this without manually writing a separate version of each shader for each activation function. I can't pass the activation function as a parameter since that would mean having to compile the shaders at runtime, and I'm hoping to avoid that. Is there an alternative I can use, possibly something akin to C++ templates? Anyone here familiar with the library or know a solution I can use?
19 Replies
Aaron · 16mo ago
@Sergio
Sergio · 16mo ago
Ooh cool ahah 😄
"Anyone here familiar with the library"
I wrote the library, so you could say I'm somewhat familiar.
"I can't pass the activation function as a parameter since that would mean having to compile the shaders at runtime, and I'm hoping to avoid that."
@DKMK100 No, there's no way to do that. I added the metaprogramming support (capturing delegates in a shader) pretty much to support exactly this scenario (activation functions in neural networks), but the tradeoff is that you can no longer precompile shaders.
That said, are you sure this is actually an issue? I.e. compiling shaders at runtime is still relatively fast, and especially if you're training a neural network, which takes hours and hours, that's pretty much negligible 🙂
I could in theory add some way to precompile combinations, but it's not implemented currently.
DKMK100 · 16mo ago
hmm, that's a shame, I was hoping there'd be an equivalent to C++ templates where I could have it precompile a version for each activation function
Sergio · 16mo ago
Is there a particular reason you wanted to precompile everything? I mean yes the extra startup speed is nice, but is it impactful in this context? And have you benchmarked how long it takes to compile the shader anyway?
DKMK100 · 16mo ago
I'm mostly worried that if it's runtime compiled it might not be as high quality. This line is concerning: "Additionally, precompiling shaders makes the code less error prone". But maybe I'm interpreting that wrong.
The way the wiki on GitHub is written sort of implies precompiled shaders will be better when running on the GPU, and obviously if I'm running it that many times I want it to have any compile-time optimizations possible.
Is that actually a concern? Or are they compiled the same way in both cases?
Sergio · 16mo ago
No, the code quality is identical. It's exactly the same compiler being used 🙂
DKMK100 · 16mo ago
alright, guess I'll use runtime compilation then
Do I need to pass the shader method delegate directly into the constructor of the compute shader, or can I save it somewhere else (e.g. the neural net layer) and pass that into the constructor of the compute shader? Not sure how you keep track of valid shader methods.
Sergio · 16mo ago
You can set the delegate to the field whenever you want; the only thing that matters is that it's there when you try to run the shader. So you can either set it on construction, or assign the field later on 🙂
Btw, have you seen the sample in the benchmark project? There's a minimal fully connected neural network example there (with no activation, but easy enough to add).
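For illustration, here is a minimal plain C# sketch of the "store the delegate on the layer, assign the field later" pattern. It deliberately uses no ComputeSharp types: `ActivationPass`, `Layer` and `Activations` are made-up names, and the loop runs on the CPU as a stand-in for a shader dispatch. A real ComputeSharp shader would be declared according to the library's own docs.

```csharp
using System;

// Hypothetical stand-in for a shader type. This is NOT a ComputeSharp shader;
// it only models "a struct with a delegate field that must be set before running".
public struct ActivationPass
{
    public float[] Values;
    public Func<float, float> Activation; // may be assigned after construction

    public void Run()
    {
        // The delegate only has to be present at the moment the pass runs.
        if (Activation is null)
            throw new InvalidOperationException("Activation must be set before Run().");

        for (int i = 0; i < Values.Length; i++)
            Values[i] = Activation(Values[i]);
    }
}

// Two common activation functions, usable as Func<float, float>.
public static class Activations
{
    public static float Relu(float x) => MathF.Max(0f, x);
    public static float Sigmoid(float x) => 1f / (1f + MathF.Exp(-x));
}

// The layer owns the choice of activation and hands it to the pass when needed.
public sealed class Layer
{
    public Func<float, float> Activation { get; }

    public Layer(Func<float, float> activation) => Activation = activation;

    public float[] Forward(float[] inputs)
    {
        var pass = new ActivationPass { Values = (float[])inputs.Clone() };
        pass.Activation = Activation; // assigned late, just before running
        pass.Run();
        return pass.Values;
    }
}
```

With this shape, `new Layer(Activations.Relu)` and `new Layer(Activations.Sigmoid)` can sit in the same network, and each layer passes its own delegate along at run time.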
DKMK100 · 16mo ago
I have not, no
Sergio · 16mo ago
See: https://github.com/Sergio0694/ComputeSharp/tree/main/samples/ComputeSharp.Benchmark/Blas
Also @DKMK100, in case it's useful for reference: https://github.com/Sergio0694/NeuralNetwork.NET
I've implemented fully connected and convolutional neural networks from scratch in C# there, including a lot of activation functions and cost functions, all with gradients as well.
Could be useful to e.g. port them to shaders, if you're not already 100% familiar with the math, because deriving all the formulas from scratch is a fair bit of work 😄
Anyway, hope that helps!
DKMK100 · 16mo ago
that helps a lot, thank you so much!
erm, how does allocating a new 2D texture work? Indexing is backwards on a 2D texture on the GPU vs. the array it's copied from, so when saying I want one of a certain size, which size goes first?
Sergio · 16mo ago
You can read the parameter names 😄
Textures are in row-major order, so the order matches that.
It's width, height.
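To make the ordering concrete, here is a small plain C# sketch (no ComputeSharp types; `TextureLayout` and its members are made-up names for illustration). It assumes the source is a C# 2D array laid out as `[rows, columns]`, which is where the "backwards" feeling comes from: the array is indexed `[y, x]`, while a texture size is given as width, height.

```csharp
public static class TextureLayout
{
    // For a [rows, columns] array, width is the column count (dimension 1)
    // and height is the row count (dimension 0).
    public static (int Width, int Height) SizeOf(float[,] data) =>
        (data.GetLength(1), data.GetLength(0));

    // Reads the element at texture-style coordinates (x, y),
    // which is array element [y, x].
    public static float At(float[,] data, int x, int y) => data[y, x];
}
```

So a `float[2, 3]` array (2 rows, 3 columns) corresponds to a size of width 3, height 2.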
DKMK100 · 16mo ago
How can I make a list of a bunch of 2D textures that are larger than 2048? Do I flatten one of the dimensions and use a very cursed single 2D texture?
Sergio · 16mo ago
What do you mean? The maximum size per axis in a 2D texture is 16,384.
You should be perfectly able to just have a 2D texture "larger than 2048".
DKMK100 · 16mo ago
yea, but how do I have a lot of them that I access in sequence? A 3D texture can only be 2048 by 2048 by 2048, so I can't use one of those, but nothing else approximates a list...
If I just have an array of them outside the GPU, won't that slow me down significantly? I guess I can test later today, but it sounds pretty slow.
Mifury · 16mo ago
It sounds like you're looking for a way to use multiple activation functions with pre-compiled shaders, without having to manually write a separate version of each shader for each activation function. One approach you could consider is using template metaprogramming techniques. Template metaprogramming is a technique used in C++ to generate code at compile-time based on template parameters. The idea is to write code that is generic enough to work with multiple types, and then instantiate the template with the specific type parameters at compile-time. This can result in more efficient code since it avoids the need for runtime polymorphism, and it can also simplify the code by eliminating the need for explicit type checks.
Sergio · 16mo ago
ComputeSharp already supports this, it's just that there's no support to also precompile the combinations at compile time to a shader; they just get transpiled to HLSL. I'd need to add more annotations specifically for this, but it hasn't been a priority 🙂
Ah I get it, is this for like a 3D tensor, say, a 4K frame with 3 channels for RGB?
You might want to just use a structured buffer for this and index manually. Textures are mostly used when you need to interpolate, but you don't need that here, so there's not much benefit from using them anyway.
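As a rough sketch of what "index manually" can look like, the helper below maps a 3D coordinate to an offset in a flat 1D buffer. It is plain C# with made-up names (`TensorIndex`, `Flatten`, `Length`), and it assumes a layout where `x` varies fastest and `z` (the channel or slice) varies slowest. Other layouts work just as well as long as reads and writes agree.

```csharp
public static class TensorIndex
{
    // Number of elements needed to store a width x height x depth tensor.
    public static int Length(int width, int height, int depth) =>
        width * height * depth;

    // Offset of element (x, y, z) in a flat buffer, with x varying fastest
    // and z varying slowest (assumed layout, one full 2D slice per z).
    public static int Flatten(int x, int y, int z, int width, int height) =>
        (z * height + y) * width + x;
}
```

For a 3840 by 2160 frame with 3 channels, `Length` gives 24,883,200 elements, and the same arithmetic can be written inline in a shader body to read from a single buffer instead of a list of textures.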
DKMK100 · 16mo ago
Update: I'm stupid and was doing something dumb.
I need three 2D operations to replace my one 3D operation, which will solve my problem.