Why is a lambda faster than other approaches?
I've been working on an algorithm and discovered that if the Sort function uses a lambda comparer, it is faster than if it uses a comparer type. So I ran this test with BenchmarkDotNet, and the result really puzzles me. Why is that?
https://gist.github.com/laicasaane/d51193a0a4aff7c6df6c1bb89a66bdc9
Test Sort with lambda, local function, comparer type
Test Sort with lambda, local function, comparer type - Result.log
5 Replies
I know, if you change Test_LocalFunc to
you'll see comparable time to Test_Lambda
IIRC, delegates which point to instance methods are quicker to call than delegates which point to static methods
Something to do with a static delegate invocation needing to go via a trampoline?
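A rough way to check which kind of delegate you ended up with is to look at its Target and Method. This is only a sketch and is not taken from the gist: DelegateShape, CompareStatic and the other names are made up for illustration. A delegate created from a static method group always has a null Target. For the non-capturing lambda, the result depends on how the compiler lowers it, so treat whatever it prints as a compiler implementation detail rather than a guarantee.

```csharp
using System;

public static class DelegateShape
{
    // Plain static method, turned into a delegate via method-group conversion.
    public static int CompareStatic(int a, int b) => a.CompareTo(b);

    // Non-capturing lambda marked 'static' (C# 9+). How it is emitted is up to the compiler.
    public static Comparison<int> FromLambda() => static (a, b) => a.CompareTo(b);

    // Delegate that points directly at the static method above.
    public static Comparison<int> FromStaticMethod() => CompareStatic;

    // Reports whether the delegate targets a static method and whether it carries a target object.
    public static string Describe(Comparison<int> c) =>
        $"Method.IsStatic={c.Method.IsStatic}, Target is null={c.Target is null}";

    public static void Main()
    {
        Console.WriteLine("lambda:        " + Describe(FromLambda()));
        Console.WriteLine("static method: " + Describe(FromStaticMethod()));
    }
}
```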
(the lambda gets turned into an instance method, even though it's marked static: https://sharplab.io/#v2:EYLgtghglgdgNAExAagD4AEBMBGAsAKHQGYACLEgYQG8CS6zT0AWEgFQFMBnAFwH0AZCGGAIIACljcA2gF0So7hACUtejXz1N8iIoB0AQU4BlAA4QYYpbqMB7AE7cx6bADYSYgB5wSATyUkAXgA+Eg9dChswMzt2VhsxPyUAblU6AF8CVIYyFg4eARsAYwgAG3QAVgkYaTkFZSz1LXo6g2MzCytbBzEIqIgY5MyNJrJXEklKSOj2Ku5Q7wm/QJCw3um4hMHh9II0oA==)
Making your comparer a struct instead of a class isn't helpful, and is actually hurting things. Your comparer benchmark has an unnecessary boxing operation and allocation on every iteration right now. Change it to a sealed class and use the typical singleton comparer pattern instead.
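For reference, a minimal sketch of that pattern, assuming the sort takes an IComparer<int>. IntStructComparer and IntClassComparer are hypothetical names, not the types from the gist, and Array.Sort stands in for whatever Sort the benchmark actually calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct comparer: converting it to IComparer<int> boxes it.
public struct IntStructComparer : IComparer<int>
{
    public int Compare(int x, int y) => x.CompareTo(y);
}

// Sealed class with one shared instance: passing it around allocates nothing new.
public sealed class IntClassComparer : IComparer<int>
{
    public static readonly IntClassComparer Instance = new IntClassComparer();

    private IntClassComparer() { }

    public int Compare(int x, int y) => x.CompareTo(y);
}

public static class ComparerDemo
{
    // Each struct-to-interface conversion produces its own boxed copy,
    // so the two references below never point at the same object.
    public static bool StructBoxesAreDistinct()
    {
        IComparer<int> first = new IntStructComparer();
        IComparer<int> second = new IntStructComparer();
        return !object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        int[] data = { 5, 3, 1, 4, 2 };

        // The struct is boxed at this call because the parameter type is IComparer<int>.
        Array.Sort(data, new IntStructComparer());

        // The singleton is reused on every call.
        Array.Sort(data, IntClassComparer.Instance);

        Console.WriteLine(string.Join(",", data));
        Console.WriteLine(StructBoxesAreDistinct());
    }
}
```

A struct comparer only avoids the box when the sort method is generic over the comparer type (something like `where TComparer : struct, IComparer<T>`), so that no conversion to the interface ever happens.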
Besides that, delegate calls have always been faster than interface calls. The heuristic I use is that a virtual interface call has about twice the overhead of a virtual delegate call or virtual class call, so that is something to consider. .NET 8+ complicates things, both in terms of benchmarks reflecting actual real-world performance and in figuring out exactly why X is benching faster than Y, with all the dynamic PGO devirtualization going on as well. There's a lot of JIT magic happening these days that optimizes stuff.
You can add the memory diagnoser to BDN to see allocations.
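Something like this, assuming the BenchmarkDotNet NuGet package is referenced. SortBenchmarks and its contents are placeholders rather than the gist's benchmark; the only part that matters here is the [MemoryDiagnoser] attribute on the class.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // adds allocation columns (e.g. Allocated, Gen0) to the summary table
public class SortBenchmarks
{
    private int[] _source = Array.Empty<int>();
    private int[] _work = Array.Empty<int>();

    [GlobalSetup]
    public void Setup()
    {
        // Fixed seed so every run sorts the same input.
        var rng = new Random(42);
        _source = new int[1_000];
        for (int i = 0; i < _source.Length; i++)
        {
            _source[i] = rng.Next();
        }
        _work = new int[_source.Length];
    }

    [Benchmark]
    public int[] SortWithLambda()
    {
        // Copy first so each invocation sorts unsorted data.
        _source.CopyTo(_work, 0);
        Array.Sort(_work, static (a, b) => a.CompareTo(b));
        return _work; // returning the result keeps the work from being optimized away
    }
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SortBenchmarks>();
}
```

Run it in Release mode as usual; the allocation columns show up next to the timing columns for each benchmark method.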
Well, it sounds like things will get complicated once Unity completes their transition to CoreCLR, which essentially enables using .NET 8+ in Unity.
All the advice above applies regardless. It might just be harder to gauge how much the PGO optimizations actually apply in typical real-world usage of the code.
I.e., in real-world code you might have more than one implementation of the comparer interface/delegate fighting for dominance, in which case the PGO devirtualization optimizations won't be as effective as they are in your benchmarks, where one implementation of the comparer clearly dominates in call frequency and is thus the clear choice to prioritize for devirtualization.