C#•14mo ago

Why is my SIMD code slower than the scalar version?

I wrote the following to learn more about SIMD; it tries to find a substring: https://paste.mod.gg/jswpvcpgoxgo/0 I ran a benchmark on my machine, which supports AVX-512, and compared it against the scalar path by setting DOTNET_EnableHWIntrinsic=0. The benchmark searches 2 paragraphs of Lorem Ipsum (1156 chars) for a search string of a few words (47 chars). The vector512 benchmark takes approx 2.8us and the scalar benchmark takes 4.1us which seems like a fairly large difference and indicative that I’ve done something wrong. Is there any more profiling I can do to work out what went wrong?
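Before comparing timings, it can help to confirm which path each benchmark run actually takes. The sketch below only reports what the runtime says about vector support; `SimdReport` and `Describe` are hypothetical names, not part of the pasted code. With DOTNET_EnableHWIntrinsic=0 set, the three flags should all come back False.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

static class SimdReport
{
    // Summarises which vector widths the JIT treats as hardware accelerated
    // in the current process (affected by DOTNET_EnableHWIntrinsic and friends).
    public static string Describe() =>
        $"Vector128={Vector128.IsHardwareAccelerated}, " +
        $"Vector256={Vector256.IsHardwareAccelerated}, " +
        $"Vector512={Vector512.IsHardwareAccelerated}, " +
        $"Vector<byte>.Count={Vector<byte>.Count}";
}
```

Calling `Console.WriteLine(SimdReport.Describe());` at the start of each benchmark run makes it easier to check that the "vector" and "scalar" labels in the results match the code path that really ran.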


3 Replies

dreadfullydistinct (OP) • 14mo ago

On closer inspection of the runtime guide I need to:
- use load rather than create in the loop
- install VTune or something to work out what’s going wrong in terms of the instructions
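The pasted code isn't reproduced in this thread, so the following is only a sketch of what "load rather than create in the loop" might look like, reduced to finding a single char. `FirstCharScan` and `IndexOfChar` are hypothetical names. The needle is broadcast once with `Vector512.Create` before the loop, and each block of the haystack is read with `Vector512.LoadUnsafe` instead of being rebuilt element by element.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class FirstCharScan
{
    // Returns the index of the first occurrence of `target` in `text`, or -1.
    public static int IndexOfChar(ReadOnlySpan<char> text, char target)
    {
        // Vector512<char> is not supported, so view the chars as ushort.
        ref ushort start = ref MemoryMarshal.GetReference(
            MemoryMarshal.Cast<char, ushort>(text));
        int i = 0;

        if (Vector512.IsHardwareAccelerated && text.Length >= Vector512<ushort>.Count)
        {
            // Broadcast the needle once, outside the loop.
            Vector512<ushort> needle = Vector512.Create((ushort)target);
            int lastBlock = text.Length - Vector512<ushort>.Count;

            for (; i <= lastBlock; i += Vector512<ushort>.Count)
            {
                // Load 32 chars straight from memory.
                Vector512<ushort> block = Vector512.LoadUnsafe(ref start, (nuint)i);
                ulong mask = Vector512.Equals(block, needle).ExtractMostSignificantBits();
                if (mask != 0)
                    return i + BitOperations.TrailingZeroCount(mask);
            }
        }

        // Scalar tail; also the whole search when Vector512 is unavailable.
        for (; i < text.Length; i++)
        {
            if (text[i] == target)
                return i;
        }
        return -1;
    }
}
```

A full substring search would still need to verify the rest of the needle at each candidate index; this sketch only covers the load pattern.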

reflectronic•14mo ago

The vector512 benchmark takes approx 2.8us and the scalar benchmark takes 4.1us which seems like a fairly large difference and indicative that I’ve done something wrong.

i don't understand. this means that the vector512 benchmark is faster. it takes fewer microseconds

dreadfullydistinct (OP) • 14mo ago

I muddled those around, sorry. Scalar is 2.8us and vector is 4.1us. I was mucking about on my work computer, where I don’t have Discord, otherwise I’d have pasted the table.
