Marvee Amasi
DevHeads IoT Integration Server
•Created by Marvee Amasi on 7/31/2024 in #code-review
How can I optimize matrix multiplication performance and reduce L3 cache misses in my C++ library?
Thanks guys. My matrix operation suffered from poor cache locality because it was accessing the elements of the matrices in a scattered manner. The fix wasn't anything complex 👌, but it removed the bottleneck.
I divided the matrices into smaller blocks with a blocking function to improve cache locality. Now more of the data is likely to already be in the processor's cache when it's needed, which reduces the number of costly memory accesses.
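For anyone who lands here later, this is roughly what the blocking idea looks like. It is only a minimal sketch, not the actual code from my library: the function name `matmul_blocked`, the square row-major `std::vector<double>` layout, and the default block size of 64 are all assumptions made for illustration, and the best block size depends on your cache sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the original library's API).
// Computes C = A * B for square n x n matrices stored row-major,
// working on one block x block tile at a time so the tiles being
// used are more likely to stay in cache.
// `block` is an assumed tuning parameter; measure to pick a value.
std::vector<double> matmul_blocked(const std::vector<double>& A,
                                   const std::vector<double>& B,
                                   std::size_t n,
                                   std::size_t block = 64) {
    assert(A.size() == n * n && B.size() == n * n && block > 0);
    std::vector<double> C(n * n, 0.0);

    // Outer three loops step over tiles; std::min handles the case
    // where n is not a multiple of the block size.
    for (std::size_t ii = 0; ii < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n);
        for (std::size_t kk = 0; kk < n; kk += block) {
            const std::size_t k_end = std::min(kk + block, n);
            for (std::size_t jj = 0; jj < n; jj += block) {
                const std::size_t j_end = std::min(jj + block, n);

                // Inner three loops multiply one tile of A by one tile of B.
                // The i-k-j order walks rows of B and C contiguously.
                for (std::size_t i = ii; i < i_end; ++i) {
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
    return C;
}
```

Calling `matmul_blocked(A, B, n)` returns the product as a new row-major vector. Whether it actually reduces L3 misses on a given machine is something to confirm with a profiler, trying a few block sizes.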