ke7c2mi
DevHeads IoT Integration Server
Created by Marvee Amasi on 10/18/2024 in #code-review
Optimizing memcpy Performance on Intel Core i7 10700K: SIMD and Compiler Flags
"Could memcpy be optimized to exceed this expectation, possibly using SIMD or other CPU specific features?"

Yes, and it will usually already be very well optimised.

This is probably the memcpy you are running: https://elixir.bootlin.com/linux/v5.10/source/arch/x86/lib/memcpy_32.c
This is probably the one it actually uses: https://elixir.bootlin.com/linux/v5.10/source/arch/x86/lib/mmx_32.c#L29

I think the general principle to understand is that any time a CPU grabs something from memory or cache, it is doing a transaction on some bus. One 8-byte transaction will have less overhead than eight 1-byte transactions.

The game pretty much becomes: what are the biggest chunks of data we can move at a time? Here we see MMX registers used. On Arm we would expect to find the load/store-multiple instructions (which move several registers in one go) used. In both cases it is more data for fewer bus transactions, so faster 🤓
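To make the "bigger chunks, fewer transactions" idea concrete, here is a minimal sketch in C. The function names `copy_bytewise` and `copy_wordwise` are made up for illustration and are not taken from the linked kernel sources. Each one returns how many load/store pairs it issued, which is only a rough stand-in for bus transactions: real hardware moves whole cache lines, and an optimising compiler may well vectorise the byte loop on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies one byte per load/store pair.
 * Returns the number of load/store pairs issued (always n). */
size_t copy_bytewise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t steps = 0;

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
        steps++;
    }
    return steps;
}

/* Hypothetical helper: copies 8 bytes per load/store pair while it can,
 * then finishes the tail one byte at a time.
 * Returns the number of load/store pairs issued (n / 8 + n % 8). */
size_t copy_wordwise(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t steps = 0;

    while (n >= sizeof(uint64_t)) {
        uint64_t chunk;
        /* A fixed-size memcpy through a local avoids alignment and
         * aliasing problems; compilers typically turn each call into
         * a single 8-byte move. */
        memcpy(&chunk, s, sizeof chunk);
        memcpy(d, &chunk, sizeof chunk);
        s += sizeof chunk;
        d += sizeof chunk;
        n -= sizeof chunk;
        steps++;
    }
    while (n > 0) {
        *d++ = *s++;
        n--;
        steps++;
    }
    return steps;
}
```

For a 29-byte buffer the byte loop issues 29 load/store pairs, while the 8-byte version issues 3 wide ones plus 5 for the tail. MMX, SSE and AVX registers push the same trick to 8, 16 and 32 bytes per move, which is roughly what an optimised library memcpy is doing for you.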