SIMD size greater than max supported by processor
What happens if I create a SIMD value with a large size, something like 2^10 or larger?
Will it be split into multiple instructions, each at the maximum width the processor permits?
I'm wondering what the best practice is when working with large arrays while still wanting SIMD behaviour.
3 Replies
It will be very slow. The general practice when you want something like a SIMD value, but with arbitrary size, is to make something like a heap-allocated
struct Vector[type: DType]
with arithmetic semantics. You can use vectorize to implement the operations (see "vectorize | Modular Docs").
Or, you can use one of the libraries that already implements a similar type.
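To make the idea concrete, here is a rough sketch of the loop structure such a type could use. It is written in plain Python rather than Mojo, so it only illustrates the pattern and is not how any Mojo library actually does it. The `Vector` class, the `CHUNK` constant, and the `_zip_chunks` helper are all hypothetical names, and the chunk width of 16 assumes float32 on a 512-bit machine. The main body walks the data one hardware-sized chunk at a time, and a tail loop picks up whatever is left over.

```python
import operator
from array import array

# Assumption: 16 float32 lanes, i.e. one 512-bit register's worth.
CHUNK = 16


class Vector:
    """Hypothetical heap-allocated vector with elementwise arithmetic."""

    def __init__(self, values):
        # Contiguous float32 storage on the heap, any length.
        self.data = array("f", values)

    def __len__(self):
        return len(self.data)

    def _zip_chunks(self, other, op):
        if len(self) != len(other):
            raise ValueError("length mismatch")
        n = len(self)
        out = array("f", [0.0]) * n
        main = n - n % CHUNK
        # Main body: one full-width chunk per step. In a compiled
        # language each step would map to one SIMD operation.
        for i in range(0, main, CHUNK):
            a = self.data[i:i + CHUNK]
            b = other.data[i:i + CHUNK]
            out[i:i + CHUNK] = array("f", [op(x, y) for x, y in zip(a, b)])
        # Tail: leftover elements that do not fill a whole chunk.
        for i in range(main, n):
            out[i] = op(self.data[i], other.data[i])
        result = Vector(())
        result.data = out
        return result

    def __add__(self, other):
        return self._zip_chunks(other, operator.add)

    def __mul__(self, other):
        return self._zip_chunks(other, operator.mul)


# Usage: 2^10 elements, far wider than any single register.
a = Vector(range(2 ** 10))
b = Vector([2.0] * 2 ** 10)
c = a + b
```

The point of the design is that the logical size lives in memory and only one chunk at a time needs to sit in registers, instead of asking the compiler to keep a single huge SIMD value live.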
If I take the following function and compile it with T = DType.float32 and width = 256, I get the following assembly on my AVX-512 processor:
I'm actually not sure why this is using the stack. I'm going to start a discussion in #performance-and-benchmarks because that feels like a bug.