How to understand Mojo's compiler optimization capabilities?
This is something that’s been bugging me for a while. I am afraid there’s no clear answer, but I’m curious how you guys handle it.
How can I figure out what optimizations the compiler is doing anyway, instead of implementing them myself (vectorize, etc.)?
Implementing them myself is often easy with Mojo, but it's still error-prone, it makes the code harder to read, and things like choosing a simd_width in the code might be less optimal than letting the compiler decide based on the machine the code is running on.
To clarify what I mean with the last point: it seems one particular width is the best choice on Apple Silicon, but on other machines, who knows? I hope the compiler decides depending on the machine it is compiling for. It feels a bit insane to actually hardcode this factor. 😉 Thanks
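To make that concrete, here's roughly the kind of thing I mean (a sketch against a recent Mojo nightly; vectorize and the pointer load/store signatures have changed across releases, so treat the exact spellings as assumptions):

```mojo
from algorithm import vectorize
from memory import UnsafePointer
from sys.info import simdwidthof

fn scale_in_place(data: UnsafePointer[Float32], size: Int, factor: Float32):
    # simdwidthof is evaluated at compile time for the target CPU,
    # so no width is hardcoded for a particular machine.
    alias width = simdwidthof[DType.float32]()

    @parameter
    fn body[w: Int](i: Int):
        # Load w lanes, scale them, store them back.
        data.store(i, data.load[width=w](i) * factor)

    # vectorize handles the tail where size isn't a multiple of width.
    vectorize[body, width](size)
```

That at least avoids hardcoding the width, but it doesn't tell me whether the compiler would have done something equivalent (or better) on its own.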
6 Replies
TL;DR for the below: compiler optimizations only get you so far; autotuning is meant to solve this issue, but it's being redesigned after many compiler changes.
Full answer you might be interested in, I had a similar line of questioning internally:
question:
answer from one of the kernel engineers:
Then I was asking whether simdwidthof should return a platform-specific simd_width_of_type * simd_operations_per_cycle, and the answer was:
My question:
answer:
And someone else added:
Getting the absolute max performance with SIMD and parallel operations is complicated, and isn't something that can easily be done by compiler optimizations like auto-vectorization; that only gets you so far. I've been trying to find a way to express this for a blog post in a simple way with real-world examples, but haven't completed it yet.
Thanks a lot for sharing this internal conversation, very helpful to get a sense of how complicated things actually are. :mojo:
@Jack Clayton I think explaining which optimizations are guaranteed by the Mojo compiler vs. which ones aren't (with explanations for why) would also be a good idea for the article.
All we have right now is "The Mojo compiler isn't magic" and "Mojo provides high-level, zero-cost abstractions", which sound like contradictory statements to people who don't have a lot of knowledge of compiler optimizations.
Working off of this question, is there any chance of Compiler Explorer support in the near future, or at least once the compiler goes open source? Either on the main godbolt.org (Matt has no issue with compilers he has no source access to, or with experimental ones; MSVC, Carbon, and some other experimental C++ successors are in there), one hosted by Modular (so that nightly can be kept up to date more easily), or one we can self-host?
It tends to be my tool of choice when answering these questions before I head off to uops.info.
Hi @Darkmatter, thanks for bringing this up. We're discussing internally whether we can make this happen.
Thanks for the consideration. Being able to use it to help explain why some things are faster in Mojo would help in #performance-and-benchmarks.