If we're talking about performant code, why is MLIR better than LLVM?
In their latest blog post they say that MLIR is for next-generation compilers and that Mojo is the only language that takes full advantage of it. But in terms of performance I still don't understand what they mean.
The only advantage I understand is that Mojo, through MLIR, can generate code for GPUs, TPUs, etc.
As I understand it, MLIR should be similar to LLVM in normal use cases. However, MLIR allows optimizations for certain classes of problems which can't be done using LLVM (think AI compute graphs). Previously, people would write custom compilers for those kinds of problems (e.g. XLA). MLIR allows those compiler techniques to be incorporated into an interoperable compiler framework, so any language can make use of them out of the box, without needing DSLs and bindings inside the language. Coupled with the ability to efficiently generate code for both CPUs and GPUs, this makes certain classes of problems, those that depend on high-level abstractions or use heterogeneous execution, much faster, while maintaining the performance of LLVM otherwise.
Are these optimizations that can be achieved with MLIR but not with LLVM very niche to AI, or are they fairly common optimizations? That is, would Rust or Swift be encouraged to adopt MLIR in the future, with all the work that entails, to obtain its benefits?
I just want a little context on what kind of optimizations they are 😅
I'm not sure about what specific optimizations Mojo does (besides making all integers/floats SIMD datatypes), but I can speak to MLIR broadly. An example in the MLIR docs (https://mlir.llvm.org/docs/Tutorials/Toy/Ch-3/) gives the following function:
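```
# From the linked Toy tutorial (Ch-3):
def transpose_transpose(x) {
  return transpose(transpose(x));
}
```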
and shows that a code (AST) transformation can be trivially defined for when `transpose` is called on `transpose`, such that the above can simply become:
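```
# In effect (the tutorial shows this at the MLIR level, where the
# function simply returns its argument unchanged):
def transpose_transpose(x) {
  return x;
}
```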
This kind of optimization would be impossible in practice to achieve with LLVM, which only deals with very low-level operations.
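For the curious, the linked tutorial defines this as a C++ rewrite pattern; condensed slightly from its Ch-3 code (mlir/examples/toy/Ch3/mlir/ToyCombine.cpp), it looks roughly like this:
```cpp
/// Fold transpose(transpose(x)) -> x on the Toy dialect's TransposeOp.
struct SimplifyRedundantTranspose : public mlir::OpRewritePattern<TransposeOp> {
  /// Register this pattern to match every toy.transpose in the IR.
  SimplifyRedundantTranspose(mlir::MLIRContext *context)
      : OpRewritePattern<TransposeOp>(context, /*benefit=*/1) {}

  /// Attempt to match the pattern and rewrite the IR.
  mlir::LogicalResult
  matchAndRewrite(TransposeOp op,
                  mlir::PatternRewriter &rewriter) const override {
    // Look through the input of the current transpose.
    mlir::Value transposeInput = op.getOperand();
    TransposeOp transposeInputOp = transposeInput.getDefiningOp<TransposeOp>();

    // If the input isn't produced by another transpose, there's no match.
    if (!transposeInputOp)
      return mlir::failure();

    // Otherwise, replace transpose(transpose(x)) with x directly.
    rewriter.replaceOp(op, {transposeInputOp.getOperand()});
    return mlir::success();
  }
};
```
Note how the pattern matches on a named, high-level operation; by the time this code reaches LLVM IR, the transposes would be loops over memory with no trace of the original structure.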
I'm not sure I'd classify these kinds of optimizations as either niche or common. You can imagine these kinds of optimizations being applied all over the place outside of just linear-algebraic operations. But they're also not "standard" optimizations like loop unrolling or the like, which LLVM can perform (though arguably it's more straightforward in MLIR).
I would say MLIR is analogous to an "ubercompiler", and that yes, there's a case to be made that all languages should use it going forward.
That said, it's a considerable amount of work for an existing language to port over to MLIR, because it has already set up a pipeline of `language-specific IR -> LLVM -> machine code`, and it would need `language-specific IR within MLIR -> LLVM + others -> machine code`. Essentially, whatever language-specific optimizations are being performed within the language's existing compiler would need to be ported to MLIR, which is extremely non-trivial. But for new languages there is only benefit, as MLIR provides a framework for performing language-specific optimizations (e.g. `transpose(transpose(x)) -> x`).
This was what I wanted to read! I think MLIR will be a before-and-after: languages prior to MLIR and those that use MLIR. I would love to live many years and see what will happen.
Awesome! I feel the same way!
Starting at 13:18, the Mojo devs discuss the Mojo compiler, including how they use MLIR and LLVM:
2023 LLVM Developers' Meeting (YouTube, LLVM channel): "Mojo 🔥: A system programming language for heterogenous computing", by Abdul Dakkak, Chris Lattner, and Jeff Niu. Slides: https://llvm.org/devmtg/2023-10/slides/keynote/Mojo.pdf; meeting page: https://llvm.org/devmtg/2023-10
Let's make one thing clear first: LLVM and MLIR are actually technologies that complement each other.
LLVM IR is too low level for many optimisations, so languages introduce their own high/mid-level IR before lowering to LLVM IR; Swift has SIL, and Rust has HIR/MIR. MLIR is a generalisation of this high-level IR idea, dialled up to 42: why have only one layer between the AST and LLVM IR when one could have any number? Why not interleave those layers/lowerings/passes? Why lower everything all at once (e.g. `HIR -> MIR`)? So people invented dialects: groups of semantically linked operations which should be modelled together. Dialects can be high level, like the ones used in XLA, or low level, like the LLVM dialect. The compiler can choose to operate on any or all of them, making the whole system a lot more extensible and versatile. Using MLIR makes the architecture of the Mojo compiler very lean rather than ad hoc, and lets it iterate very fast, as we can already see from the Mojo release cycle.
It is much clearer to me after reading this. Thank you so much!
All I found is this:
>...MLIR shares similarities with traditional CFG-based three-address SSA representations (including LLVM IR or SIL), but it also introduces notions from the polyhedral domain as first class concepts. The notion of dialects is a core concept of MLIR extensibility, allowing multiple levels in a single representation. MLIR supports the continuous lowering from dataflow graphs to high-performance target specific code through partial specialization between dialects.
>MLIR supports multiple front- and back-ends and uses LLVM IR as one of its primary code generation targets. MLIR also relies heavily on design principles and practices developed by the LLVM community. For example, it depends on LLVM APIs and programming idioms to minimize IR size and maximize optimization efficiency.[...] [MLIR] is a brand new IR, both more restrictive and more general than LLVM IR in different aspects of its design. We believe that the LLVM community will find in MLIR a useful tool for developing new compilers, especially in machine learning and other high-performance domains.
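To make the quoted "multiple levels in a single representation" concrete, here is a small hand-written sketch (mine, not from the paper) of MLIR's textual IR mixing several upstream dialects in one function; passes can lower each dialect toward the LLVM dialect independently:
```mlir
// func, arith, scf, and memref ops coexisting in one function:
// arith for scalar math, memref for buffers, scf for structured
// control flow. Each lowers toward the llvm dialect at its own pace.
func.func @scale(%buf: memref<16xf32>, %factor: f32) {
  %c0  = arith.constant 0  : index
  %c1  = arith.constant 1  : index
  %c16 = arith.constant 16 : index
  scf.for %i = %c0 to %c16 step %c1 {
    %v = memref.load %buf[%i] : memref<16xf32>
    %s = arith.mulf %v, %factor : f32
    memref.store %s, %buf[%i] : memref<16xf32>
  }
  return
}
```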