Matrix multiplication (matmul): is `numpy` hard to beat, even by Mojo?
Super interested in Mojo and wanted to try out some of the documentation/blog examples. 🤔 🔥
https://docs.modular.com/mojo/notebooks/Matmul
Great explanations, and the step-by-step speed improvements are amazing to see!
However, in the end a comparison to a real-world alternative is interesting. No one would seriously do matmul in pure Python.
So I compared the performance to `numpy`, which is a much better baseline for comparison.
Results on my machine:
- Naive matrix multiplication: 0.854 GFLOP/s
- Vectorized matrix multiplication without `vectorize`: 5.71 GFLOP/s
- Vectorized matrix multiplication with `vectorize`: 5.81 GFLOP/s
- Parallelized matrix multiplication: 35.2 GFLOP/s
- Tiled parallelized matrix multiplication: 36.8 GFLOP/s
- Unrolled tiled parallelized matrix multiplication: 35.3 GFLOP/s
- Numpy matrix multiplication: 134.2 GFLOP/s
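For reference, the numpy side of such a comparison can be measured with a few lines of Python. This is a minimal sketch, not the exact script I used: `numpy_matmul_gflops` is a made-up helper name, and the 512x512 float32 matrices are an assumption, so adjust the size and dtype to match the Mojo benchmark you're comparing against.

```python
import time
import numpy as np

def numpy_matmul_gflops(n: int = 512, iters: int = 10) -> float:
    """Time an n x n float32 matmul and return the achieved GFLOP/s."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up call so one-time initialisation doesn't skew the timing
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = time.perf_counter() - start
    # Each n x n output element costs n multiplies + n adds => 2*n^3 flops per matmul.
    flops = 2.0 * n * n * n * iters
    return flops / elapsed / 1e9

print(f"numpy matmul: {numpy_matmul_gflops():.1f} GFLOP/s")
```

One caveat with timings like this: the result depends heavily on matrix size, dtype, and which BLAS backend numpy is linked against, so numbers are only comparable when those match across implementations.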
Takeaways:
- gigantic speedup compared against naive, pure Python 🔥
- still almost 4x SLOWER compared to numpy
Wondering if `numpy` is so heavily optimised for this operation that there is little room to keep up with or improve upon it?
Does anyone have ideas for further optimisations to get Mojo closer to numpy?
Is this something that only a framework like MAX or super low-level bit manipulation can achieve? 🤔
2 Replies
https://www.linkedin.com/posts/pavanmv_benchmarking-mojo-the-supercharged-superset-activity-7220047088577888256-siYI/
You can use MAX Engine for speedy matmul.
There's a whole discussion here as well:
https://github.com/modularml/mojo/issues/2660
While it is true that numpy is heavily optimized, it's also true that numpy delegates the actual work to a dedicated engine. Decades of work have gone into those engines: BLAS, LAPACK, OpenBLAS, and Intel MKL. The typical numpy installation uses OpenBLAS; Intel's MKL build of numpy is even faster than OpenBLAS.
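Since the backend matters so much for matmul throughput, it's worth checking which one a given numpy build actually links against before comparing numbers. numpy exposes this via `np.show_config()`, which prints the build's BLAS/LAPACK configuration:

```python
import numpy as np

# Print the BLAS/LAPACK libraries this numpy build was compiled against.
# A typical pip wheel reports OpenBLAS; conda/Intel builds may report MKL.
np.show_config()
```

Two machines reporting very different numpy GFLOP/s often differ here rather than in numpy itself.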
The community is attacking this issue in the inaugural #mojo-marathons. While Ethan Darkmatter and others are making strides in pure Mojo, a solution that closes the gap has not been found yet.
Additionally, other members of the community have been building computing libraries; see #community-showcase. Mojo is built on MLIR, which is a different IR than the one C runs on, and the Modular team has been writing their own MLIR dialects. The MAX engine is also in development. There are many areas of optimization still left to try.