Modular•5mo ago
JulianJS

Matrix Multiplication (matmul): `numpy` hard to beat? even by mojo?

Super interested in Mojo and wanted to try out some of the documentation/blog examples. 🤓 🔥
https://docs.modular.com/mojo/notebooks/Matmul
Great explanations, and the step-by-step speed improvements are amazing to see! 👍

However, in the end a comparison to a real-world alternative is more interesting. No one would seriously do matmul in pure Python 😆. So I compared the performance to numpy, which is a much better "baseline" for comparison.

Results on my machine:
- Naive matrix multiplication: 0.854 GFLOP/s
- Vectorized matrix multiplication without vectorize: 5.71 GFLOP/s
- Vectorized matrix multiplication with vectorize: 5.81 GFLOP/s
- Parallelized matrix multiplication: 35.2 GFLOP/s
- Tiled parallelized matrix multiplication: 36.8 GFLOP/s
- Unrolled tiled parallelized matrix multiplication: 35.3 GFLOP/s
- Numpy matrix multiplication: 134.2 GFLOP/s

Results:
- gigantic speedup compared to naive, pure Python 🔥
- still almost 4x SLOWER than numpy 😕

Wondering if numpy is so heavily optimised for this operation that there is little room to keep up or improve upon it? Does anyone have ideas for further optimisations to get Mojo closer to numpy? Is this something that only a framework like MAX or super low-level bit manipulation can achieve? 🤔
Matrix multiplication in Mojo | Modular Docs
Learn how to leverage Mojo's various functions to write a high-performance matmul.
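For anyone wanting to reproduce the numpy baseline, here is a minimal sketch of how such a GFLOP/s figure can be measured. It assumes the usual count of 2·M·N·K floating-point operations per matmul; the function name `matmul_gflops`, the matrix sizes, the dtype and the repeat count are illustrative choices, not taken from the post, so the numbers will not match the ones above exactly.

```python
import time

import numpy as np


def matmul_gflops(m, n, k, repeats=5, dtype=np.float32):
    """Time an (m x k) @ (k x n) numpy matmul and return the best GFLOP/s."""
    rng = np.random.default_rng(0)
    a = rng.random((m, k), dtype=dtype)
    b = rng.random((k, n), dtype=dtype)

    a @ b  # warm-up run so one-time setup cost is not measured

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)

    # Assumption: one multiply and one add per inner-loop step.
    flops = 2.0 * m * n * k
    return flops / best / 1e9


if __name__ == "__main__":
    # Hypothetical sizes; use the same M, N, K as the Mojo benchmark
    # you are comparing against.
    print(f"numpy matmul: {matmul_gflops(512, 512, 512):.1f} GFLOP/s")
```

Taking the best of several runs reduces noise from other processes. For a fair comparison, the Mojo and numpy runs should use the same matrix sizes and the same element type.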
2 Replies
taalhaataahir01022001•5mo ago
Pavan MV on LinkedIn: Benchmarking Mojo 🔥: The Supercharged Superse...
Benchmarking Mojo 🔥: The Supercharged Superset of Python. For the past few months, I'd been hearing about Mojo, a new programming language touted as a superset…
GitHub
Slower Matrix multiplication than numpy · Issue #2660 · modularml/m...
Bug description: I've tried running the Mojo matmul file available in the repository inside the examples directory (https://github.com/modularml/mojo/blob/main/examples/matmul.mojo). The output of th...
Darin Simmons•4mo ago
While it is true that numpy is heavily optimized, it's also true that numpy has its own engine. Decades of work have gone into these engines: BLAS, LAPACK, OpenBLAS, and the new kid, Intel MKL. The typical numpy installation uses OpenBLAS. A numpy build linked against Intel's MKL can be even faster than OpenBLAS. (numpy BLAS info)

The community is attacking this issue in the inaugural #mojo-marathons. While Ethan Darkmatter and others are making strides in pure Mojo, the solution has not been found yet. Additionally, other members of the community have been building computing libraries in #community-showcase.

Mojo is built on MLIR, which is a different IR from the one C runs on, and the Modular team has been writing their own MLIR. The MAX engine is also in development. There are many areas for optimization still left to try.
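To see which of these engines a given numpy installation is linked against, numpy's own `show_config()` can be used. The snippet below is a small sketch; the helper name `blas_config_text` is made up for illustration, and the exact layout of the report differs between numpy versions.

```python
import contextlib
import io

import numpy as np


def blas_config_text():
    """Return numpy's build configuration report as a string."""
    buffer = io.StringIO()
    # np.show_config() prints to stdout, so capture it.
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    # Look for names such as "openblas" or "mkl" in the BLAS/LAPACK
    # sections of the report.
    print(blas_config_text())
```

If the report mentions OpenBLAS or MKL, the 134.2 GFLOP/s figure above is really a measurement of that library's multi-threaded, hand-tuned kernel rather than of numpy's Python layer.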