Matmul.mojo
Hello, Mojocians!
I'm thrilled to share my implementation of matrix multiplication in Mojo. Check out the benchmark results included in the repository: it outperforms NumPy (OpenBLAS) and comes close to the MAX engine!
Feel free to explore the repository, run the benchmarks, and integrate it into your projects. Happy coding :).
https://github.com/YichengDWu/matmul.mojo
I wonder if the fact that NumPy uses E-cores could be sabotaging its perf. It would be interesting to see a benchmark on a CPU with only one kind of core, or with the E-cores disabled.
Amazing effort! It would be really interesting to benchmark this implementation further in other scenarios: single-core performance, scaling with the number of cores, comparison against Intel MKL, on a range of architectures, etc.
Great work!
How does it compare to MAX on shapes benchmarked in this blog? https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication
You're right. Great insight! I will push a commit reflecting that.
The revenge of numpy.
I don't have enough motivation to benchmark certain shapes. I spent about two weeks on matmul.mojo and probably won't have more free time to put into it.
@adakkak You might be interested in this
I like the lines going brrrrr, and kudos for a great accomplishment in Mojo alone. For this graph, a BIG FAT legend beneath the lines would be great; at this resolution, even with zooming in, it's hard to read.
Also, it says "Max", which reads as short for "maximum"; "MAX" or "MAX engine" would be clearer. I think you should get a hat AND a cup AND a shirt 😃. Also, feel goods.
I noticed that too. The image shown by plt.show() looks great, but for some reason, the saved image is quite different. I'll try adding markers and changing the format.
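A common cause of the show()/savefig() mismatch is that savefig uses its own default DPI and margins instead of the on-screen figure settings, so pinning them explicitly usually helps. A minimal sketch (the sizes and GFLOP/s values here are made-up placeholders, not real benchmark data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs in scripts/CI too
import matplotlib.pyplot as plt

sizes = [256, 512, 1024, 2048]          # placeholder x-axis values
gflops = [300, 520, 610, 640]           # placeholder y-axis values

fig, ax = plt.subplots(figsize=(8, 5))
# Markers make individual data points legible even at low resolution.
ax.plot(sizes, gflops, marker="o", linewidth=2, label="matmul.mojo")
ax.set_xlabel("Matrix size (N)")
ax.set_ylabel("GFLOP/s")
ax.legend(loc="lower right", fontsize=12)

# plt.show() renders at the interactive backend's DPI, while savefig falls
# back to rcParams["savefig.dpi"]; pinning dpi and bbox_inches makes the
# saved file match the previewed figure.
fig.savefig("benchmark.png", dpi=200, bbox_inches="tight")
```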
@Darin Simmons How about this one?
very nice, legible and colors are clear, last nit would be to turn "Max" into "MAX"
Of course, putting your name, contacts, github, a title on it would typically be suggested but that's really up to you
All great suggestions, thank you!
Still impressive how close you are. I think this is the best open-source Mojo matmul we have? Is running on E-cores really NumPy's default behavior? It's an insane difference.
NumPy's default behavior is to utilize all available threads, which is quite sensible. When using the MKL backend, leveraging all cores is recommended as MKL optimally utilizes them. On my machine, MKL can achieve over 930 GFLOP/s. Disabling e-cores results in performance close to OpenBLAS.
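For anyone wanting to reproduce this comparison, the BLAS thread pool can be capped via environment variables set before NumPy is imported. A rough sketch (which variable actually takes effect depends on your backend; the "4" is an arbitrary example count):

```python
import os

# Cap the BLAS thread pool before importing NumPy; these names cover the
# common backends (generic OpenMP builds, OpenBLAS, MKL).
os.environ.setdefault("OMP_NUM_THREADS", "4")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "4")
os.environ.setdefault("MKL_NUM_THREADS", "4")

import time
import numpy as np

n, reps = 256, 5
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

c = a @ b  # warm-up so one-time initialization doesn't pollute the timing
start = time.perf_counter()
for _ in range(reps):
    c = a @ b
elapsed = time.perf_counter() - start

# One n x n matmul does 2*n^3 floating-point ops (a multiply and an add
# per inner-product term).
gflops = 2 * n**3 * reps / elapsed / 1e9
print(f"{n}x{n} matmul: {gflops:.1f} GFLOP/s")
```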
Updated graph.
Congrats @Ethan, you just advanced to level 9!
OOOOOOhhhhh, aaaaaahhh 😃
Also, don't forget to acknowledge yourself for all the work you've put in. In other words, take a victory lap, pat yourself on your back, etc.
Thanks for the reminder 😂! Yeah it's definitely a side project that I can take some pride in.
nice work! 🔥
Thank you!
@Ethan Hi, wonderful work! Quick question: why is hyperthreading disabled?
Congrats @Dune, you just advanced to level 2!
I was following the settings in this blog: https://www.modular.com/blog/how-to-be-confident-in-your-performance-benchmarking
@Ethan I tried to implement my own, but things didn't work well in Mojo (unlike C). It's amazing how you did it even without using the built-in prefetch and tile functions. Anyway, I updated your code for Mojo 24.5, since I'm going to use it in a project, and opened a PR to your repo.
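For readers wondering what the tiling mentioned here buys you, the cache-blocking idea can be sketched in NumPy. This is a toy illustration of the technique, not the register/cache layout matmul.mojo actually uses:

```python
import numpy as np

def tiled_matmul(a, b, tile=64):
    # Cache blocking: compute C in tile x tile sub-blocks so each block of
    # A, B, and C stays resident in cache while it is reused.
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    c = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, tile):
        for p in range(0, k, tile):
            for j in range(0, m, tile):
                # NumPy clamps out-of-range slice ends, so ragged edge
                # tiles are handled automatically.
                c[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return c

a = np.random.rand(200, 300)
b = np.random.rand(300, 150)
c_tiled = tiled_matmul(a, b)
print(np.allclose(c_tiled, a @ b))  # blocked result matches plain matmul
```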
Once GPU support lands, it would be amazing to see a GPU version as well :mojonightly:
Thanks