Modular•4mo ago
Ethan

Matmul.mojo

Hello, Mojocians! I'm thrilled to share my implementation of matrix multiplication in Mojo. Check out the benchmark results included in the repository to see the performance metrics. It outperforms NumPy (OpenBLAS) and comes close to the performance of the MAX engine! Feel free to explore the repository, run the benchmarks, and integrate it into your projects. Happy coding :). https://github.com/YichengDWu/matmul.mojo
High Performance Matrix Multiplication in Pure Mojo 🔥 - YichengDWu/matmul.mojo
24 Replies
Ryulord•4mo ago
I wonder if the fact that NumPy uses E-cores could be sabotaging its perf. It would be interesting to see a benchmark on a CPU with only one kind of core, or with E-cores disabled.
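One way to approximate the "E-cores disabled" experiment without touching firmware settings is to cap the BLAS thread pool before NumPy loads. The snippet below is only a minimal sketch: the helper name `cap_blas_threads` and the value 8 are hypothetical, and capping the thread count does not by itself pin threads to P-cores (the OS scheduler still decides placement).

```python
import os

# Environment variables read by common OpenMP/BLAS runtimes when they load.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def cap_blas_threads(n_threads: int) -> dict:
    """Set the thread-count variables; call this before NumPy is first imported."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    for name in THREAD_ENV_VARS:
        os.environ[name] = str(n_threads)
    # Return what was set so a benchmark script can log it next to its results.
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


# Hypothetical value: 8 stands in for the number of P-cores on the test machine.
settings = cap_blas_threads(8)
print(settings)
```

The variables only take effect if they are set before the BLAS library is loaded, so in a benchmark script this call would come ahead of `import numpy`.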
Mohamed Mabrouk•4mo ago
Amazing effort... It would be really interesting to benchmark this implementation further in other scenarios: single-core performance, scaling with the number of cores, comparison against Intel MKL, on a range of architectures, etc.
sora•4mo ago
Great work! How does it compare to MAX on the shapes benchmarked in this blog post? https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication
Modular: The world's fastest unified matrix multiplication
EthanOP•4mo ago
You're right. Great insight! I will push a commit reflecting that.
EthanOP•4mo ago
[Attached image]
EthanOP•4mo ago
The revenge of NumPy. I don't have enough motivation to benchmark specific shapes. I spent about two weeks on matmul.mojo and probably won't have more free time to put into it.
Darin Simmons•4mo ago
@adakkak You might be interested in this. I like the lines and things going brrrrr, and kudos for a great accomplishment in Mojo alone. For this graph, a BIG FAT legend beneath the lines would be great; at this resolution, even when zooming in, it's hard to read. Also, it says "Max", which reads like short for maximum; "MAX" or "MAX engine" would be clearer. I think you should get a hat AND a cup AND a shirt 😃. Also feel-goods.
EthanOP•4mo ago
I noticed that too. The image shown by plt.show() looks great, but for some reason, the saved image is quite different. I'll try adding markers and changing the format.
EthanOP•4mo ago
[Attached image]
EthanOP•4mo ago
@Darin Simmons How about this one?
Darin Simmons•4mo ago
Very nice: legible, and the colors are clear. One last nit would be to turn "Max" into "MAX". Of course, putting your name, contacts, GitHub, and a title on it would typically be suggested, but that's really up to you.
EthanOP•4mo ago
All great suggestions, thank you!
Ryulord•4mo ago
Still impressive how close you are. I think this is the best open-source Mojo matmul we have? Is running on E-cores really the default behavior for NumPy? It's an insane difference.
EthanOP•4mo ago
NumPy's default behavior is to utilize all available threads, which is quite sensible. When using the MKL backend, leveraging all cores is recommended as MKL optimally utilizes them. On my machine, MKL can achieve over 930 GFLOP/s. Disabling e-cores results in performance close to OpenBLAS.
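For readers unfamiliar with the GFLOP/s figure quoted above: matmul throughput is conventionally computed as 2·M·N·K floating-point operations (one multiply and one add per inner-product term) divided by the elapsed time. The sketch below illustrates that convention; the helper names and the tiny pure-Python kernel are hypothetical stand-ins for illustration and are not taken from the matmul.mojo benchmark scripts.

```python
import time


def matmul_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """GFLOP/s for an (m x k) @ (k x n) product, counting 2*m*n*k operations."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e9


def naive_matmul(a, b):
    """Reference triple loop on lists of lists; only here to have something to time."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def best_time(fn, repeats: int = 3) -> float:
    """Best-of-N wall-clock time, which reduces noise from other processes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


size = 32  # deliberately tiny: pure Python is slow
a = [[1.0] * size for _ in range(size)]
b = [[2.0] * size for _ in range(size)]
elapsed = best_time(lambda: naive_matmul(a, b))
print(f"{matmul_gflops(size, size, size, elapsed):.4f} GFLOP/s (pure Python)")
```

The same `matmul_gflops` formula applies whichever backend is being timed, which is what makes numbers from NumPy, MKL, MAX, and a Mojo kernel comparable on one graph.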
EthanOP•4mo ago
Updated graph.
[Attached image]
ModularBot•4mo ago
Congrats @Ethan, you just advanced to level 9!
Darin Simmons•4mo ago
OOOOOOhhhhh, aaaaaahhh 😃
Also, don't forget to acknowledge yourself for all the work you've put in. In other words, take a victory lap, pat yourself on your back, etc.
EthanOP•4mo ago
Thanks for the reminder 😂! Yeah it's definitely a side project that I can take some pride in.
TilliFe•4mo ago
nice work! 🔥
EthanOP•4mo ago
Thank you!
Dune•4mo ago
@Ethan Hi, wonderful work! Quick question: why is hyperthreading disabled?
ModularBot•4mo ago
Congrats @Dune, you just advanced to level 2!
EthanOP•4mo ago
Modular: How to Be Confident in Your Performance Benchmarking
DobyDabaDu•2mo ago
@Ethan I tried to implement my own, but things didn't work well in Mojo (unlike in C). It's amazing that you did it even without using the built-in prefetch and tile functions. Anyway, I updated your code for Mojo 24.5, as I'm going to use it in a project, and opened a PR to your repo. Once GPU support arrives, it would be amazing to see a GPU version as well. Thanks!