Modular•2mo ago
Ethan

Matmul.mojo

Hello, Mojocians! I'm thrilled to share my implementation of matrix multiplication in Mojo. Check out the benchmark results included in the repository to see the performance metrics. It outperforms NumPy (OpenBLAS) and comes close to the MAX engine! Feel free to explore the repository, run the benchmarks, and integrate it into your projects. Happy coding :). https://github.com/YichengDWu/matmul.mojo
High Performance Matrix Multiplication in Pure Mojo 🔥 - YichengDWu/matmul.mojo
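For readers unfamiliar with how matmul benchmarks are usually scored, here is a minimal sketch in Python. It assumes the common convention of counting 2·M·N·K floating-point operations per multiplication and reporting the best of several runs; the thread does not say how the repository measures its numbers, and `matmul_naive`, `gflops`, and `benchmark` are hypothetical names used only for illustration.

```python
import time


def matmul_naive(a, b):
    """Plain triple-loop matrix product of two lists of lists (slow, for illustration only)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a_ip = a[i][p]
            for j in range(n):
                c[i][j] += a_ip * b[p][j]
    return c


def gflops(m, n, k, seconds):
    """Convert a timing into GFLOP/s, assuming 2*M*N*K operations (one multiply and one add each)."""
    return 2.0 * m * n * k / seconds / 1e9


def benchmark(m, n, k, repeats=3):
    """Time matmul_naive on all-ones matrices and return the best GFLOP/s over `repeats` runs."""
    a = [[1.0] * k for _ in range(m)]
    b = [[1.0] * n for _ in range(k)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul_naive(a, b)
        best = min(best, time.perf_counter() - start)
    return gflops(m, n, k, best)


if __name__ == "__main__":
    print(f"{benchmark(64, 64, 64):.4f} GFLOP/s")
```

Swapping `matmul_naive` for any other implementation keeps the scoring identical, which is what makes results from different libraries comparable.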
23 Replies
Ryulord•2mo ago
I wonder if the fact that NumPy uses E-cores could be sabotaging its perf. It would be interesting to see a benchmark on a CPU with only one kind of core, or with E-cores disabled.
Mohamed Mabrouk•2mo ago
Amazing effort... It would be really interesting to benchmark this implementation further in other scenarios: single-core performance, scaling with the number of cores, comparison against Intel MKL, a range of architectures, etc.
sora•2mo ago
Great work! How does it compare to MAX on shapes benchmarked in this blog? https://www.modular.com/blog/the-worlds-fastest-unified-matrix-multiplication
Modular: The world's fastest unified matrix multiplication
Ethan•2mo ago
You're right. Great insight! I will push a commit reflecting that.
Ethan•2mo ago
[Image attachment]
Ethan•2mo ago
The revenge of numpy. I don't have enough motivation to benchmark certain shapes. I spent about two weeks on matmul.mojo and probably won't have more free time to put into it.
Darin Simmons•2mo ago
@adakkak You might be interested in this. I like the lines, and things go brrrrr. Kudos for a great accomplishment in Mojo alone. For this graph, a BIG FAT legend beneath the lines would be great. At this resolution, even when zooming in, it's hard to see. Also, it says "Max", which reads like short for "maximum"; "MAX" or "MAX engine" would be clearer. I think you should get a hat AND a cup AND a shirt 😃. Also feel-goods.
Ethan•2mo ago
I noticed that too. The image shown by plt.show() looks great, but for some reason, the saved image is quite different. I'll try adding markers and changing the format.
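A rough sketch of the kind of fix discussed here, in Python with matplotlib. It assumes the mismatch comes from the saved file using a different resolution and bounding box than the interactive window, which is a common cause but not confirmed in the thread. The function name `plot_benchmark` and the series passed to it are hypothetical; the values in the usage example are placeholders, not real measurements.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the saved file does not depend on a GUI window
import matplotlib.pyplot as plt


def plot_benchmark(sizes, series, path):
    """Plot one line per entry in `series` (label -> GFLOP/s values) and save it to `path`."""
    fig, ax = plt.subplots(figsize=(10, 6))
    markers = ["o", "s", "^", "D"]
    for i, (label, values) in enumerate(series.items()):
        # Distinct markers keep the lines distinguishable even when colors are hard to see.
        ax.plot(sizes, values, marker=markers[i % len(markers)], label=label)
    ax.set_xlabel("Matrix size (M = N = K)")
    ax.set_ylabel("GFLOP/s")
    # Large legend centered beneath the axes.
    ax.legend(loc="upper center", bbox_to_anchor=(0.5, -0.15),
              ncol=len(series), fontsize="large")
    # Explicit dpi and a tight bounding box so the legend is not clipped in the saved image.
    fig.savefig(path, dpi=200, bbox_inches="tight")
    plt.close(fig)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    plot_benchmark([256, 512, 1024],
                   {"matmul.mojo": [1.0, 2.0, 3.0], "MAX": [1.5, 2.5, 3.5]},
                   "benchmark.png")
```

Saving as a vector format (for example a path ending in `.svg`) is another way to keep the legend readable when zooming.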
Ethan•2mo ago
[Image attachment]
Ethan•2mo ago
@Darin Simmons How about this one?
Darin Simmons•2mo ago
Very nice: legible, and the colors are clear. Last nit would be to turn "Max" into "MAX". Of course, putting your name, contacts, GitHub, and a title on it would typically be suggested, but that's really up to you.
Ethan•2mo ago
All great suggestions, thank you!
Ryulord•2mo ago
Still impressive how close you are. I think this is the best open-source Mojo matmul we have? Is running on E-cores really the default behavior for NumPy? It's an insane difference.
Ethan•2mo ago
NumPy's default behavior is to utilize all available threads, which is quite sensible. When using the MKL backend, leveraging all cores is recommended, as MKL utilizes them optimally. On my machine, MKL can achieve over 930 GFLOP/s. Disabling E-cores results in performance close to OpenBLAS.
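For anyone who wants to try a similar comparison without disabling E-cores in firmware, here is a minimal Python sketch for Linux. It is not what was done in this thread. It assumes the P-cores are exposed as a known set of logical CPU IDs, which is machine-specific (check `lscpu` first); the IDs and thread count below are hypothetical, as are the helper names `pin_to_cores` and `cap_blas_threads`. `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` are the environment variables those libraries read at load time.

```python
import os


def pin_to_cores(wanted):
    """Restrict this process to the CPUs in `wanted` that are available; return the resulting set."""
    if not hasattr(os, "sched_setaffinity"):
        return set()  # CPU affinity is not exposed on this platform (e.g. macOS, Windows)
    available = os.sched_getaffinity(0)
    chosen = set(wanted) & available
    if not chosen:
        return available  # none of the requested CPUs exist here; leave affinity unchanged
    os.sched_setaffinity(0, chosen)
    return chosen


def cap_blas_threads(count):
    """Cap the BLAS thread pools; must be called before NumPy is imported to take effect."""
    os.environ["OPENBLAS_NUM_THREADS"] = str(count)
    os.environ["MKL_NUM_THREADS"] = str(count)


if __name__ == "__main__":
    # Hypothetical layout: logical CPUs 0-15 are assumed to be the P-core threads.
    cap_blas_threads(16)
    print("Running on CPUs:", sorted(pin_to_cores(range(16))))
```

Running the same benchmark with and without the pinning would show how much of the gap comes from work being scheduled onto E-cores.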
Ethan•2mo ago
Updated graph.
[Image attachment]