Mojo added to SpeedTests repo on github
I recently came across jabbalaci's SpeedTests https://github.com/jabbalaci/SpeedTests, and I made a PR to add Mojo, which got accepted today! The project's idea is to stick to the basic implementation pattern of the task without using language-specific tricks, so I just followed the Python implementation.
It's a insightful list of benchmark results, and Mojo is holding up pretty well. 😉
12 Replies
This is super cool! Mojo does seem to hold up well 🙂
can you update mojo to the latest version to improve the speed of print? i think it will have a huge difference. https://github.com/modularml/mojo/commit/9cbfa411d1ea0f01d0b7c1732cc4e36f08c1f969
GitHub
[stdlib] Buffer output before printing and writing to file · modula...
This greatly improves the print performance, minimizing syscalls and
multiple variadic pack loads. These are results from printing 5 args in
a 10k loop on the fast alacritty terminal emulator:
```...
There are only five numbers printed by the program, so this change likely won't have a noticeable impact on performance. When I run the program locally, it seems that the current nightly version is actually performing a bit slower for some reason. Let's wait for the next stable version to see if there’s a noticeable performance improvement. If there is, I'll request the GitHub repository owner to rerun the benchmark with the latest version.
Some insightful comments by @Owen Hilyard and @Martin Vuyk on various ways how this simple task can be implemented appeared on github https://github.com/jabbalaci/SpeedTests/pull/63 :mojo:
GitHub
Use more ideomatic Mojo and unify integer width with C. by owenhil...
Mojo prefers to make use of the SIMD type for small collections. In this case, it allocates a i32x16 block because it has to be a power of 2, but only the first 10 slots are used as with the origin...
I’ll make a comment later but I closed it because I realized it would be a massive performance regression on the system the repo owner tests on due to it having half the SIMD width I do.
for some of the languages, the repo has parallelized and vectorized version of the code. https://github.com/jabbalaci/SpeedTests
GitHub
GitHub - jabbalaci/SpeedTests: comparing the execution speeds of va...
comparing the execution speeds of various programming languages - jabbalaci/SpeedTests
is this right? A simple change to
UInt32
took a full second off my time:
Before:
After:
The main issue is that benchmarks are run on a haswell system, which is more than a decade old. It lacks a lot of modern SIMD, which Mojo uses heavily, so it takes poorly performing fallback paths.
v2 of the code is faster. https://github.com/jabbalaci/SpeedTests?tab=readme-ov-file#mojo
Interesting comments, benchmark results, and various PRs aimed at improving the initial Mojo implementation have appeared on GitHub: :mojo:
https://github.com/jabbalaci/SpeedTests/pulls?q=mojo
Currently, we have a standard, Python-like version (v1) and an additional version (v2) aimed at further improving performance.
GitHub
Pull requests · jabbalaci/SpeedTests
comparing the execution speeds of various programming languages - Pull requests · jabbalaci/SpeedTests
I should have put inlinearray and unsafe_get in v2
zig has no bounds checks in ReleaseFast
though it seems to be a tiny improvement on my machine
it seems the github repo owner is very open for further PRs even so maybe good not to bombard him too frequently with PRs 😉