mojo-sort

GitHub
GitHub - mzaks/mojo-sort
Contribute to mzaks/mojo-sort development by creating an account on GitHub.
37 Replies
Maxim
Maxim4mo ago
Added parallel tim sort. On large lists it seems to be up to 3x faster than Tim sort (tried on m1 and Intel i7 with 8 cores), but still slower than Radix sort. I guess on a machine with 64 Cores that might be a different story though.
Mohamed Mabrouk
Mohamed Mabrouk4mo ago
If it can be useful and there is a benchmark script. I can run it on one of our quiet servers (96 cores Threadripper - Ubuntu server 22.04)
Maxim
Maxim4mo ago
That would be awesome, the benchmark script can be found here: https://github.com/mzaks/mojo-sort/blob/main/benchmark.mojo if you give me a few minutes I can also make it output proper CSV.
GitHub
mojo-sort/benchmark.mojo at main · mzaks/mojo-sort
Contribute to mzaks/mojo-sort development by creating an account on GitHub.
Mohamed Mabrouk
Mohamed Mabrouk4mo ago
No worries, just ping me when ever you are happy with the script.
Maxim
Maxim4mo ago
it's ready 🙂
Maxim
Maxim4mo ago
GitHub
mojo-sort/benchmark_result_m1.csv at main · mzaks/mojo-sort
Contribute to mzaks/mojo-sort development by creating an account on GitHub.
Maxim
Maxim4mo ago
GitHub
mojo-sort/benchmark_result_i7.csv at main · mzaks/mojo-sort
Contribute to mzaks/mojo-sort development by creating an account on GitHub.
Maxim
Maxim4mo ago
The 0 values in m1 benchmark for smaller lists are due to the fact that time function on mac does not give nanosecond precision it rounds to microseconds
Mohamed Mabrouk
Mohamed Mabrouk4mo ago
Should I run it on Mojo nightly or the main branch?
Maxim
Maxim4mo ago
Nightly please. Speaking of, I think it is possible to implement a parallel Radix sort, which is kind of cool 🙂
Mohamed Mabrouk
Mohamed Mabrouk4mo ago
Here is the results. I ran the benchmark using
mojo build benchmark.mojo
./benchmark
mojo build benchmark.mojo
./benchmark
The server that I ran it on has dual AMD EPYC 7401 24 cores, 2 threads/core. All the cores were working by they didn't reach 100%
Maxim
Maxim4mo ago
Hm, interesting generally the single core algorithms seems to be much slower, then m1 (I guess the machine has lots of CPU but they are not as powerful), you know what, I think I am doing the parallelisation wrong 🙂 need to rewrite the algorithm ok, will continue on it tomorrow 🙂
aurelian
aurelian4mo ago
afaik LLVM struggles to interleave vector reads on ARM might be a factor
Maxim
Maxim4mo ago
Updated the lib after the ComparableCollectionElement landed in nightly. Benchmarks for string sorting are quite interesting.
Sitron
Sitron4mo ago
Benchmarks for string sorting are quite interesting.
Interesting bad or interesting good? Or just weird?
Want results from more Discord servers?
Add your server