Mojo for-loop performance

https://github.com/sstadick/rust-vs-mojo-loop

While profiling other code and trying to get my perf to match Rust, I noticed that a vanilla for-loop seemed to be one large source of the difference. I'm not great with assembly, but looking at what was generated, it seemed like Rust was able to skip bounds checks when indexing into the array, since the length of the array is given in the range. Has anyone else run into this? While this is a toy example, I've run into it in more complex scenarios and with real data as well.

The two programs in question are below (Mojo first, then Rust); the repo has the benchmarking script:
import sys


fn main() raises:
    var times = sys.argv()[1].__int__()

    var array = List[UInt64]()
    for i in range(0, times):
        array.append(i)

    var sum: UInt64 = 0
    for _ in range(0, times):
        for i in range(0, times):
            sum += array[i]
    print(sum)
use std::env::args;

fn main() {
    let times = args()
        .skip(1)
        .next()
        .unwrap()
        .parse::<usize>()
        .expect("Expected number as first arg");

    // I don't think filling the array with the macro has any hidden optimizations, but just in case:
    let mut array: Vec<u64> = vec![];
    for i in 0..times {
        array.push(i as u64)
    }

    let mut sum = 0;
    for _ in 0..times {
        for i in 0..times {
            sum += array[i];
        }
    }
    println!("{}", sum)
}
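For anyone wanting to poke at the bounds-check question on the Rust side, here is a minimal sketch, not taken from the repo. The function names (`sum_indexed`, `sum_sliced`, `sum_iter`) are made up for illustration. It shows three ways to write the inner loop: indexing with a separate `times` bound as in the program above, slicing once up front, and plain iteration. Whether the optimizer actually drops the per-access check in the first form depends on what it can prove about `times` and the length, so comparing the generated assembly for each is the only way to be sure.

```rust
/// Indexes with a range bound that is a separate variable (`times`),
/// mirroring the loop in the post. Each `array[i]` is bounds-checked
/// unless the optimizer can prove `times <= array.len()`.
fn sum_indexed(array: &[u64], times: usize) -> u64 {
    let mut sum = 0u64;
    for i in 0..times {
        sum += array[i]; // panics if i >= array.len()
    }
    sum
}

/// Takes the sub-slice once, so the only explicit bounds check is the
/// one performed by `&array[..times]` before the loop starts.
fn sum_sliced(array: &[u64], times: usize) -> u64 {
    let mut sum = 0u64;
    for &value in &array[..times] {
        sum += value;
    }
    sum
}

/// Iterator form: no indexing in the source at all.
fn sum_iter(array: &[u64]) -> u64 {
    array.iter().sum()
}

fn main() {
    let times = 1_000usize;
    let array: Vec<u64> = (0..times as u64).collect();

    // All three forms compute the same value.
    assert_eq!(sum_indexed(&array, times), sum_sliced(&array, times));
    assert_eq!(sum_indexed(&array, times), sum_iter(&array));
    println!("{}", sum_indexed(&array, times)); // prints 499500
}
```

If all three compile to the same inner loop, the bounds check is probably not what separates the two languages in this benchmark.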
2 Replies
duck_tape (OP), 3w ago
Possibly related: I posted a bug demonstrating the difference in perf when using range(start, end) vs. just range(end): https://github.com/modularml/mojo/issues/3931 Is range somehow getting in the way of optimizations?
GitHub
[BUG] Iteration using Range without providing a start is slower tha...
Bug description Iterating over a List or Span is slower when using Range(len(list)) than when using either Range(0, len(list)) or the direct for value in list. Below is a minimal reproducible examp...
duck_tape (OP), 3w ago
Possibly largely answered by this: https://discord.com/channels/1087530497313357884/1151418895417233429/1326217184963334217 Since the benchmark compares end-to-end runs of two binaries, a slow Mojo startup could account for part of the gap. Startup is indeed slower, but it doesn't explain the full delta between Rust and Mojo in the above programs, especially when the loops are large.
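One way to take startup out of the picture is to time only the loop from inside the process, instead of timing the whole binary. A rough sketch for the Rust side follows. `nested_sum` and `time_loop` are hypothetical helper names, not something from the repo, and `std::hint::black_box` is used only to discourage the optimizer from folding the whole loop away. The Mojo program would need an equivalent in-process timer for a fair comparison.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Same nested loop as the benchmark program, returning the sum.
fn nested_sum(array: &[u64], times: usize) -> u64 {
    let mut sum = 0u64;
    for _ in 0..times {
        for i in 0..times {
            sum += array[i];
        }
    }
    sum
}

/// Builds the array, then measures wall-clock time spent in the loop only,
/// so process startup and array construction are excluded.
fn time_loop(times: usize) -> (u64, Duration) {
    let array: Vec<u64> = (0..times as u64).collect();
    let start = Instant::now();
    let sum = black_box(nested_sum(black_box(&array), black_box(times)));
    (sum, start.elapsed())
}

fn main() {
    let (sum, elapsed) = time_loop(10_000);
    println!("sum = {sum}, loop time = {elapsed:?}");
}
```

If the in-process loop times still differ by a similar margin, startup cost can be ruled out as the main cause.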
