asosoman
Modular
Created by asosoman on 2/29/2024 in #questions
Parallelize help (running time increases by a factor of 10 after adding a "var += 1").
I'm trying the 1BRC (One Billion Row Challenge) in Mojo and I'm at the code-optimization stage. I've reduced the code to a minimal example; if someone could run it, it would be great to know whether the problem is specific to my system. In a function that uses parallelize, modifying a value from inside the @parameter function slows execution down by a lot when that value is created outside the @parameter function rather than inside it. Am I missing something obvious?
# TEST FOR SPEED
from algorithm import parallelize
from time import now
from sys.info import num_physical_cores

fn read_file(bounds: DynamicVector[Int]):
    var total: StaticTuple[8,Int64] = StaticTuple[8,Int64](0,0,0,0,0,0,0,0)
    @parameter
    fn read_chunk(bound_idx: Int):
        var offset: Int = bounds[bound_idx]
        var r: Int = bounds[bound_idx+1]
        while True:

            # IF I UNCOMMENT THE NEXT LINE, GOES FROM 0,001 s execution time to more than 2s
            # total[bound_idx] = total[bound_idx] + 1

            offset += 1
            if offset+1 >= r:
                break
    parallelize[read_chunk](len(bounds)-1)
    return

fn main() raises:
    var bounds = DynamicVector[Int]()
    for i in range(0, num_physical_cores()+1):
        var offset = int(100_000_000 / num_physical_cores() * i)
        bounds.append(offset)

    var start: Int = now()
    read_file(bounds)
    print("Total time:", (now()-start) / 1_000_000_000)
13 replies
Modular
Created by asosoman on 2/27/2024 in #questions
SIMD troubles (SIMD[Bool,32] to Int32? And getting a bit from every byte of a SIMD)
I'm a total newbie with Mojo, and with SIMD too, so sorry if I get some definitions slightly wrong. I decided to try the 1BRC and I'm now on the optimization part: I want to see how fast I can go and learn SIMD and Mojo at the same time. I have it working in a not very optimized way, and I'm stuck on some of the SIMD parts.

If I compare a SIMD[uint8,32] to the ASCII newline value, I get back a SIMD[Bool,32]. Is there a way to cast this type to an int32? It would be great to use ctlz to get the index of the first occurrence.

The other option is to use a mask and get a SIMD[uint8,32], but then it would be great to know whether it is possible to construct an int32 from one bit of every byte in the SIMD (something equivalent to _mm_movemask_epi8, I guess). Sorry if some of this doesn't make sense; it's quite the learning trip.
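To make the goal concrete, here is a scalar model of the operation being asked about, written in plain Python rather than Mojo. It is only a sketch of the idea, not a Mojo API: the helper names movemask_eq and first_set_bit and the sample chunk are made up for illustration. It assumes the same bit order as _mm_movemask_epi8, where bit i of the result comes from byte i. Under that assumption the first newline is the lowest set bit, so a count of trailing zeros (rather than ctlz) gives its index.

```python
NEWLINE = 0x0A  # ASCII line feed


def movemask_eq(chunk: bytes, value: int) -> int:
    """Return an int whose bit i is set when chunk[i] == value.

    Scalar stand-in for "compare 32 bytes, then pack one bit per lane".
    """
    mask = 0
    for i, byte in enumerate(chunk):
        if byte == value:
            mask |= 1 << i
    return mask


def first_set_bit(mask: int) -> int:
    """Index of the lowest set bit (a count of trailing zeros), or -1 if none."""
    if mask == 0:
        return -1
    # mask & -mask isolates the lowest set bit; bit_length gives its position + 1.
    return (mask & -mask).bit_length() - 1


# Hypothetical 32-byte chunk in the 1BRC "name;value" line format.
chunk = b"Hamburg;12.0\nBulawayo;8.9\nPalemb"

mask = movemask_eq(chunk, NEWLINE)
print(bin(mask))            # one bit per newline position
print(first_set_bit(mask))  # index of the first newline in the chunk
```

With the mask in hand, later newlines in the same chunk can be found by clearing the lowest set bit (mask & (mask - 1)) and repeating, which avoids re-comparing the bytes.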
4 replies