asosoman
Modular
•Created by asosoman on 10/16/2024 in #questions
Fastest way to count trailing zeros from a SIMD[Bool,32]?
The summary says it all.
I have a SIMD with 32 bools, and would like to find the first appearance of a True value.
Right now I'm doing the following:
TRUE_CASE = SIMD[DType.uint8,32](0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31)
FALSE_CASE = SIMD[DType.uint8,32](32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
SIMD[Bool,32].select(TRUE_CASE, FALSE_CASE).reduce_min()
I would like to cast the SIMD[Bool,32] to an Int32 and then count trailing zeros, but I have been unable to do the conversion.
9 replies
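For readers following the idea, here is a minimal sketch in plain Python (not Mojo) of why the two approaches give the same answer. The helper names `first_true_select_min`, `pack_bits`, and `count_trailing_zeros` are hypothetical, chosen for illustration only; they model the bit logic and are not claimed to match any Mojo standard-library signature. The sketch assumes lane 0 maps to the least significant bit.

```python
def first_true_select_min(mask):
    """Model of the select + reduce_min approach from the post:
    keep the lane index where the lane is True, use the width otherwise,
    then take the minimum."""
    width = len(mask)
    return min(i if flag else width for i, flag in enumerate(mask))


def pack_bits(mask):
    """Pack lane i into bit i of an integer (assumption: lane 0 is the
    least significant bit)."""
    packed = 0
    for i, flag in enumerate(mask):
        if flag:
            packed |= 1 << i
    return packed


def count_trailing_zeros(value, width=32):
    """Count the zero bits below the lowest set bit.
    Returns width when no bit is set, matching the FALSE_CASE value of 32."""
    if value == 0:
        return width
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


# 32 lanes with True at lanes 5 and 20.
mask = [False] * 32
mask[5] = True
mask[20] = True

print(first_true_select_min(mask))            # 5
print(count_trailing_zeros(pack_bits(mask)))  # 5
```

Both functions return 32 for an all-False mask, so the packed-integer version can stand in for the select/reduce_min version without changing the "not found" result.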
Modular
•Created by asosoman on 2/29/2024 in #questions
Parallelize help (Running time increases a factor of 10 adding a "var += 1").
I'm trying to do 1BRC in Mojo and I'm at the code-optimization stage.
I've reduced the code to a minimum. If someone could try it, it would be great to know whether it's just my system.
In a function that uses parallelize, modifying a value from inside the @parameter function slows execution down a lot when that value is created outside the @parameter function rather than inside it.
Am I missing something obvious?
13 replies
Modular
•Created by asosoman on 2/27/2024 in #questions
SIMD Troubles ( SIMD[Bool,32] to Int32? and Getting a bit from every byte from SIMD)
Total newbie with Mojo here, and total newbie with SIMD too, so sorry if I get some definitions not quite right.
I decided to try the 1BRC and now I'm on the optimization part of it... I want to check how fast I can go and learn SIMD/Mojo at the same time.
I got it working in a not really optimized way, and I've gotten stuck on some of the SIMD parts.
If I compare a SIMD[uint8,32] to the ASCII newline value, I get back a SIMD[Bool,32]. Is there a way to cast this type into an int32?
It would be great to use ctlz to get the index of the first occurrence.
Another option is using a mask and getting a SIMD[uint8,32], but then it would be great to know if it is possible to construct an int32 from one bit of every byte in the SIMD (something equivalent to _mm_movemask_epi8, I guess).
Sorry if something doesn't make much sense... It's quite the learning trip...
4 replies
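As a rough illustration of the movemask idea in the post, here is a plain-Python model (not Mojo, and not the real intrinsic). The names `compare_eq`, `movemask`, and `first_newline` are hypothetical helpers, and the sample chunk is made up for the example. The sketch assumes the usual movemask convention where byte i contributes its top bit to bit i of the result. Under that convention the first match is the lowest set bit, so the index comes from counting trailing zeros rather than leading zeros.

```python
NEWLINE = 0x0A  # ASCII line feed


def compare_eq(chunk: bytes, value: int) -> bytes:
    """Lane-wise equality: 0xFF where the byte matches, 0x00 elsewhere."""
    return bytes(0xFF if b == value else 0x00 for b in chunk)


def movemask(lanes: bytes) -> int:
    """Collect the top bit of every byte; byte i becomes bit i of the result."""
    result = 0
    for i, b in enumerate(lanes):
        result |= (b >> 7) << i
    return result


def first_newline(chunk: bytes) -> int:
    """Index of the first newline in the chunk, or len(chunk) if there is none."""
    bits = movemask(compare_eq(chunk, NEWLINE))
    if bits == 0:
        return len(chunk)
    # Lowest set bit = first matching byte, so count trailing zeros.
    return (bits & -bits).bit_length() - 1


# A made-up 32-byte chunk with newlines at offsets 12 and 25.
chunk = b"Hamburg;12.0\nBulawayo;8.9\nPalemb"

print(len(chunk))            # 32
print(first_newline(chunk))  # 12
```

Clearing the lowest set bit with `bits & (bits - 1)` and repeating the count would step through the remaining newlines in the same chunk.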