Modular•7mo ago
asosoman

SIMD Troubles (SIMD[Bool, 32] to Int32? And getting one bit from every byte of a SIMD)

Total newbie with Mojo here, and a total newbie with SIMD too, so sorry if some of my definitions aren't quite right. I decided to try the 1BRC (One Billion Row Challenge) and I'm now on the optimization part of it... I want to see how fast I can go and learn SIMD/Mojo at the same time. I got it working in a not-really-optimized way, and I've got stuck on some of the SIMD.

If I compare a SIMD[uint8, 32] to the ASCII newline value, I get back a SIMD[Bool, 32]. Is there a way to cast this into an Int32? It would be great to use ctlz to get the index of the first occurrence.

The other option is using a mask and getting a SIMD[uint8, 32], but then it would be great to know whether it's possible to construct an Int32 from one bit of every byte in the SIMD (something equivalent to _mm_movemask_epi8, I guess).

Sorry if something doesn't make much sense... It's quite the learning trip...
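For reference, here is a plain-Python model (not Mojo, and not real SIMD) of what `_mm_movemask_epi8` computes: one bit per byte lane, with lane 0 in the least significant bit. With that bit order the first newline is found by counting trailing zeros; ctlz only gives the first index if lane 0 is packed into the most significant bit instead. The helper names and the sample bytes are illustrative assumptions.

```python
# Plain-Python model of a movemask-style bitmask (illustrative, not Mojo).
NEW_LINE = ord("\n")


def movemask(lanes):
    """Pack one bit per lane: bit i is set when lane i is true (lane 0 -> LSB)."""
    m = 0
    for i, flag in enumerate(lanes):
        if flag:
            m |= 1 << i
    return m


def count_trailing_zeros32(m):
    """cttz for a 32-bit mask; returns 32 when no bit is set."""
    if m == 0:
        return 32
    return (m & -m).bit_length() - 1  # m & -m isolates the lowest set bit


# Hypothetical 32-byte chunk of input, shaped like 1BRC lines.
chunk = b"Hamburg;12.0\nBulawayo;8.9\nPalemb"
mask = movemask([byte == NEW_LINE for byte in chunk])  # iterating bytes yields ints
first_nl = count_trailing_zeros32(mask)  # 12
```

The `m == 0` guard matters in real code too: hardware trailing/leading-zero instructions and builtins are often undefined or return the full width for a zero input.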
sora•7mo ago
I'm sure it's not optimal, but it does get the job done:
import math

fn movemask(mask: SIMD[DType.bool, 32]) -> UInt32:
    # Lane i is shifted to bit (31 - i), so lane 0 lands in the most significant bit.
    var i = 31 - math.iota[DType.uint32, 32]()
    return (mask.cast[DType.uint32]() << i).reduce_add()
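The shift-and-reduce trick above can be modeled in plain Python (again not Mojo) to check the bit order it produces: lane i is shifted by 31 - i, so lane 0 lands in the most significant bit, which is the reverse of `_mm_movemask_epi8`. That is why counting leading zeros gives the index of the first true lane. Because each lane contributes a distinct bit, `reduce_add` gives the same result as a bitwise OR. The function names here are hypothetical.

```python
def movemask_msb_first(lanes):
    """Model of (mask.cast[uint32]() << (31 - iota)).reduce_add() for 32 lanes."""
    assert len(lanes) == 32
    # Each true lane contributes a distinct bit, so the sum equals a bitwise OR.
    return sum(int(flag) << (31 - i) for i, flag in enumerate(lanes))


def count_leading_zeros32(m):
    """ctlz for a 32-bit value; returns 32 when no bit is set."""
    return 32 - m.bit_length()


lanes = [False] * 32
lanes[5] = True
lanes[17] = True
m = movemask_msb_first(lanes)
first_true = count_leading_zeros32(m)  # 5, the index of the first true lane
```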
asosoman•7mo ago
Hi, thanks for your input. It's slower than what I had; I guess using 32 x int32 doesn't help 😄 In case it works for someone, this is my fastest approach so far, using 32 x uint8. I hope they implement a way to pack data from a SIMD into a full int32; that would be helpful, and surely faster than all the comparisons that have to happen with the select.
data = c.simd_load[32]()  # load 32 bytes from the input pointer c
var TRUE_CASE = math.iota[DType.uint8, 32]()  # lane indices 0, 1, ..., 31
var FALSE_CASE = SIMD[DType.uint8, 32](32)  # splat 32 as the "no newline" sentinel
var mask_nl = (data == NEW_LINE)
var idx_nl = select[DType.uint8, 32](mask_nl, TRUE_CASE, FALSE_CASE).reduce_min()
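The select-then-reduce_min approach can be modeled the same way in plain Python (not Mojo): matching lanes keep their own index from the iota vector, non-matching lanes get the sentinel 32, which is larger than any valid index, so the minimum is the first match, or 32 when the chunk contains no newline. The function name and sample input are illustrative.

```python
def first_index_select_min(chunk, needle, width=32):
    """Model of select(chunk == needle, iota, splat(width)).reduce_min()."""
    iota = range(width)  # TRUE_CASE: 0, 1, ..., width - 1
    mask = [byte == needle for byte in chunk[:width]]
    # FALSE_CASE: non-matching lanes become `width`, larger than any real index.
    picked = [i if hit else width for i, hit in zip(iota, mask)]
    return min(picked, default=width)


NEW_LINE = ord("\n")
print(first_index_select_min(b"Hamburg;12.0\nBulawayo;8.9\nPalemb", NEW_LINE))  # 12
print(first_index_select_min(b"x" * 32, NEW_LINE))  # 32: no newline in this chunk
```

This does one compare, one select, and a horizontal min over 32 lanes, whereas the bitmask route replaces the select and min with a single bit-count on a scalar.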
sora•7mo ago
Ah, you wanted argtrue.