Modular · 15mo ago
alain

In-place ReLU operation on a matrix struct in memory

What would be the optimal, portable way to code the ReLU operation in Mojo on a matrix struct sitting in memory (e.g. the Matrix struct used in your matmul example), noting that TPUs, GPUs, and CPUs may or may not have dedicated ReLU arithmetic blocks on-chip?
1 Reply
alain
alain (OP) · 15mo ago
I'll answer my own question and see whether anybody can confirm it is indeed the way to go. Checking the CUDA code behind ReLU(x) in some popular ML frameworks always seems to lead to fmaxf(x, 0), which of course is simply what ReLU is mathematically. So there is probably no magic dialect implementation that will be faster than this math op on dedicated hardware such as a TPU, and I simply need to do a vectorized SIMD max(x, 0) in Mojo too.
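For what it's worth, a minimal sketch of that vectorized in-place max(x, 0) in Mojo might look like the following. This is only an illustration, not tested against any particular Mojo release: the standard library has changed across versions, so names like `DTypePointer`, `simdwidthof`, and the `vectorize` signature may differ in your toolchain, and `relu_inplace` is a hypothetical function name.

```mojo
from algorithm import vectorize
from math import max
from sys.info import simdwidthof

# Sketch: apply ReLU in place over a contiguous float32 buffer
# (e.g. the backing storage of a Matrix struct of rows * cols elements).
fn relu_inplace(data: DTypePointer[DType.float32], size: Int):
    alias simd_width = simdwidthof[DType.float32]()

    @parameter
    fn relu_chunk[width: Int](i: Int):
        # Load `width` lanes, clamp negatives to zero, store back.
        let v = data.load[width=width](i)
        data.store[width=width](i, max(v, 0))

    # vectorize handles the tail elements that don't fill a full SIMD vector.
    vectorize[relu_chunk, simd_width](size)
```

Since `max` broadcasts the scalar 0 across the SIMD vector, this compiles down to the same elementwise maximum that `fmaxf(x, 0)` expresses in the CUDA kernels mentioned above; the compiler picks the native vector max instruction for whatever CPU it targets.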