Modular•16mo ago
vmois

SIMD produces weird results without print statement at the end

The code below produces weird results:
```mojo
from algorithm import vectorize
from tensor import Tensor

alias type = DType.int64
alias nelts = simdwidthof[type]()

def main():
    let size = 10
    var a = Tensor[type](size)
    var b = Tensor[type](size)
    var c = Tensor[type](size)

    for i in range(size):
        a[i] = i + 1
        b[i] = i

    @parameter
    fn diff[nelts : Int](x : Int):
        #print(a.simd_load[nelts](x))
        c.simd_store[nelts](x, a.simd_load[nelts](x) - b.simd_load[nelts](x))
        #print(nelts, x)
    vectorize[nelts, diff](size)

    for i in range(size):
        print(c[i])

    #print(a.simd_load[nelts](0))
```
For example:
2305843009213693952
4611686018780248321
3
1
1
1
1
1
1
1407374883553281
If I uncomment the last print statement, the results are suddenly correct:
1
1
1
1
1
1
1
1
1
1
[1, 2]
Can someone please explain this behaviour? Am I doing something wrong with SIMD?
vmois (OP)•16mo ago
I experience the same behaviour on both arm64 and x86.
Helehex•16mo ago
I get similar results. I think uncommenting `#print(nelts, x)` should give something like 4, 4, 2.
vmois (OP)•16mo ago
To make it work, I uncommented the last print, `print(a.simd_load[nelts](0))`. I have an M1, so it gives nelts = 2. The indexes provided by vectorize look fine: 0 2 4 6 8.
Helehex•16mo ago
Ah yeah, I get 4, 4, 4, 4 for the one I mentioned, and uncommenting the last line doesn't work for me.
vmois (OP)•16mo ago
Hmm, interesting. Could you try changing size to a multiple of 4, like 16?
Helehex•16mo ago
Using 16: uncommenting the last print doesn't fix it for me. The 2 undefined elements at the beginning are still there.
vmois (OP)•16mo ago
I see. Surprisingly, my other SIMD implementation for Euclidean distance (from the Modular blog) works fine. Maybe I am missing something. When I have time, I will double-check the code and maybe look at the assembly. Thanks for looking!
Helehex•16mo ago
Actually, the inner nelts looks correct for me: 4, 4, 1, 1. And uncommenting does solve the issue in a certain sense (my bad), but I'm still getting the 2 random values at the beginning.
It happens with `simd_load` inside a function decorated with `@parameter`. Using the tensor after the load solves the issue, so it may have to do with the lifetime. To solve it, you can write `_ = a` for each tensor (at the end of `main` in this case).
vmois (OP)•16mo ago
What do you mean by simd_load inside the function? Should it still display correct results after vectorize? But I see what you are pointing at, because in all the examples provided by Modular, they return from the function at this point.
vmois (OP)•16mo ago
`_ = b` also works; `_ = c` doesn't 🙂 Oh wait, I just saw your edit. So by using `_ = a`, I force the compiler to keep my tensor alive until after vectorize? Anyway, that is quite interesting. Thank you for the help.
Helehex•16mo ago
No problem.
