M
Modular10mo ago
swig.

Segmentation Error Occurring With Set Implementation

Hi there, I'm having a very difficult time implementing a custom datatype into the builtin set object. For somebackground I am making a an AI model from scratch to play snake and am trying to create a training environment for n number of snakes in a generation. Each update of a generation moves the snake one pixel. I have designed the snake's move operation to add a xy coordinate to a set and remove the oldest xy coordinate from the set (this is determined by a deque implementation which is unrelated). Each portion of the snakes body is stored in a Position object which contains the x and y position and a simd vector containing the underlying data. The data structure is Hashable but when adding a position object, sometimes this works correctly but in some very small instances I get a segmentation error. Here is the code for the Position data structure I made. Let me know if there is a way to hash a tuple object or any other simpler way to hash xy positions?
alias dtype = DType.int16

@value
struct Position(KeyElement):
var x: SIMD[dtype, 1]
var y: SIMD[dtype, 1]
var data: SIMD[dtype, 2]
var hash: Int

fn __init__(inout self, x: SIMD[dtype, 1], y: SIMD[dtype, 1]):
self.x = x
self.y = y
self.data = SIMD[dtype, 2](self.x, self.y)
self.hash = hash(self.data)

fn __hash__(self) -> Int:
return self.hash

fn __eq__(self, other: Self) -> Bool:
return self.hash == other.hash

fn __ne__(self, other: Self) -> Bool:
return self.hash != other.hash

fn __str__(self) -> String:
return str(self.data)

fn __len__(self) -> Int:
return len(self.data)
alias dtype = DType.int16

@value
struct Position(KeyElement):
var x: SIMD[dtype, 1]
var y: SIMD[dtype, 1]
var data: SIMD[dtype, 2]
var hash: Int

fn __init__(inout self, x: SIMD[dtype, 1], y: SIMD[dtype, 1]):
self.x = x
self.y = y
self.data = SIMD[dtype, 2](self.x, self.y)
self.hash = hash(self.data)

fn __hash__(self) -> Int:
return self.hash

fn __eq__(self, other: Self) -> Bool:
return self.hash == other.hash

fn __ne__(self, other: Self) -> Bool:
return self.hash != other.hash

fn __str__(self) -> String:
return str(self.data)

fn __len__(self) -> Int:
return len(self.data)
15 Replies
ModularBot
ModularBot10mo ago
Congrats @swig., you just advanced to level 2!
Maxim
Maxim10mo ago
Hi, I can’t tell why you get segmentation error, but the implementation of the structure you provided has multiple flaws. Field data is redundant to fields x and y, which is a bad idea as struct fields are mutable, so user can update the one field without updating the other. Hash is computed on initialization and stored as a field, which again is a bad idea because of possible mutations. Equality comparison is based on hash value and not on the actual data/x,y fields. You should not do this as you might get a hash collision for two non equal positions, it is not probable, but possible, dependent on the hash function implementation. I am not sure what expect from the length of the Position.
Ryulord
Ryulord10mo ago
I was actually running into a segfault the other day while using Set and fixed it it by switching to just using a List so I suspect there's some bug in Set or hash atm but couldn't be bothered to look into it at the time. Could you show the rest of your code? Looks like it might be easier to find the root cause than it would be in mine.
hat
hat10mo ago
I've had seg faults with Dict too (I think Set[T] is just a wrapper around Dict[T, Bool] but correct me if I'm wrong). I'm struggling to produce a minimal example, which is why I haven't posted, but if you check the code I wrote here https://github.com/jmkopper/Advent-of-Code-2022/blob/main/day12/day12.mojo#L11 (reproduced below), I ran into segfaults without taking the quotient modulo a small number
@value
struct Point(KeyElement):
var r: Int
var c: Int

fn __hash__(self) -> Int:
return (hash(self.r) ^ hash(self.c)) % 10_000
@value
struct Point(KeyElement):
var r: Int
var c: Int

fn __hash__(self) -> Int:
return (hash(self.r) ^ hash(self.c)) % 10_000
It doesn't happen for all inputs, just the advent of code input I was working with. It could, of course, be a problem elsewhere, but I haven't seen evidence of that
Ryulord
Ryulord10mo ago
Just checked the std lib and it's Dict[T, NoneType] so yeah, probably a bug in Dict
Maxim
Maxim10mo ago
GitHub
[BUG]: Dict crashes · Issue #1729 · modularml/mojo
Bug description While executing the code listed in reproduce section I experience a crash with following output: Stack dump: 0. Program arguments: /home/maxim/.modular/pkg/packages.modular.com_mojo...
Maxim
Maxim10mo ago
I didn’t reopen the issue yet, if you folks have time please check and open a new issues with a reference to this one. See the last comment from Joe. I just checked it now Hm, I think I might try to fix it myself actually. But anyways, folks please try if it is the same bug, for me it was fixed on M1 Mac but crashed on Intel.
hat
hat10mo ago
Oh yeah, that looks like the same bug. Crashes for me on Intel i7
def main():
var keys = String('Проснувшись однажды утром после беспокойного сна, Грегор Замза').split(" ")
var d = Dict[String, Int]()
for i in range(len(keys)):
d[keys[i]] = i
def main():
var keys = String('Проснувшись однажды утром после беспокойного сна, Грегор Замза').split(" ")
var d = Dict[String, Int]()
for i in range(len(keys)):
d[keys[i]] = i
Maxim
Maxim10mo ago
I will try to find time to fix it next week.
Ryulord
Ryulord10mo ago
Both my bug and Blake's don't seem to be string related
swig.
swig.OP10mo ago
Hey Maxim, thanks for taking the time to respond. I've been busy with schoolwork so I didn't realize the post got traction. As for the mutability concerns after hashing, the Position struct implementation in my snake trainer doesn't change that x and y positions after calling the constructor so I don't have to worry about collisions. I'm 99.9% sure it has something to do with the SIMD hash function or the Dict datatype. The reason I made the hash an instance variable is because I wanted to ensure constant time complexity with the hash function so the Set[Position] can be efficient. I don't know exactly how SIMD's hashing works so I wasn't sure if it was constant time. The len is not used anymore I'll probably remove it eventually. I just decided to switch to a List to avoid this issue but it would be useful to use the Set in the future.
Maxim
Maxim10mo ago
I found the bug. It’s Dict related and will occur no matter what type the key is. It has to do with collision resolution in Dict, in some cases it can not resolve the collision and it crashes. I will need to analyze the algorithm more to understand if I can fix it. Yeah, it’s the probing function. It’s not just simple linear probing but a fancy random probing. There are multiple ways to fix it, I will have to discuss it with Modular team first. What they prefer. Will ping them in my GitHub issue.
hat
hat10mo ago
What an interesting bug! Thank you, Maxim, for looking into it
Maxim
Maxim10mo ago
The issue was reopened btw.
benny
benny10mo ago
Completely unrelated but I would also recommend using Scalar[DType] instead of SIMD[DType, 1] as it is more readable

Did you find this page helpful?