Modular
graphos (7mo ago)

Char[X] equivalent in Mojo

Hi, I'm new to Mojo and I'm trying to learn it. I have a custom file format defined like so in C++: `struct Header { char signature[4]; int contentSize; };`. How can I express this char[4] in a Mojo struct, given that so far there is no Char DType? (I could overuse uint8... but it feels very hacky.) My end goal is to read a custom binary file format directly into this struct, so if you have any example of how to do that, it would be nice. I'm struggling to convert a Tensor to a Mojo struct. Thanks in advance!
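For reference, here is what that C++ header means at the byte level. This is a C++ sketch, not Mojo. It assumes the file stores the 4 signature bytes with no null terminator, immediately followed by a 32-bit little-endian signed `int` with no padding; `parse_header` and `signature_string` are hypothetical helper names. The signature is kept as four `uint8_t` values, which is exactly what a C `char[4]` is, and the integer is decoded byte by byte instead of copying the raw struct, so neither host endianness nor struct padding affects the result. The same "raw bytes plus explicit decode" approach should carry over to Mojo using uint8 values, but that is an assumption, not a tested Mojo API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Mirrors `struct Header { char signature[4]; int contentSize; };`.
// Assumed file layout: 4 signature bytes (no '\0'), then a 32-bit
// little-endian signed integer, with no padding in between.
struct Header {
    std::array<std::uint8_t, 4> signature;  // raw bytes, not a C string
    std::int32_t contentSize;
};

// Parses the first 8 bytes of `data`; returns std::nullopt if too short.
std::optional<Header> parse_header(const std::uint8_t* data, std::size_t len) {
    if (len < 8) return std::nullopt;
    Header h{};
    for (std::size_t i = 0; i < 4; ++i) h.signature[i] = data[i];
    // Assemble the integer byte by byte (little-endian) so the result does
    // not depend on the host's byte order or on struct padding.
    std::uint32_t raw = static_cast<std::uint32_t>(data[4])
                      | (static_cast<std::uint32_t>(data[5]) << 8)
                      | (static_cast<std::uint32_t>(data[6]) << 16)
                      | (static_cast<std::uint32_t>(data[7]) << 24);
    h.contentSize = static_cast<std::int32_t>(raw);
    return h;
}

// Convenience: view the 4 signature bytes as text (e.g. "ABCD").
std::string signature_string(const Header& h) {
    return std::string(h.signature.begin(), h.signature.end());
}
```

Reading a file then amounts to loading at least 8 bytes into a buffer and calling `parse_header` on it.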
8 Replies
sora (7mo ago)
I’m also puzzled by the lack of a Char type. It makes the type of something like ord quite suboptimal.
graphos (7mo ago)
Just read https://www.modular.com/blog/mojo-vs-rust-is-mojo-faster-than-rust which says: "Mojo's primitives are natively designed to be SIMD-first: UInt8 is actually a SIMD[DType.uint8, 1] which is a SIMD of 1 element. There is no performance overhead to represent it this way, but it allows the programmer to easily use it for SIMD optimizations. For example, you can split up text into 64 byte blocks and represent it as SIMD[DType.uint8, 64] then compare it to a single newline character, in order to find the index for every newline. Because the SIMD registers on your machine can calculate operations on 512bits of data at the same time, this will improve the performance for those operations by 64x!" So I think it makes sense, and it's only a matter of time before we get an alias. On top of that, if you consider all char types, it can quickly become messy.
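The newline search described in that quote can be sketched in portable C++ (this is not the Mojo code from the blog post): compare every byte of a 64-byte block against `'\n'` and pack the results into a 64-bit mask, one bit per byte. `newline_mask` and `newline_indices` are hypothetical names. The loop is scalar; an optimizing compiler may turn it into SIMD compares, but that is not guaranteed, and the actual speedup depends on the CPU and compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit i of the result is set when block[i] == '\n'.
// `block` must point to at least 64 readable bytes.
std::uint64_t newline_mask(const std::uint8_t* block) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < 64; ++i) {
        mask |= static_cast<std::uint64_t>(block[i] == '\n') << i;
    }
    return mask;
}

// Converts the bitmask back into byte offsets within the block.
std::vector<std::size_t> newline_indices(const std::uint8_t* block) {
    std::vector<std::size_t> out;
    const std::uint64_t mask = newline_mask(block);
    for (std::size_t i = 0; i < 64; ++i) {
        if ((mask >> i) & 1u) out.push_back(i);
    }
    return out;
}
```

The mask form is what makes the SIMD version attractive: one vector compare produces all 64 results at once, and the indices are then extracted from the bits.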
sora (7mo ago)
What do you mean by "consider all char types"?
graphos (7mo ago)
If you consider a char an alias for a byte, then it's easy, but if you consider a char to represent a character, then it becomes a mess. How do you handle the null byte? E.g. a Tensor[char, 4] is supposed to contain 4 chars, but then you miss the null byte, so its size is 5... Then if you think about Unicode, there are UTF-8, UTF-16 and UTF-32, so what's the size of a char? I always found it odd that C called char what in fact represents a single byte. Semantically it just feels wrong.
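Both points can be made concrete in a short C++ sketch (the variable names are illustrative only): a string literal carries a hidden null terminator, and the "size of a char" depends on which Unicode encoding you count in. U+1F525 (the 🔥 emoji) is used as the example character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "ABCD" occupies 5 bytes because of the hidden '\0', so it cannot
// initialize a C++ char[4] (C allows it by silently dropping the '\0').
constexpr std::size_t kLiteralBytes = sizeof("ABCD");  // 5

// The same code point, U+1F525, measured in code units of each encoding.
const std::string    fire_utf8  = "\xF0\x9F\x94\xA5";  // 4 one-byte units
const std::u16string fire_utf16 = u"\U0001F525";       // 2 two-byte units (surrogate pair)
const std::u32string fire_utf32 = U"\U0001F525";       // 1 four-byte unit
```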
sora (7mo ago)
Of course char can’t be an alias for a byte, not even an AsciiChar, not in a modern language. The size of a Char can be constant (4 bytes for a Rust char) or variable (a Swift Character); it’s a design choice. I guess the main lesson here is that a string with Unicode support is not as simple as Array[Char]. Not sure what you mean by “nullbyte”. Do you mean the C-style null-terminated representation?
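To illustrate the difference between the Rust and Swift choices, here is a C++ sketch (names are illustrative only): a user-perceived "é" can be written as `e` followed by U+0301 COMBINING ACUTE ACCENT. That is one grapheme cluster (roughly what a Swift `Character` represents) but two code points (two Rust `char` values) and three UTF-8 bytes, while the precomposed U+00E9 renders the same yet is a single code point.

```cpp
#include <cassert>
#include <string>

// 'e' + U+0301: one user-perceived character, two code points.
const std::u32string e_acute_code_points = U"e\u0301";
// The same sequence in UTF-8: 'e' (1 byte) + U+0301 (2 bytes: 0xCC 0x81).
const std::string    e_acute_utf8        = "e\xCC\x81";
// Precomposed U+00E9: looks identical, but is a single code point.
const std::u32string e_acute_precomposed = U"\u00E9";
```

So even a fixed 4-byte char does not map one-to-one onto what a reader sees as "a character"; counting those requires grapheme segmentation.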
graphos (7mo ago)
Exactly
gryznar (7mo ago)
+1, it would be nice to have a char type. But IMHO it should be more general, to support non-ASCII characters. So maybe something like a grapheme representation?