M
Modular12mo ago
benny

SHA256

Any implementations of SHA256 algorithm in Mojo yet, I've seen other work about other hashing algorithms and obviously the builtin, but has anyone implemented this?
28 Replies
brainiacoutcast
brainiacoutcast12mo ago
i may as well go for it as a learning project over the weekend, did it in another language recently so it's still fairly fresh in my mind (someone more familiar probably should too though)
benny
bennyOP12mo ago
if you get a working model started and dm me i would love to optimize it / improve for mojo i just don’t have starting experience with that algorithm, so i’m hesitant to make my own implementation
Maxim
Maxim12mo ago
I did md5 and could try SHA256, but don’t have that much time on my hands, need to finish up other projects . So it’s better if @brainiacoutcast will take a stab at it. @brainiacoutcast let me know if I can help though.
benny
bennyOP12mo ago
didn’t see your repo until now, might also just make a pr using that
getting better
getting better12mo ago
What's wrong with a simple c ffi for it?
Maxim
Maxim12mo ago
External dependency and complexity which comes with it.
brainiacoutcast
brainiacoutcast12mo ago
well lets see what i can pull outta my behind in 2 hours, its a pretty simple algo to write if you already know the language you're using for future reference, on one core of a mid range laptop you should be able to chew through 250-350 megabytes per second with an efficient straightforward cross-platform implementation, but some chips can go way faster with dedicated instructions for some steps here's what i managed, it should be decent performance wise but there's plenty left on the table (particularly if you want to do some feature detection for sha instructions, lol) unfortunately, it's rigorously untested, but i'm through fighting with wsl overall this wasn't too tough to write even on day 1, albeit a little boilerplatey - i'll come back for more mojo when windows sdk is ready, maybe there will be some more convenience methods by then too edit: removed broken code
benny
bennyOP12mo ago
i’ll give it a look when i’m home and try to improve, thanks
Maxim
Maxim12mo ago
I will have a look as well, thanks @brainiacoutcast
brainiacoutcast
brainiacoutcast12mo ago
emphasis on rigorously untested, can't wait to see what you all come up with though
benny
bennyOP12mo ago
maxim will definitly have more to contribute algorithm wise on this one than me, but i’ll lyk once i’ve tried
brainiacoutcast
brainiacoutcast12mo ago
worked out all the bugs as far as i can tell, added a crude benchmark to main - performs pretty well for having to copy everything
brainiacoutcast
brainiacoutcast12mo ago
granular loop unrolling is very nice to have
Michael K
Michael K12mo ago
very nice. I saw your note about copying bytes to DynamicVector. I think you want to do this:
var bytes = DynamicVector[UInt8](byte_view.dynamic_size + 1024)
var bytes_ptr = DTypePointer[DType.uint8, 0](bytes.data.value)
let one_bit: UInt8 = 0b1000_0000
memcpy[DType.uint8](bytes_ptr, byte_view.data, byte_view.dynamic_size)
bytes.size = byte_view.dynamic_size
var bytes = DynamicVector[UInt8](byte_view.dynamic_size + 1024)
var bytes_ptr = DTypePointer[DType.uint8, 0](bytes.data.value)
let one_bit: UInt8 = 0b1000_0000
memcpy[DType.uint8](bytes_ptr, byte_view.data, byte_view.dynamic_size)
bytes.size = byte_view.dynamic_size
This has an address_space inconsistency so you also need to make sure that the argument byte_view specifies the generic address_space:
fn sha256(byte_view: Buffer[_, DType.uint8, 0]) -> InlinedFixedVector[UInt8, 32]:
fn sha256(byte_view: Buffer[_, DType.uint8, 0]) -> InlinedFixedVector[UInt8, 32]:
on my machine this went from 359 megabytes per second to 411. There is also an @unroll missing beforefor dword_i in range(16): that gets a pretty good speed up on my machine. 411 -> 456.
brainiacoutcast
brainiacoutcast12mo ago
that's nuts, what cpu and clock speed are you running with? re: copying ideally i'd like to have no dynamicvector inside the sha function at all, just wanted to read the buffer as-is for most of it and then use an InlinedFixedVector to manage the tail, but it's nice to see that it went a little faster from calling a builtin
brainiacoutcast
brainiacoutcast12mo ago
@Michael K I added in the unroll and changed the function's address space to the generic 0, then i just got in there and went all the way, got rid of the copying in-memory for most of the data, except the last ~64 bytes. On my laptop this takes it from 250 to 350, so you will probably hit 500. Now it's essentially taking the same approach I did in rust, and the performance is within 1%! The next step would be to work in some highly-non-portable assembly (or punt to openssl, lol)
Maxim
Maxim12mo ago
Just had a quick glance what caught my eye: 1. k can be an alias (computed at compile time) I needed a similar lookup table for fast base 64 encoder see https://github.com/mzaks/mojo-fast-base64/blob/main/fast_base64/chromium.mojo#L12 2. h0 .. h7 and consequentially a ..h could be a SIMD[DType.uint32, 8] vector, this way addition would be a single simd operation 3. I think all the big_endian... functions can be replaced by the usage of bswap https://docs.modular.com/mojo/stdlib/math/bit.html#bswap and if you need to turn a SIMD[DType.uint32, 1] into SIMD[DType.uint8, 4] you can use bitcast https://docs.modular.com/mojo/stdlib/memory/unsafe.html#bitcast-2 specifically this API bitcast[new_type: DType, new_width: Int, src_type: DType, src_width: Int](val: SIMD[src_type, src_width]) -> SIMD[new_type, new_width] 4. bitrr can partially be replaced with rotate_bits_right https://docs.modular.com/mojo/stdlib/math/math.html#rotate_bits_right, partially because the shift value needs to be compile time constant, I see some of your arguments are known the others come from vector, I was not sure if it is possible to pre compute this vector at compile time The things I mentioned are rather stylestical things, I did not try if they will actually contribute to performance in a positive or negative way.
GitHub
mojo-fast-base64/fast_base64/chromium.mojo at main · mzaks/mojo-fas...
Contribute to mzaks/mojo-fast-base64 development by creating an account on GitHub.
Maxim
Maxim12mo ago
For reference, I ran your code without any changes on MacMini M1 I got megabytes per second 313.01040703457232 👍
benny
bennyOP12mo ago
can someone give me an example of just running it on a string like sha256(“mojo”)
brainiacoutcast
brainiacoutcast12mo ago
i got no idea how to cast the pointers right for that in mojo
ModularBot
ModularBot12mo ago
Congrats @brainiacoutcast, you just advanced to level 4!
Maxim
Maxim12mo ago
I am currently working on compact-dict after I am done I can take a stab at sha256. @brainiacoutcast do you want to push it to some github repo? Is it ok if I will push it to mojo-hash? I am happy to reference you as initial creator. You can also create a PR in mojo-hash if you wish. https://github.com/mzaks/mojo-hash
GitHub
GitHub - mzaks/mojo-hash: A collection of hash functions implemente...
A collection of hash functions implemented in Mojo - mzaks/mojo-hash
benny
bennyOP12mo ago
what’s the type of the input right now?
brainiacoutcast
brainiacoutcast12mo ago
Buffer[_, DType.uint8, 0]
benny
bennyOP12mo ago
i’m thinking String._as_pointer().bitcastDType.uint8 probably works i’ll do some testing later today
brainiacoutcast
brainiacoutcast12mo ago
you can just use it unattributed
let test_string = "mojo"
let raw_ptr = test_string.data()
let p = raw_ptr.bitcast[DType.uint8]()._as_scalar_pointer()
let buffer = Buffer[Dim(), DType.uint8](p, test_string.__len__())
let hash = sha256(buffer)
let test_string = "mojo"
let raw_ptr = test_string.data()
let p = raw_ptr.bitcast[DType.uint8]()._as_scalar_pointer()
let buffer = Buffer[Dim(), DType.uint8](p, test_string.__len__())
let hash = sha256(buffer)
Maxim
Maxim11mo ago
I played around with the SHA256 implementation, it seems like it is not valid for longer streams.
brainiacoutcast
brainiacoutcast11mo ago
was that the later two or the very first one at the top of the channel? first ones broke

Did you find this page helpful?