SHA256
Any implementations of SHA256 algorithm in Mojo yet, I've seen other work about other hashing algorithms and obviously the builtin, but has anyone implemented this?
28 Replies
i may as well go for it as a learning project over the weekend, did it in another language recently so it's still fairly fresh in my mind (someone more familiar probably should too though)
if you get a working model started and dm me i would love to optimize it / improve for mojo
i just don’t have starting experience with that algorithm, so i’m hesitant to make my own implementation
I did md5 and could try SHA256, but don’t have that much time on my hands, need to finish up other projects . So it’s better if @brainiacoutcast will take a stab at it. @brainiacoutcast let me know if I can help though.
didn’t see your repo until now, might also just make a pr using that
What's wrong with a simple c ffi for it?
External dependency and complexity which comes with it.
well lets see what i can pull outta my behind in 2 hours, its a pretty simple algo to write if you already know the language you're using
for future reference, on one core of a mid range laptop you should be able to chew through 250-350 megabytes per second with an efficient straightforward cross-platform implementation, but some chips can go way faster with dedicated instructions for some steps
here's what i managed, it should be decent performance wise but there's plenty left on the table (particularly if you want to do some feature detection for sha instructions, lol)
unfortunately, it's rigorously untested, but i'm through fighting with wsl
overall this wasn't too tough to write even on day 1, albeit a little boilerplatey - i'll come back for more mojo when windows sdk is ready, maybe there will be some more convenience methods by then too
edit: removed broken code
i’ll give it a look when i’m home and try to improve, thanks
I will have a look as well, thanks @brainiacoutcast
emphasis on rigorously untested, can't wait to see what you all come up with though
maxim will definitly have more to contribute algorithm wise on this one than me, but i’ll lyk once i’ve tried
worked out all the bugs as far as i can tell, added a crude benchmark to main - performs pretty well for having to copy everything
granular loop unrolling is very nice to have
very nice. I saw your note about copying bytes to DynamicVector. I think you want to do this:
This has an
address_space
inconsistency so you also need to make sure that the argument byte_view
specifies the generic address_space
:
on my machine this went from 359 megabytes per second to 411.
There is also an @unroll
missing beforefor dword_i in range(16):
that gets a pretty good speed up on my machine. 411 -> 456.that's nuts, what cpu and clock speed are you running with?
re: copying ideally i'd like to have no dynamicvector inside the sha function at all, just wanted to read the buffer as-is for most of it and then use an InlinedFixedVector to manage the tail, but it's nice to see that it went a little faster from calling a builtin
@Michael K I added in the unroll and changed the function's address space to the generic 0, then i just got in there and went all the way, got rid of the copying in-memory for most of the data, except the last ~64 bytes. On my laptop this takes it from 250 to 350, so you will probably hit 500. Now it's essentially taking the same approach I did in rust, and the performance is within 1%! The next step would be to work in some highly-non-portable assembly (or punt to openssl, lol)
Just had a quick glance what caught my eye:
1.
k
can be an alias (computed at compile time) I needed a similar lookup table for fast base 64 encoder see https://github.com/mzaks/mojo-fast-base64/blob/main/fast_base64/chromium.mojo#L12
2. h0 .. h7
and consequentially a ..h
could be a SIMD[DType.uint32, 8]
vector, this way addition would be a single simd operation
3. I think all the big_endian...
functions can be replaced by the usage of bswap
https://docs.modular.com/mojo/stdlib/math/bit.html#bswap and if you need to turn a SIMD[DType.uint32, 1] into SIMD[DType.uint8, 4] you can use bitcast
https://docs.modular.com/mojo/stdlib/memory/unsafe.html#bitcast-2 specifically this API bitcast[new_type: DType, new_width: Int, src_type: DType, src_width: Int](val: SIMD[src_type, src_width]) -> SIMD[new_type, new_width]
4. bitrr
can partially be replaced with rotate_bits_right
https://docs.modular.com/mojo/stdlib/math/math.html#rotate_bits_right, partially because the shift value needs to be compile time constant, I see some of your arguments are known the others come from vector, I was not sure if it is possible to pre compute this vector at compile time
The things I mentioned are rather stylestical things, I did not try if they will actually contribute to performance in a positive or negative way.Modular Docs - math
Module
Modular Docs - unsafe
Module
Modular Docs - bit
Module
GitHub
mojo-fast-base64/fast_base64/chromium.mojo at main · mzaks/mojo-fas...
Contribute to mzaks/mojo-fast-base64 development by creating an account on GitHub.
For reference, I ran your code without any changes on MacMini M1 I got
megabytes per second 313.01040703457232
👍can someone give me an example of just running it on a string
like sha256(“mojo”)
i got no idea how to cast the pointers right for that in mojo
Congrats @brainiacoutcast, you just advanced to level 4!
I am currently working on compact-dict after I am done I can take a stab at sha256. @brainiacoutcast do you want to push it to some github repo? Is it ok if I will push it to mojo-hash? I am happy to reference you as initial creator. You can also create a PR in mojo-hash if you wish. https://github.com/mzaks/mojo-hash
GitHub
GitHub - mzaks/mojo-hash: A collection of hash functions implemente...
A collection of hash functions implemented in Mojo - mzaks/mojo-hash
what’s the type of the input right now?
Buffer[_, DType.uint8, 0]
i’m thinking String._as_pointer().bitcastDType.uint8 probably works
i’ll do some testing later today
you can just use it unattributed
I played around with the SHA256 implementation, it seems like it is not valid for longer streams.
was that the later two or the very first one at the top of the channel? first ones broke