Modular•15mo ago

SHA256

Any implementations of SHA256 algorithm in Mojo yet, I've seen other work about other hashing algorithms and obviously the builtin, but has anyone implemented this?

28 Replies

brainiacoutcast•15mo ago

i may as well go for it as a learning project over the weekend, did it in another language recently so it's still fairly fresh in my mind (someone more familiar probably should too though)

bennyOP•15mo ago

if you get a working model started and dm me i would love to optimize it / improve for mojo i just don’t have starting experience with that algorithm, so i’m hesitant to make my own implementation

Maxim•15mo ago

I did md5 and could try SHA256, but don’t have that much time on my hands, need to finish up other projects . So it’s better if @brainiacoutcast will take a stab at it. @brainiacoutcast let me know if I can help though.

bennyOP•15mo ago

didn’t see your repo until now, might also just make a pr using that

getting better•15mo ago

What's wrong with a simple c ffi for it?

Maxim•15mo ago

External dependency and complexity which comes with it.

brainiacoutcast•15mo ago

well lets see what i can pull outta my behind in 2 hours, its a pretty simple algo to write if you already know the language you're using for future reference, on one core of a mid range laptop you should be able to chew through 250-350 megabytes per second with an efficient straightforward cross-platform implementation, but some chips can go way faster with dedicated instructions for some steps here's what i managed, it should be decent performance wise but there's plenty left on the table (particularly if you want to do some feature detection for sha instructions, lol) unfortunately, it's rigorously untested, but i'm through fighting with wsl overall this wasn't too tough to write even on day 1, albeit a little boilerplatey - i'll come back for more mojo when windows sdk is ready, maybe there will be some more convenience methods by then too edit: removed broken code

bennyOP•15mo ago

i’ll give it a look when i’m home and try to improve, thanks

Maxim•15mo ago

I will have a look as well, thanks @brainiacoutcast

brainiacoutcast•15mo ago

emphasis on rigorously untested, can't wait to see what you all come up with though

bennyOP•15mo ago

maxim will definitly have more to contribute algorithm wise on this one than me, but i’ll lyk once i’ve tried

brainiacoutcast•15mo ago

worked out all the bugs as far as i can tell, added a crude benchmark to main - performs pretty well for having to copy everything

sha256_working.mojo

brainiacoutcast•15mo ago

granular loop unrolling is very nice to have

Michael K•15mo ago

very nice. I saw your note about copying bytes to DynamicVector. I think you want to do this:

var bytes = DynamicVector[UInt8](byte_view.dynamic_size + 1024)
var bytes_ptr = DTypePointer[DType.uint8, 0](bytes.data.value)
let one_bit: UInt8 = 0b1000_0000
memcpy[DType.uint8](bytes_ptr, byte_view.data, byte_view.dynamic_size)
bytes.size = byte_view.dynamic_size

var bytes = DynamicVector[UInt8](byte_view.dynamic_size + 1024)
var bytes_ptr = DTypePointer[DType.uint8, 0](bytes.data.value)
let one_bit: UInt8 = 0b1000_0000
memcpy[DType.uint8](bytes_ptr, byte_view.data, byte_view.dynamic_size)
bytes.size = byte_view.dynamic_size

This has an address_space inconsistency so you also need to make sure that the argument byte_view specifies the generic address_space:

fn sha256(byte_view: Buffer[_, DType.uint8, 0]) -> InlinedFixedVector[UInt8, 32]:

fn sha256(byte_view: Buffer[_, DType.uint8, 0]) -> InlinedFixedVector[UInt8, 32]:

on my machine this went from 359 megabytes per second to 411. There is also an @unroll missing beforefor dword_i in range(16): that gets a pretty good speed up on my machine. 411 -> 456.

brainiacoutcast•15mo ago

that's nuts, what cpu and clock speed are you running with? re: copying ideally i'd like to have no dynamicvector inside the sha function at all, just wanted to read the buffer as-is for most of it and then use an InlinedFixedVector to manage the tail, but it's nice to see that it went a little faster from calling a builtin

brainiacoutcast•15mo ago

@Michael K I added in the unroll and changed the function's address space to the generic 0, then i just got in there and went all the way, got rid of the copying in-memory for most of the data, except the last ~64 bytes. On my laptop this takes it from 250 to 350, so you will probably hit 500. Now it's essentially taking the same approach I did in rust, and the performance is within 1%! The next step would be to work in some highly-non-portable assembly (or punt to openssl, lol)

sha256_mostly_nocopy...

Maxim•15mo ago

Just had a quick glance what caught my eye: 1. k can be an alias (computed at compile time) I needed a similar lookup table for fast base 64 encoder see https://github.com/mzaks/mojo-fast-base64/blob/main/fast_base64/chromium.mojo#L12 2. h0 .. h7 and consequentially a ..h could be a SIMD[DType.uint32, 8] vector, this way addition would be a single simd operation 3. I think all the big_endian... functions can be replaced by the usage of bswap https://docs.modular.com/mojo/stdlib/math/bit.html#bswap and if you need to turn a SIMD[DType.uint32, 1] into SIMD[DType.uint8, 4] you can use bitcast https://docs.modular.com/mojo/stdlib/memory/unsafe.html#bitcast-2 specifically this API

bitcast[new_type: DType, new_width: Int, src_type: DType, src_width: Int](val: SIMD[src_type, src_width]) -> SIMD[new_type, new_width]

4. bitrr can partially be replaced with rotate_bits_right https://docs.modular.com/mojo/stdlib/math/math.html#rotate_bits_right, partially because the shift value needs to be compile time constant, I see some of your arguments are known the others come from vector, I was not sure if it is possible to pre compute this vector at compile time The things I mentioned are rather stylestical things, I did not try if they will actually contribute to performance in a positive or negative way.

Modular Docs - math

Module

Modular Docs - unsafe

Module

Modular Docs - bit

Module

GitHub

mojo-fast-base64/fast_base64/chromium.mojo at main · mzaks/mojo-fas...

Contribute to mzaks/mojo-fast-base64 development by creating an account on GitHub.

Maxim•15mo ago

For reference, I ran your code without any changes on MacMini M1 I got megabytes per second 313.01040703457232 👍

bennyOP•15mo ago

can someone give me an example of just running it on a string like sha256(“mojo”)

brainiacoutcast•15mo ago

i got no idea how to cast the pointers right for that in mojo

ModularBot•15mo ago

Congrats @brainiacoutcast, you just advanced to level 4!

Maxim•15mo ago

I am currently working on compact-dict after I am done I can take a stab at sha256. @brainiacoutcast do you want to push it to some github repo? Is it ok if I will push it to mojo-hash? I am happy to reference you as initial creator. You can also create a PR in mojo-hash if you wish. https://github.com/mzaks/mojo-hash

GitHub

GitHub - mzaks/mojo-hash: A collection of hash functions implemente...

A collection of hash functions implemented in Mojo - mzaks/mojo-hash

bennyOP•15mo ago

what’s the type of the input right now?

brainiacoutcast•15mo ago

Buffer[_, DType.uint8, 0]

bennyOP•15mo ago

i’m thinking String._as_pointer().bitcastDType.uint8 probably works i’ll do some testing later today

brainiacoutcast•15mo ago

you can just use it unattributed

    let test_string = "mojo"
    let raw_ptr = test_string.data()
    let p = raw_ptr.bitcast[DType.uint8]()._as_scalar_pointer()
    let buffer = Buffer[Dim(), DType.uint8](p, test_string.__len__())
    let hash = sha256(buffer)

    let test_string = "mojo"
    let raw_ptr = test_string.data()
    let p = raw_ptr.bitcast[DType.uint8]()._as_scalar_pointer()
    let buffer = Buffer[Dim(), DType.uint8](p, test_string.__len__())
    let hash = sha256(buffer)

Maxim•14mo ago

I played around with the SHA256 implementation, it seems like it is not valid for longer streams.

brainiacoutcast•14mo ago

was that the later two or the very first one at the top of the channel? first ones broke

Gaming

Programming

SHA256

Did you find this page helpful?