Basalt: ML Framework
GitHub: basalt-org/basalt - A Machine Learning framework from scratch in Pure Mojo 🔥
by @benny, @Stijn, @NKspartan, and @fnands
Well, mostly the first three with some small contributions by me 🙃
Making steady progress towards another release; we are always looking for more people interested in helping move the project along :)
Great. Working on implementing some activation functions (draft PR open). Are there any I should prioritize?
Also, is there currently a place to write the tests for the backwards passes of the activation functions?
Awesome @Yosi Frost
I don't think we have any priorities for specific activation functions; whatever you can think of works. Tests for activation functions should go in /tests/mojo/test_activations.mojo
Great! Is tests/test_activations.mojo just for forwards or does it also include tests for the backwards pass?
it should also include backwards tests
Ok. I must have missed them. Will write those tests as well. Thank you!
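For context, a backward test for an activation usually just checks the analytic gradient against a finite-difference estimate. Below is a minimal Mojo sketch of that idea only; it is not Basalt's test harness, and relu / relu_backward here are stand-in scalar implementations rather than Basalt's Tensor-based ops.

```mojo
from testing import assert_almost_equal

# Stand-in scalar implementations, for illustration only.
fn relu(x: Float64) -> Float64:
    if x > 0:
        return x
    return 0.0

fn relu_backward(x: Float64, upstream: Float64) -> Float64:
    if x > 0:
        return upstream
    return 0.0

fn test_relu_backward() raises:
    alias eps = 1e-5
    var xs = List[Float64](-2.0, -0.5, 0.5, 3.0)
    for i in range(len(xs)):
        var x = xs[i]
        # Finite-difference estimate of d relu(x) / dx.
        var numeric = (relu(x + eps) - relu(x - eps)) / (2 * eps)
        assert_almost_equal(relu_backward(x, 1.0), numeric, atol=1e-6)

fn main() raises:
    test_relu_backward()
    print("relu backward ok")
```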
I'm interested in dataloading and made a comment in https://github.com/basalt-org/basalt/issues/90#issuecomment-2127550998. I agree with what was posted on one of the other channels about not licking the cookie, though. I'm curious what frameworks people have used for dataloading / what they like / don't like.
To me, I think Mojo still needs Iterable / Gettable traits to even make transforms/pipes possible to prototype.
You could easily make both of those traits now with current Mojo; I'm not sure I fully understand the question
I think there are still features needed (?). I saw this thread: https://discord.com/channels/1087530497313357884/1224434323193594059/1238338296699158598 which gave me the impression it's not possible yet
while that is correct, because of how Basalt works right now, dtype is accessible for any module, so you don’t need a generic trait to return a Scalar[dtype]
but i’m not sure if this would change for your use case
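To illustrate the point being made here: a trait with a fixed element type is already expressible in current Mojo; what isn't yet possible is parameterizing the trait itself on a dtype (i.e. returning a generic Scalar[dtype]), which is the gap Basalt's module-level dtype alias covers. A sketch under those assumptions, with names that are illustrative rather than Basalt's API:

```mojo
# Illustrative only, not Basalt's API. The element type is fixed to
# Float32 because Mojo traits cannot yet be parameterized on a dtype.
trait Gettable:
    fn __getitem__(self, idx: Int) -> Float32:
        ...

    fn size(self) -> Int:
        ...


struct ConstantSource(Gettable):
    var value: Float32
    var length: Int

    fn __init__(inout self, value: Float32, length: Int):
        self.value = value
        self.length = length

    fn __getitem__(self, idx: Int) -> Float32:
        return self.value

    fn size(self) -> Int:
        return self.length


fn total[T: Gettable](source: T) -> Float32:
    # Generic over any conforming source; this works because the trait
    # fixes the element type up front.
    var acc: Float32 = 0
    for i in range(source.size()):
        acc += source[i]
    return acc


fn main():
    var src = ConstantSource(2.5, 4)
    print(total(src))  # 10.0
```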
Out of curiosity, is the source code for basalt based on any other frameworks or whitepapers or is it just from first principles? I noticed it has vague similarities to tinygrad but not enough to be recognizable as a port
There are definitely some influences from other frameworks, but it's also very much tailored to whatever Mojo allows. You should recognize stuff from PyTorch, tinygrad, and MLX; those are probably the ones that are most looked at
Giving a talk at today's community meeting if anyone is interested; the video will be posted after :)
About that comparison: is that PyTorch running on Python, or PyTorch running on Mojo?
I mean, if Basalt is 2x slower than PyTorch while running on a language that's at least 5-10 times faster, I'd avoid it for now.
(I know there is a lot of improvement coming, but still)
I assume PyTorch on Python, but that shouldn't dissuade you.
The vast majority of heavy lifting in PyTorch is offloaded to faster languages. It has had far longer to make optimizations, was originally made by Facebook, and has way more contributors.
Basalt is heavily disadvantaged in this comparison.
PyTorch on Python. If you want a stable framework, use PyTorch; nobody is offended. I can promise you, however, that in 6-12 months your model will run at almost exactly the same speed as it does today. I cannot say the same for Basalt
Isn't core PyTorch written in C and Fortran, and optimised for more than a decade now?
No
PyTorch is a first-class binding to libtorch, which is a C++ library.
And it is somewhat optimised, but bloated, and the code quality isn't great...
That's why there are some other solutions that outperform it
@benny , I am really impressed with the performance comparison between Basalt and PyTorch. Congratulations on the achievement!
I'm curious about how you guys managed to reach such impressive performance levels. I noticed that Basalt includes some advanced tensor routines. Are these routines, such as the matmul, as performant as PyTorch's torch.matmul?
I'm asking because I just had the sort of hilarious insight that the simple vectorized matmul routine I wrote for my KAN experiments is around 100 times slower than torch.matmul.
🤯
Hey Martin, thanks 😊
It was kind of a joint effort among all the contributors, but it took probably 100 hours and a bunch of failed attempts. We used a mix of old and new research papers, plus some novel ideas, to try to block it more efficiently.
That being said, there is rumored to be a ~3000-line kernel used in the MAX engine from the Modular team that's even faster (closed source, and unrealistic for us to implement at the moment)
If you want more details you're welcome to DM me and I can explain some of the nuances, but most of the info can be found with a couple of Google Scholar searches
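For anyone curious what the "blocking" mentioned above refers to: the matmul loops are tiled so a small block of A and B stays hot in cache while it is reused, instead of streaming the full matrices on every pass. Below is a bare-bones Mojo sketch of the idea only, with no vectorization or parallelism and nothing resembling Basalt's actual kernel; the flat row-major List storage, the tile size, and the function names are illustrative assumptions.

```mojo
alias TILE = 64  # illustrative tile size; tuned per CPU in practice

# C (M x N) += A (M x K) @ B (K x N), all stored row-major in flat lists.
# For brevity this sketch assumes M, N, and K are multiples of TILE.
fn matmul_blocked(
    inout C: List[Float32],
    A: List[Float32],
    B: List[Float32],
    M: Int, N: Int, K: Int,
):
    for i0 in range(0, M, TILE):
        for k0 in range(0, K, TILE):
            for j0 in range(0, N, TILE):
                # Work on one TILE x TILE block at a time so the
                # reused values of A and B stay in cache.
                for i in range(i0, i0 + TILE):
                    for k in range(k0, k0 + TILE):
                        var a = A[i * K + k]
                        for j in range(j0, j0 + TILE):
                            C[i * N + j] = C[i * N + j] + a * B[k * N + j]

fn main():
    alias M = 128
    alias N = 128
    alias K = 128
    var A = List[Float32](capacity=M * K)
    var B = List[Float32](capacity=K * N)
    var C = List[Float32](capacity=M * N)
    for i in range(M * K):
        A.append(1.0)
    for i in range(K * N):
        B.append(1.0)
    for i in range(M * N):
        C.append(0.0)
    matmul_blocked(C, A, B, M, N, K)
    print(C[0])  # 128.0
```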