Modular•7mo ago
Jack Clayton

Basalt: ML Framework

GitHub
GitHub - basalt-org/basalt: A Machine Learning framework from scrat...
A Machine Learning framework from scratch in Pure Mojo 🔥 - basalt-org/basalt
22 Replies
Jack Clayton
Jack ClaytonOP•7mo ago
by @benny, @Stijn, @NKspartan, and @fnands
Ferdinand Schenck
Ferdinand Schenck•7mo ago
Well, mostly the first three with some small contributions by me 🙃
benny
benny•7mo ago
making steady progress towards another release, we are always looking for more people interested in helping move the project along :)
Yosef Frost
Yosef Frost•7mo ago
Great. Working on implementing some activation functions (draft PR open). Are there any I should prioritize? Also, is there currently a place to write the tests for the backwards passes of the activation functions?
benny
benny•7mo ago
Awesome @Yosi Frost, I don't think we have any priorities with specific activation functions, whatever you can think of works. Tests for activation functions should go in /tests/mojo/test_activations.mojo
Yosef Frost
Yosef Frost•7mo ago
Great! Is tests/test_activations.mojo just for forwards or does it also include tests for the backwards pass?
benny
benny•7mo ago
it should also include backwards tests
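
As a rough illustration of what a backward-pass test can check: a common approach is to compare the analytic gradient against a central finite difference. The sketch below is in Python rather than Mojo and is not taken from Basalt's test suite; `sigmoid_backward`, `numerical_grad`, and `check_backward` are hypothetical names chosen for this example.

```python
import math
from typing import Callable, Iterable


def sigmoid(x: float) -> float:
    """Forward pass of the sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_backward(upstream_grad: float, x: float) -> float:
    """Backward pass: upstream gradient times d(sigmoid)/dx."""
    s = sigmoid(x)
    return upstream_grad * s * (1.0 - s)


def numerical_grad(f: Callable[[float], float], x: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def check_backward(
    forward: Callable[[float], float],
    backward: Callable[[float, float], float],
    points: Iterable[float],
    tol: float = 1e-6,
) -> bool:
    """Return True if the analytic and numerical gradients agree at every point."""
    for x in points:
        analytic = backward(1.0, x)  # upstream gradient of 1 isolates the local derivative
        numeric = numerical_grad(forward, x)
        if abs(analytic - numeric) > tol:
            return False
    return True
```

The same pattern applies to any element-wise activation: swap in its forward and backward functions and pick test points away from any non-differentiable spots (for example, avoid 0 for ReLU).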
Yosef Frost
Yosef Frost•7mo ago
Ok. I must have missed them. Will write those tests as well. Thank you!
Josiah
Josiah•7mo ago
I'm interested in dataloading and made a comment in https://github.com/basalt-org/basalt/issues/90#issuecomment-2127550998. I agree with what was posted on one of the other channels about not licking the cookie though. I'm curious what frameworks people have used for dataloading / what they like / don't like. To me, I think Mojo still needs Iterable / Gettable traits to make transforms/pipes possible to even prototype.
benny
benny•7mo ago
you could easily make both of those traits now with current Mojo, I'm not sure I fully understand the question
Josiah
Josiah•7mo ago
I think there are still features needed (?). I saw this thread: https://discord.com/channels/1087530497313357884/1224434323193594059/1238338296699158598 which gave me the impression it's not possible yet
benny
benny•7mo ago
while that is correct, because of how Basalt works right now, dtype is accessible for any module, so you don’t need a generic trait to return a Scalar[dtype], but I’m not sure if this would change for your use case
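
To make the Iterable / Gettable idea concrete, here is a minimal sketch of how such interfaces can support transforms and batching. It is written in Python rather than Mojo, purely for illustration; `Gettable`, `Mapped`, and `batches` are hypothetical names, not Basalt or Mojo APIs.

```python
from typing import Callable, Iterator, List, Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)


class Gettable(Protocol[T_co]):
    """Anything with a length that can be indexed by integer position."""

    def __len__(self) -> int: ...

    def __getitem__(self, index: int) -> T_co: ...


class Mapped:
    """Lazily applies a transform to each item of an underlying Gettable."""

    def __init__(self, source: Gettable, fn: Callable) -> None:
        self.source = source
        self.fn = fn

    def __len__(self) -> int:
        return len(self.source)

    def __getitem__(self, index: int):
        # The transform runs only when an item is requested.
        return self.fn(self.source[index])


def batches(source: Gettable, batch_size: int) -> Iterator[List]:
    """Yield consecutive batches; the last one may be shorter."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(source), batch_size):
        stop = min(start + batch_size, len(source))
        yield [source[i] for i in range(start, stop)]
```

Because `Mapped` itself satisfies `Gettable`, transforms can be stacked (`Mapped(Mapped(data, f), g)`) before being passed to `batches`, which is the kind of pipe composition the question is about.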
pedro
pedro•7mo ago
Out of curiosity, is the source code for basalt based on any other frameworks or whitepapers or is it just from first principles? I noticed it has vague similarities to tinygrad but not enough to be recognizable as a port
Stijn
Stijn•7mo ago
There are definitely some influences from other frameworks, but it's also very much tailored to whatever Mojo allows. You should recognize stuff from pytorch, tinygrad, mlx. Those are probably the ones that are most looked at
benny
benny•7mo ago
Giving a talk at today's community meeting if anyone is interested, the video will be posted after :)
MRiabov
MRiabov•7mo ago
about that comparison: is that pytorch running on python or is it pytorch running on mojo? I mean, if basalt is 2x slower than pytorch on a language 5-10 times faster at least, I'd avoid it for now. (I know there is a lot of improvement coming, but still)
The Professor
The Professor•6mo ago
I assume PyTorch on Python, but that shouldn't dissuade you. The vast majority of heavy lifting in PyTorch is offloaded to faster languages. It has had far longer to make optimizations, was originally made by Facebook, and has way more contributors. Basalt is heavily disadvantaged in this comparison.
benny
benny•6mo ago
Pytorch on Python. If you want a stable framework, use Pytorch, nobody is offended. I can promise you, however, that in 6-12 months your model will run at almost exactly the same speed as it does today. I cannot say the same for Basalt
Melody Daniel
Melody Daniel•6mo ago
Isn't core Pytorch written in C and Fortran and optimised for more than a decade now?
Serg Gini
Serg Gini•6mo ago
No, PyTorch is a first-class binding to libtorch, which is a C++ library. And it is kinda optimised, but bloated, and the code quality is not great... That's why there are some other solutions that outperform it
Martin Dudek
Martin Dudek•6mo ago
@benny, I am really impressed with the performance comparison between Basalt and PyTorch. Congratulations on the achievement! I'm curious about how you guys managed to reach such impressive performance levels. I noticed that Basalt includes some advanced tensor routines. Are these routines, such as the matmul, as performant as Pytorch's torch.matmul? I'm asking because I just had the sort of hilarious insight that the simple vectorized matmul routine I wrote for my KAN experiments is around 100 times slower than torch.matmul. 🤯 Thx
benny
benny•6mo ago
Hey Martin, thanks 😊 It was kind of a joint effort between all the contributors, but it took probably 100 hours and a bunch of failed attempts. We used a bunch of old/new research papers and some novel ideas to try and block it more efficiently. That being said, there is rumored to be a ~3000 line kernel used in the MAX engine from the Modular team that’s even faster (closed source + unrealistic for us to implement atm). If you want more details you're welcome to DM me and I can explain some of the nuances, but most of the info can be found with a couple of Google Scholar searches
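
For readers unfamiliar with the term, "blocking" (tiling) a matmul means splitting the loops into small tiles so that each tile of the operands is reused while it is still in cache. The sketch below shows only the loop structure, in plain Python; it is not Basalt's kernel, the `tile` parameter and function names are hypothetical, and in pure Python it will not be faster than the naive version. The benefit appears in compiled, vectorized code.

```python
from typing import List

Matrix = List[List[float]]


def matmul_naive(a: Matrix, b: Matrix) -> Matrix:
    """Reference triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_blocked(a: Matrix, b: Matrix, tile: int = 2) -> Matrix:
    """Same result, but iterating tile by tile over i, p, and j."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay inside one tile.
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = a[i][p]  # reused across the whole inner j loop
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += a_ip * b[p][j]
    return c
```

In a real kernel the tile sizes are tuned to the cache and register sizes of the target hardware, and the innermost loop is vectorized with SIMD instructions.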