llm.mojo: GPT2 fine-tuning and inference in a single Mojo file
GitHub - dorjeduck/llm.mojo: port of Andrej Karpathy's llm.c to Mojo
by @Martin Dudek
Just updated it to 24.6. A bit of a ride since it was still on 24.4, but mostly straightforward: DTypePointer -> UnsafePointer transitions and adding/changing various imports. I have no further plans for this project, but it's nice to have it at least updated to the latest Mojo version ...
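(For readers following along: the change Martin mentions is the Mojo standard-library migration from DTypePointer to UnsafePointer. Below is a minimal sketch of what that looks like under the Mojo 24.6 API, not code from llm.mojo; the function and variable names are made up for illustration.)

```mojo
from memory import UnsafePointer

fn scale_inplace(data: UnsafePointer[Float32], n: Int, factor: Float32):
    # Older code typically used DTypePointer[DType.float32] with load/store;
    # after the API change, UnsafePointer[Float32] with plain indexing fills
    # the same role for raw float buffers (parameters, activations, etc.).
    for i in range(n):
        data[i] = data[i] * factor

fn main():
    var n = 8
    var buf = UnsafePointer[Float32].alloc(n)  # replaces DTypePointer.alloc
    for i in range(n):
        buf[i] = Float32(i)
    scale_inplace(buf, n, 2.0)
    print(buf[3])  # expect 6.0
    buf.free()
```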
Really cool project Martin. A friend of mine is actually working on the original llm.c project with that Karpathy guy, cool stuff.
So uh, did anyone take notice of @Martin’s llm.mojo fork of the main llm.c project?
Thanks @Robert - this is a 6-month-old project, and actually mentioned on the Mojo language intro page https://www.modular.com/mojo , next to much cooler projects like Endia, Basalt and LightbugHTTP. Well, the blessing of the name Andrej Karpathy got it there I guess 😂
Porting from C to Mojo is actually - at least for me - much easier than porting from Python.
When Karpathy published llm.c I had time to make this port, and he kindly added it early on to the notable forks section on the llm.c GitHub page. He seems to be a really nice guy and I am a big fan of the educational stuff he puts on YouTube, so it was a pleasure for me to do this project.
I don’t doubt it. My internet bro (lol) is 1 of the 3 main developers on the llm.c repo
So you don’t plan on continuing this project? Can we fork the project and port it to something else? There is a ton of stuff going on in the Modular stack. Hope I can build or contribute to something in the stack half as cool.
There are plenty of ports, did you see https://github.com/karpathy/llm.c?tab=readme-ov-file#notable-forks
Sure, feel free to fork it and do what you want, it is there for the community to play around with. Following Andrej, I published it under the MIT license, which should also formally give you all the freedom you want.
After the first implementation, I did not really touch it much anymore, except to make sure it runs with new stable Mojo versions. I am sure the code could be refined, but for me it's basically a proof-of-concept project.
If you want to port it to another language, I would highly recommend just going with the original C version ... if you are a Rust guy, that port looks very solid to me and it's fast too. I haven't looked at any of the other ports.
@Martin Dudek I’m pretty keen on continuing your project for Mojo. I don’t see much interest, but I could work on it on the side.
Great. Curious what you make of it. I don't rule out that I might pick it up again myself at some point - after all, it's my Mojo one-hit wonder 😀 - but as of now I don't really see any interesting way to improve it without diverging significantly from the idea of the original llm.c .. please drop me a line when you publish something.
I am really into the ML/DL side of the space, which, from what I understand, is what the llm.c project is about. But I will take a look at what you built with the llm.mojo project and see how it goes. It is the holidays, so maybe some time in the new year.
@Martin Dudek Hey Martin, so I've been bored over the holidays. Check it out.
I checked it out and came to the conclusion that you didn't read
https://docs.modular.com/magic/
😉
It's 'magic run ...', not 'mojo run ...' - or run 'magic shell' first, and then you are in an env in which you can run the 'mojo' command.
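(For concreteness, the two workflows look roughly like this; 'train_gpt2.mojo' is an assumed file name here, following the llm.c naming, not necessarily the exact one in the repo.)

```
# either run the file through Magic's project environment directly ...
magic run mojo train_gpt2.mojo

# ... or activate the environment first, then call mojo as usual
magic shell
mojo train_gpt2.mojo
```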
@Martin Dudek Yo can I ping you? I understand you may be busy cause of the holidays
On a side note, the model does work, I just didn’t take any screenshots. I was pretty much just benchmarking it at the moment. I ran the test script as well. Just had questions.
Sure, feel free to ask questions, I will reply whenever I find time. I probably won't feel like digging into all the details of this 6-month-old project, but sure, I can help you with more fundamental questions about the Mojo aspect of it if you're stuck. To understand llm.c, you said you have a buddy who is deeply involved in it, so you'd better ask him.