Martin Dudek
MModular
Created by Martin Dudek on 6/17/2024 in #questions
Seeking Clarification on Current and Future Tensor Library Support in Mojo
I wonder if someone can clarify the current state and future direction of a Tensor library for Mojo. I understand that Tensor won't stay in the standard library, and I think I understand the rationale behind it. We also have NuMojo, which looks very promising. It is currently based on Tensor but aims to have "Native array types" as a long-term goal. Being unclear about the situation, I implemented my own vectorized yet simple Vector and Matrix structs for my KAN experiments. They work, and my KAN implementation outperforms the numpy-based Python implementation I ported to Mojo. However, I just found out that they are extremely slow compared to torch.matmul, etc. I would greatly appreciate it if someone could clarify what is possible right now for Tensor-based applications and where Mojo is headed in that regard (let's say within this year). To be clear, I'm not demanding anything, of course, just looking for some clarification on what is possible right now and where we are going as a community. Feedback on how others handle the current situation would also be great. Thx 🙏
26 replies
Created by Martin Dudek on 6/13/2024 in #community-showcase
KANs in Mojo - second attempt
This week I finally found time to dive into Kolmogorov–Arnold Networks again and give a Mojo implementation another shot. https://github.com/dorjeduck/kamo I decided to scrap my original source code and port "A from-scratch implementation of Kolmogorov-Arnold Networks (KAN)…and MLP" to Mojo. It turns out that this base made it much easier for me to grasp the topic, especially after all the confusion around the many derivatives involved in KANs. 😉 Right now, the implementation is just a learning project for me, and I am afraid it doesn't have any wider benefit for the community. I might improve on it to make it more competitive within the ever-growing ocean of KAN implementations, but it's not a top priority for me at the moment ...
3 replies
Created by Martin Dudek on 6/11/2024 in #questions
How to understand Mojo's compiler optimization capabilities?
This is something that's been bugging me for a while. I am afraid there's no clear answer, but I'm curious how you guys handle it. How can I figure out which optimizations the compiler already performs, so that I don't have to implement them myself (vectorize etc.)? Implementing them myself is often easy with Mojo, but it is still error-prone, makes the code harder to read, and choices like a hardcoded simd_width might be less optimal than letting the compiler decide based on the machine the code is running on.
To clarify what I mean by the last point, it seems that on Apple Silicon
alias simd_width = 4 * simdwidthof[dtype]()
is the best choice, but on other machines maybe
alias simd_width = 2 * simdwidthof[dtype]()
Who knows? I hope the compiler does, depending on the machine it is compiling for. It feels a bit insane to actually hardcode this factor. 😉 Thanks
8 replies
Created by Martin Dudek on 6/7/2024 in #questions
Best way to make a struct of CollectionElements to conform to CollectionElement.
I am still confused about best practices for __moveinit__ and __copyinit__. Let's say I have a struct with two fields, both of which are structs that conform to the CollectionElement trait. How should I define __moveinit__ and __copyinit__ for this struct? Will this do?
struct A:
    var b:B # B conforms to CollectionElement
    var c:C # C conforms to CollectionElement
    fn __init__(...

    fn __copyinit__(inout self, existing: Self):
        # ????
        self.b = existing.b
        self.c = existing.c

    fn __moveinit__(inout self, owned existing: Self):
        # ???
        self.b = existing.b
        self.c = existing.c
Any advice highly appreciated
6 replies
Created by Martin Dudek on 5/31/2024 in #questions
Seeking Advice on Porting Python Classes to Mojo Structs
I still struggle with how to port Python classes to Mojo structs in the current state of Mojo and am looking for some help. Here I post a Python class and suggest four ways to port it to Mojo that I came up with. All of them work in this simple example but seem to have some major drawbacks when the class is more complex. It would be great to get feedback on how you handle this. Here is an example Python class to port to Mojo:
class Animal:
    def __init__(self, name):
        self.name = name
    def whoareyou(self):
        return f"I am {self.name}!"
    def speak(self):
        raise NotImplementedError("Subclass must implement abstract method")

class Dog(Animal):
    def speak(self):
        return f"{self.name} says Woof!"

class Cat(Animal):
    def speak(self):
        return f"{self.name} says Meow!"

dog = Dog("Buddy")
cat = Cat("Whiskers")

print(dog.whoareyou()) # Output: I am Buddy!
print(cat.whoareyou()) # Output: I am Whiskers!
print(dog.speak()) # Output: Buddy says Woof!
print(cat.speak()) # Output: Whiskers says Meow!
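One way to prepare such a class hierarchy for a language without inheritance is to first restructure it on the Python side so that it no longer needs a base class. The sketch below is only an illustration of that idea and is not one of the four approaches mentioned in the post (they are not shown here). The names `Speaker`, `introduce` and the free function `whoareyou` are hypothetical. The interface is expressed with `typing.Protocol`, which is roughly the role a trait would play for separate structs in Mojo, and the shared `whoareyou` behaviour becomes a function instead of an inherited method.

```python
from typing import Protocol


class Speaker(Protocol):
    """Structural interface: anything with a name and a speak() method."""

    name: str

    def speak(self) -> str: ...


def whoareyou(animal: Speaker) -> str:
    """Shared behaviour as a free function instead of a base-class method."""
    return f"I am {animal.name}!"


class Dog:
    def __init__(self, name: str) -> None:
        self.name = name

    def speak(self) -> str:
        return f"{self.name} says Woof!"


class Cat:
    def __init__(self, name: str) -> None:
        self.name = name

    def speak(self) -> str:
        return f"{self.name} says Meow!"


def introduce(animal: Speaker) -> str:
    # Accepts any object that satisfies Speaker; no common base class needed.
    return f"{whoareyou(animal)} {animal.speak()}"


dog = Dog("Buddy")
cat = Cat("Whiskers")
print(introduce(dog))
print(introduce(cat))
```

The trade-off is that `__init__` is repeated in every class, which is exactly the kind of drawback that grows with more complex classes.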
11 replies
Created by Martin Dudek on 5/27/2024 in #community-showcase
Basic Progress Bar for Mojo
Based on a discussion we had here on Discord, I implemented a basic progress bar in Mojo and put it on GitHub: https://github.com/dorjeduck/mopro Just when I published it, @Ryulord pointed out another progress bar implementation to me, which he is about to introduce here. His implementation certainly looks more advanced, so look out for it (I leave it to him to post the link here when he is ready). Nevertheless, I will keep my implementation on GitHub for the time being; may it be of benefit.
8 replies
Created by Martin Dudek on 5/27/2024 in #questions
tqdm progress bar for Mojo?
I can't get tqdm to work via the Python interface, so I am wondering if any of you have managed to use it in Mojo, or if we already have something similar implemented in Mojo. Thx.
18 replies
Created by Martin Dudek on 5/26/2024 in #questions
Why are you using Mojo?
Just curious: why are you using Mojo? I will start with the obvious, let's see where this goes 😉
23 replies
Created by Martin Dudek on 5/23/2024 in #questions
Roadmap for classes, dynamic polymorphism ...
I noticed that object-oriented programming features like classes and dynamic polymorphism weren't addressed in the last community meeting. Given that Mojo development is tightly coupled with the Max Engine and its presumably different requirements, I'm wondering if there is a long-term roadmap available for these features. To put it briefly, will Mojo have classes in 2024? 😉 Thx
8 replies
Created by Martin Dudek on 5/18/2024 in #questions
How to implement Dependency Injection in Mojo?
The following doesn't work (yet) in Mojo, it seems:
trait Printer:
    fn print_it(self,text:String):
        ...
@value
struct BoringPrinter(Printer):
    fn print_it(self,text:String):
        print(text)

fn lets_print(p:Printer,text:String):
    p.print_it(text)

fn main():
    var bp = BoringPrinter()
    lets_print(bp,"let's sing a song")
I get the following error:
error: invalid call to 'lets_print': argument #0 cannot be converted from 'BoringPrinter' to 'Printer'
Is there a way to implement in Mojo what I am trying to do here? As we don't have inheritance yet, I want to use Dependency Injection, but without this I feel loooooost. Any advice on this is highly appreciated. Unfortunately, not knowing how to implement this discourages me from thinking about more interesting framework-like Mojo projects.
10 replies
Created by Martin Dudek on 5/16/2024 in #questions
Python integration and performance
I am looking into the performance of the Python integration in Mojo. I use Dict here as an example, but that choice is arbitrary; my question is not about Dict but about the integration in general. The following Python program measures
time: 0.585089921951294 sec
on my computer to fill and modify a dictionary, as follows:
import time
NUM = 1_000_000
start = time.time()
dic = {}
for i in range(NUM):
    dic[str(i*2)] = i%3
for i in range(NUM):
    dic[str(i*2)] *= 2
elapsed = (time.time()-start)
print("time:",elapsed,"sec")
When I use the Python dict from Mojo, the performance drops significantly:
time: 15.87300 sec
from python import Python
from time import now
alias NUM = 1_000_000
fn main() raises:
    var start = now()
    var dict = Python.dict()
    for i in range(NUM):
        dict[str(i*2)] = i%3
    for i in range(NUM):
        dict[str(i*2)] *=2
    var elapsed = (now()-start)/1_000_000_000
    print("time:",elapsed,"sec")
    _ = dict["112"]
Now when I move the first loop into a Python module
def get_dict(num):
    dict = {}
    for i in range(num):
        dict[str(i*2)] = i%3
    return dict
and use this in Mojo as follows:
from python import Python
from time import now
alias NUM = 1_000_000
fn main() raises:
    start = now()
    Python.add_to_path("./utils")
    var utils: PythonObject = Python.import_module("utils")
    var dict = utils.get_dict(NUM)
    for i in range(NUM):
        dict[str(i*2)] *= 2
    var elapsed = (now()-start)/1_000_000_000
    print("time:",elapsed,"sec")
    _ = dict["112"]
I get
time: 11.133032 sec
which is about 1.4 times faster. What I am mainly wondering about now are the last two examples. If performance is crucial and we need to rely on the Python integration, is it advisable in certain cases to perform some calculations directly in Python instead of just importing the Python object into Mojo? It feels odd, but here it brings a speedup. Thanks for any thoughts on that.
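Taking the idea from the last example one step further, both loops could live in the Python helper, so that the calling side makes a single call instead of one call per element. This is only a sketch under the assumption that the slowdown comes from the per-element calls crossing the language boundary; the post does not establish the cause. `double_values` and `build_and_double` are hypothetical additions to the `utils` module, and the local variable is named `result` rather than `dict` to avoid shadowing the built-in.

```python
# utils.py (hypothetical extension of the helper module from the post)
import time


def get_dict(num):
    """Fill the dictionary entirely on the Python side."""
    result = {}
    for i in range(num):
        result[str(i * 2)] = i % 3
    return result


def double_values(dic):
    """Double every value in place; the set of keys does not change."""
    for key in dic:
        dic[key] *= 2
    return dic


def build_and_double(num):
    """Single entry point: one call from the host language for all the work."""
    return double_values(get_dict(num))


if __name__ == "__main__":
    start = time.time()
    dic = build_and_double(1_000_000)
    print("time:", time.time() - start, "sec")
```

From Mojo this would be called like `utils.get_dict(NUM)` in the example above, just with `build_and_double` instead. Whether that is acceptable depends on how much of the work actually has to happen on the Mojo side.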
10 replies
Created by Martin Dudek on 5/14/2024 in #questions
Regular Expression Engine for Mojo, any plans?
Do we have a regular expression engine implemented in Mojo? I haven't come across one yet, but it would be fantastic to have. Or is there a native solution planned for Mojo? If not, what could be our approach to developing one? I have no clue about the complexity involved in this task. Would we base it on some existing C engine or write it from scratch in Mojo? Thx
87 replies
Created by Martin Dudek on 5/13/2024 in #community-showcase
Mojo port of Andrej Karpathy's minbpe
https://github.com/dorjeduck/minbpe.mojo Minbpe is an implementation of various tokenizers as they are commonly used in LLM applications. This Mojo port is a work in progress, in particular with regard to performance optimizations. One interesting aspect of this project for me is that there is a Rust port of minbpe, and in some aspects of tokenization this Mojo port is not yet as performant as the Rust implementation (which seems to be done very well). It is not so much a competition with Rust for me, but it is extremely helpful to have this performance comparison to find aspects of my implementation that can be optimized ... Any feedback most welcome
2 replies
Created by Martin Dudek on 5/11/2024 in #community-showcase
MoString - fast String concatenation for Mojo
https://github.com/dorjeduck/mostring MoString is a simple yet fast String concatenation struct in Mojo. The idea behind the repo is also a bit of a community experiment: I wonder if people are interested in contributing their own implementations of this task to the repo, so we can explore options which might flow into the Mojo standard library at some point. Very happy to accept PRs for that.
2 replies
Created by Martin Dudek on 5/10/2024 in #questions
What UnsafePointer can point to and allocate mem for?
I opened an issue last week in which I reported a problem with allocating memory for an array of Sets: https://github.com/modularml/mojo/issues/2503 This issue was closed as 'not planned', which leaves me confused right now. The documentation says
UnsafePointer is a pointer type that can point to any generic value that is movable.
and Set in this example seems to be an appropriate object to point to.
As usual, I guess there is a fundamental misunderstanding on my side, and I wonder if anybody can explain to me what UnsafePointer can point to and allocate memory for. The documentation says
T (AnyType): The type the pointer points to.
but Set conforms to AnyType, of course ... Thx
15 replies
Created by Martin Dudek on 4/25/2024 in #questions
How to deal with the lack of struct inheritance in Mojo
As we don't have struct inheritance in Mojo yet, is the use of Dependency Injection the way to go? Or what other approaches should I consider? I know this is a very general question, but I am actually looking for some general advice here. I am trying to port a Python project to Mojo, but as this code makes use of (class) inheritance, I need to figure out how to approach this, or whether I had better wait for Mojo to introduce inheritance ... Thx
12 replies
Created by Martin Dudek on 4/17/2024 in #questions
vectorize changes the result of float operations
I just discovered that vectorize can change the result of floating-point operations. As @Maxim rightly pointed out to me, with floating-point operations you cannot expect (a+b)+c = a+(b+c).
from algorithm import vectorize

alias dtype=DType.float32
alias SIMD_WIDTH = 2*simdwidthof[dtype]()

alias NUM = 32
fn main():

    var v = DTypePointer[dtype]().alloc(NUM)

    for i in range(NUM):
        v[i] = i*0.2932

    fn f1() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[i]
        return val

    fn f2() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+j]
        vectorize[_op, SIMD_WIDTH](size=NUM)

        return val

    fn f3() -> Float32:
        var val:Float32 = 0.0

        @parameter
        fn _op[width: Int](iv: Int):
            for j in range(width):
                val += v[iv+width-j-1]
        vectorize[_op, SIMD_WIDTH](size=NUM)

        return val

    fn f4() -> Float32:
        var val:Float32 = 0.0
        for i in range(NUM):
            val += v[NUM-i-1]
        return val

    fn f5() -> Float32:
        var val:Float32 = 0.0
        @parameter
        fn _op[width: Int](iv: Int):
            val += v.load[width=width](iv).reduce_add[1]()
        vectorize[_op, SIMD_WIDTH](size=NUM)

        return val

    print("f1:",f1())
    print("f2:",f2(),"\n")
    print("f3:",f3(),"\n")
    print("f4:",f4())
    print("f5:",f5())
output:
f1: 145.42720031738281
f2: 145.42720031738281

f3: 145.42721557617188

f4: 145.42718505859375
f5: 145.42718505859375
So we have three different results from operations which theoretically should yield the same result. Now I wonder how to deal with this; these small drifts can of course have significant effects (in my case with llm.mojo, it produces different texts than llm.c).
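The effect is not specific to Mojo or to vectorize; it follows from floating-point addition not being associative. The following plain Python sketch sums the same values in three different groupings. Note the assumptions: Python floats are float64 rather than float32, so the differences (if any) are much smaller than in the output above, and the chunk size of 8 is an arbitrary stand-in for a SIMD width. Two common ways to deal with the drift are comparing with a tolerance (`math.isclose`) and using a more careful summation (`math.fsum`).

```python
import math

# Same values as in the Mojo example, but as Python floats (float64).
values = [i * 0.2932 for i in range(32)]

# Left-to-right summation, like f1.
forward = 0.0
for x in values:
    forward += x

# Right-to-left summation, like f4.
backward = 0.0
for x in reversed(values):
    backward += x

# Sum blocks of 8 first, then add the partial sums. This regrouping is
# similar in spirit to what a SIMD reduction does.
chunked = 0.0
for start in range(0, len(values), 8):
    partial = 0.0
    for x in values[start:start + 8]:
        partial += x
    chunked += partial

# fsum tracks intermediate partial sums to avoid loss of precision.
exact = math.fsum(values)

print("forward: ", forward)
print("backward:", backward)
print("chunked: ", chunked)
print("fsum:    ", exact)

# Compare with a tolerance instead of expecting bit-identical results.
print(math.isclose(forward, backward, rel_tol=1e-9))

# Associativity already fails for three numbers:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

For a port such as llm.mojo versus llm.c, this suggests that bit-identical output can only be expected if both implementations add the numbers in the same order.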
7 replies
Created by Martin Dudek on 4/16/2024 in #questions
splitting vectorize and parallelize
My first approach when optimizing a single loop is to apply vectorize. Now I wonder whether in some cases it makes sense to transform the single loop into a nested loop, vectorizing the inner loop and parallelizing the outer one. Instead of vectorizing
for i in range(12):
    ...
using
for k in range(4):
    for j in range(3):
        var i = 3*k + j
        ...
and then vectorizing over j and parallelizing over k. If it makes sense, how do I find a good balance between vectorize and parallelize? In my concrete example, I have a loop of around 120 million .... (updating parameters in llm.mojo) What I also wonder in this regard is whether the compiler detects these optimizations anyway, so that it is better to keep the code simple and let the compiler do these kinds of standard optimizations. Thanks
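The index arithmetic of the split can be checked in isolation. The sketch below is plain Python and only verifies that the nested loop visits every index of the flat loop exactly once and in the same order; it says nothing about performance or about what vectorize and parallelize do in Mojo. `split_indices` is a hypothetical helper, and it assumes that the total length is a multiple of the inner length, as in 12 = 4 * 3.

```python
def split_indices(total, inner):
    """Yield (k, j, i) with i = inner * k + j, covering range(total) once.

    The outer index k is the candidate for parallelizing, the inner
    index j the candidate for vectorizing.
    """
    if total % inner != 0:
        raise ValueError("total must be a multiple of inner")
    for k in range(total // inner):
        for j in range(inner):
            yield k, j, inner * k + j


# The flattened indices match the original single loop over range(12).
flat = [i for _, _, i in split_indices(12, 3)]
print(flat == list(range(12)))
```

If the total is not a multiple of the inner length, the remainder needs a separate tail loop, which is one more reason the balance between the two factors is worth measuring rather than guessing.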
11 replies