Python integration and performance

I am looking into the performance of the Python integration in Mojo. I use a dict here as an example, but that choice is arbitrary; my question is not about dicts but about the integration in general. The following Python program measures
time: 0.585089921951294 sec
on my computer to fill and modify a dictionary, as follows:
import time

NUM = 1_000_000

start = time.time()
dic = {}
for i in range(NUM):
    dic[str(i*2)] = i%3
for i in range(NUM):
    dic[str(i*2)] *= 2
elapsed = (time.time()-start)
print("time:",elapsed,"sec")
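To compare the two loops individually later on, it can help to time each phase separately. The following is a minimal sketch of the same workload, assuming `time.perf_counter()` as the clock; the function name `timed_phases` is made up for illustration and is not part of the original program.

```python
import time

NUM = 1_000_000  # same size as in the post; lower it for a quick run


def timed_phases(num):
    """Fill and then modify a dict, timing each phase separately."""
    dic = {}

    t0 = time.perf_counter()
    for i in range(num):          # phase 1: fill
        dic[str(i * 2)] = i % 3
    t1 = time.perf_counter()
    for i in range(num):          # phase 2: modify in place
        dic[str(i * 2)] *= 2
    t2 = time.perf_counter()

    return dic, t1 - t0, t2 - t1


if __name__ == "__main__":
    dic, fill_s, modify_s = timed_phases(NUM)
    print("fill:  ", fill_s, "sec")
    print("modify:", modify_s, "sec")
```

The per-phase numbers give a pure-Python baseline for each loop, which is useful when only one of the loops is moved across the Mojo/Python boundary.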
When I use the dict from Mojo through the Python integration, the performance drops significantly:
time: 15.87300 sec
from python import Python
from time import now

alias NUM = 1_000_000

fn main() raises:
    var start = now()
    var dict = Python.dict()
    for i in range(NUM):
        dict[str(i*2)] = i%3
    for i in range(NUM):
        dict[str(i*2)] *= 2
    var elapsed = (now()-start)/1_000_000_000
    print("time:",elapsed,"sec")
    _ = dict["112"]
Now, when I move the first loop into a Python function
def get_dict(num):
    dict = {}
    for i in range(num):
        dict[str(i*2)] = i%3
    return dict
and use this in Mojo as follows:
from python import Python
from time import now

alias NUM = 1_000_000

fn main() raises:
    start = now()
    Python.add_to_path("./utils")
    var utils: PythonObject = Python.import_module("utils")
    var dict = utils.get_dict(NUM)
    for i in range(NUM):
        dict[str(i*2)] *= 2
    var elapsed = (now()-start)/1_000_000_000
    print("time:",elapsed,"sec")
    _ = dict["112"]
I get
time: 11.133032 sec
which is about 1.4 times faster. What I am mainly wondering about now are the last two examples. If performance is crucial and we need to rely on the Python integration, is it advisable in certain cases to perform some calculations directly in Python instead of just importing the Python object into Mojo? It feels odd, but here it brings a speedup. Thanks for any thoughts on that.
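For reference, the ratios between the three reported timings can be worked out directly. This is only arithmetic on the numbers quoted in the post; the variable names are chosen here for illustration.

```python
# Timings as reported in the post, in seconds.
pure_python = 0.585089921951294   # both loops in CPython
all_in_mojo = 15.873              # both loops in Mojo on a Python dict
fill_in_python = 11.133032        # fill loop in Python, modify loop in Mojo

# How much moving the fill loop into Python helped.
speedup = all_in_mojo / fill_in_python

# How far the mixed version still is from the pure-Python run.
slowdown_vs_python = fill_in_python / pure_python

print(f"moving the fill loop to Python: {speedup:.2f}x faster")
print(f"mixed version vs pure Python:   {slowdown_vs_python:.1f}x slower")
```

So the gain from moving one loop is modest compared with the gap to the pure-Python run, which suggests most of the remaining time is spent in the loop that still crosses the boundary.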
7 Replies
Aziz (7mo ago)
In the code that runs in 15.87 sec, why do you have dict2 instead of dict as in the initial Python version? It is also not initialized as
var dict2 = Python.dict()
Martin Dudek (OP, 7mo ago)
Sorry for that, I corrected it. I have various dict implementations running here in one program and just extracted the code wrongly.
roboquant (7mo ago)
If you often invoke CPython functionality in a loop, there will indeed be considerable overhead from crossing the border between Mojo and CPython on every call. Mind you, there are more mature languages where this overhead is much larger, and I can imagine that Mojo can still improve in this area.
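A rough, back-of-the-envelope estimate of that overhead can be derived from the timings in the original post. The sketch below assumes that the whole difference between the Mojo run and the pure-Python run is interop overhead, which is a simplification: each loop iteration involves several boundary crossings (string conversion, item lookup, item assignment), so the figure is per iteration, not per call.

```python
# Timings as reported in the original post, in seconds.
pure_python_s = 0.585089921951294
all_in_mojo_s = 15.873

# Two loops of 1_000_000 iterations each.
iterations = 2 * 1_000_000

# Assumption: everything beyond the pure-Python time is boundary overhead.
extra_per_iteration_us = (all_in_mojo_s - pure_python_s) / iterations * 1e6

print(f"extra cost per loop iteration: {extra_per_iteration_us:.2f} microseconds")
```

A few microseconds per iteration is negligible for a handful of calls but dominates when the loop body itself takes well under a microsecond, as it does here.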
vrtnis (7mo ago)
I wonder whether, in this example, we can minimize the number of calls between Mojo and Python, perhaps by batching operations together in Python as much as possible before passing the result to Mojo. Also, building on the comment above, for now it seems we would have to handle such issues case by case rather than with a general approach. Here it could be something like this in utils.py:

def get_dict_and_modify(num):
    di = {}
    for i in range(num):
        di[str(i*2)] = i%3
    for key in di.keys():
        di[key] *= 2
    return di

and then:

from python import Python
from time import now

alias NUM = 1_000_000

fn main() raises:
    start = now()
    Python.add_to_path("./utils")
    var utils: PythonObject = Python.import_module("utils")
    var dict = utils.get_dict_and_modify(NUM)
    var elapsed = (now()-start)/1_000_000_000
    print("time:", elapsed, "sec")
    _ = dict["112"]
Martin Dudek (OP, 7mo ago)
The example is just meant to illustrate the difference in performance when some of the operations are done in Python. In a regular program we wouldn't loop twice here in the first place (we could of course combine the modulo and the multiplication in one pass).
Sorry for being unclear on that.
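For completeness, the combined version mentioned above could look like the following sketch. The function names `two_pass` and `single_pass` are made up for illustration; the first mirrors the benchmark from the original post, the second folds the modulo and the multiplication into one pass.

```python
def two_pass(num):
    """The benchmark as posted: fill the dict, then double every value."""
    dic = {}
    for i in range(num):
        dic[str(i * 2)] = i % 3
    for i in range(num):
        dic[str(i * 2)] *= 2
    return dic


def single_pass(num):
    """Same result in one pass: apply the modulo and the doubling together."""
    return {str(i * 2): (i % 3) * 2 for i in range(num)}


if __name__ == "__main__":
    # Both variants should build identical dictionaries.
    print(two_pass(1000) == single_pass(1000))
```

This removes one loop but does not address the question in the thread, which is about where the loop runs rather than how many loops there are.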
vrtnis (7mo ago)
ohh got it..makes sense!