How are Integers implemented in Mojo?

I experimented with lists of integers in Python and Mojo and noticed that Mojo is far more memory efficient. Is this all thanks to static typing, or are other optimization methods in play?
8 Replies
Darkmatter
Darkmatter3mo ago
CPython (the main Python implementation) stores integers as objects of arbitrary width, which is why it tolerates numbers much larger than 2^64. The downside is that each integer has to be a heap allocation carrying a size and other metadata, along with the information Python needs to tell that the allocation is an integer and not some other object. Mojo stores 4 or 8 bytes.
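To make the size difference concrete, here is a minimal sketch that uses only the Python standard library. It assumes CPython on a 64-bit platform; the exact byte counts vary by version and build, and the `array` type code `"q"` (signed 64-bit) is used only as a stand-in for a fixed-width integer, not as a description of how Mojo lays out `Int`.

```python
import sys
from array import array

# Size of the heap object that represents the integer 1 (header + digits).
small = sys.getsizeof(1)

# A wider value needs more internal digits, so the object grows.
large = sys.getsizeof(2**100)

# Bytes per element in a packed array of signed 64-bit integers.
item = array("q").itemsize

print(f"int object for 1:      {small} bytes")
print(f"int object for 2**100: {large} bytes")
print(f"raw 64-bit integer:    {item} bytes")
```

On a typical 64-bit CPython build the first number is a few dozen bytes, several times larger than the 8 bytes of a raw machine integer.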
whatever
whateverOP3mo ago
Thanks for your reply. Maybe off-topic here: while experimenting with Python and Mojo integer lists, I noticed that the memory consumption I measure with top is not the same on each run in Python.
from time import sleep

def test_large_list(size):
    # Create a large list in which every element is the literal 1
    large_list = [1 for _ in range(size)]
    # Keep the process alive long enough to inspect it with top
    sleep(10)
    return large_list

if __name__ == '__main__':
    # Measure memory usage
    size = 100_000_000
    numbers = test_large_list(size)
    print(numbers[10])
If I run it, top shows memory consumption between 150 and 800 MB. The consumption of this Mojo process is always around 1800 MB:
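from time import sleep

fn test_large_list(size: Int) -> List[Int]:
    # NOTE: depending on the Mojo version, List[Int](size) may build a list
    # holding the single value `size` instead of reserving space;
    # List[Int](capacity=size) requests the capacity explicitly.
    var large_list = List[Int](size)
    for _ in range(size):
        large_list.append(1)
    return large_list

fn main():
    var size: Int = 100_000_000
    var large_list = test_large_list(size)
    print("List created successfully")

    # Keep the process alive long enough to inspect it with top
    sleep(15)
    print(large_list[10])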
Why is the memory consumption of my Python process not always the same? Also, why is Python so much more efficient here? If I modify it to append i (from 0 to 99,999,999) to the list instead of 1 each time, Mojo still uses 1.8 GB, but Python 3.12 uses 3.2 GB.
Darkmatter
Darkmatter3mo ago
Mojo has a bunch of stuff built into the binary, like a chunk of the Mojo compiler, an async runtime, and a few other things, which increase the initial size, but that is a fixed price you pay for Mojo (and one we will probably look at getting rid of later). Python's higher memory usage starts to show up as you do more things, and it also has a ton of machinery hidden from you, like a garbage collector. Mojo's List is actually closer to np.array than to Python's list, which is a good thing from an efficiency perspective.
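The contrast between a list of boxed objects and a packed array can be sketched in Python itself. This is an illustration under assumptions: it uses the standard library `array` module in place of np.array so that it runs without NumPy, it assumes CPython, and the totals are approximate because `sys.getsizeof` reports container overhead and element objects separately.

```python
import sys
from array import array

n = 1_000_000

# A Python list stores pointers; each element is a separate int object.
boxed = list(range(n))

# A packed array stores the 8-byte values directly, one after another.
packed = array("q", range(n))

# getsizeof(list) covers only the pointer storage, so add the int objects.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)
packed_bytes = sys.getsizeof(packed)

print(f"list of int objects: ~{boxed_bytes / 1e6:.1f} MB")
print(f"packed 64-bit array: ~{packed_bytes / 1e6:.1f} MB")
```

The packed layout pays roughly 8 bytes per element, while the list pays for a pointer plus a full int object whenever the elements are distinct.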
whatever
whateverOP3mo ago
Can you send me a link to the Mojo List implementation? I read that Python uses interning when the same value is used, so it creates the object only once and then uses references to it. However, I have not found a good explanation yet. Most sources say that for integers this applies only in the range -5 to 256, but that does not match what I see: I measure the same memory consumption regardless of whether I use 1 or 1000000 as the integer value. It seems CPython also reuses the object for larger integer values.
Darkmatter
Darkmatter3mo ago
GitHub
mojo/stdlib/src/collections/list.mojo at nightly · modularml/mojo
Darkmatter
Darkmatter3mo ago
I have no idea how Python manages interning.
whatever
whateverOP3mo ago
I found out that Python stores only a single int object and keeps references to it in the list if you use the same value several times.
a = [100000000 for _ in range(100)]
print(all(a[0] is x for x in a))
But I haven't found a source explaining it yet.
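A likely explanation, for CPython specifically: a literal such as 100000000 is stored once as a constant of the compiled code, so every loop iteration loads that same object. That is separate from the small-integer cache for -5 to 256, and it does not apply to values computed at runtime. The sketch below is an illustration of this behaviour, not something guaranteed by the language specification; the variable names are made up for the example.

```python
# Same literal: each iteration loads one constant object from the code object.
same_literal = [100000000 for _ in range(100)]

# Built at runtime: each int() call creates a new object, because the
# value lies outside CPython's small-integer cache.
built = [int("100000000") for _ in range(100)]

# Small values: CPython hands back its cached objects for -5..256.
small = [int("7") for _ in range(100)]

# Count how many distinct objects each list actually refers to.
print("distinct objects, same literal: ", len({id(x) for x in same_literal}))
print("distinct objects, runtime ints: ", len({id(x) for x in built}))
print("distinct objects, small ints:   ", len({id(x) for x in small}))
```

This would also fit the measurements above: a list of 100,000,000 references to one shared object costs about 8 bytes per element, whereas appending distinct values of i adds a separate int object for almost every element.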
