Modular•6mo ago
Sia

Energy usage benchmarks?

I've read a lot about Mojo performance benchmarks but not that much about energy consumption? Are there any good articles where one could derive roughly the savings on Energy?
19 Replies
Darkmatter
Darkmatter•6mo ago
Savings compared to what? If you compare pure Python to Mojo, it's almost a spite match. If you are comparing Mojo to other systems languages (C/C++/Rust), then it's a lot less clear and is more a question of "how much effort did the programmer put in?" Since I see you're also a distributed systems person: I'm pushing for more efficient network APIs that should let you maximize the use of a single server. So while it's probably more expensive if you have a single 400 watt server (due to polling and similar optimizations), at datacenter scale you need fewer servers to do the same work.
Sia
SiaOP•6mo ago
@Owen Hilyard Sorry, let me clarify with an example. Let's say you have a piece of Python code and port it to Mojo. Is there an understanding of the energy savings? For example, when doing a very large matrix multiplication.
Darkmatter
Darkmatter•6mo ago
Code dependent. If you are comparing pure Python (not NumPy, PyTorch, etc.) to Mojo, it's like saying "Let's race a Raspberry Pi and a quad-socket server for raw computational power". It's such a big difference in compute usage that it's hard to compare. For example, other languages in the same performance class as Mojo can do web apps with tens of millions of requests per second on a single server. How many servers would you need to do that in Python?
Sia
SiaOP•6mo ago
a) The energy used in the end should be an absolute number, or am I wrong?
b) I'm not thinking about servers or network IO here, just the pure algorithmic side, based on utilising more efficient instruction sets.

Let's step back. I expect a piece of Python, ported to Mojo, to run more efficiently: not just faster, but also with lower energy consumption. Essentially, I expect Mojo to do to normal Python code what simdjson does to JSON parsing: https://github.com/simdjson/simdjson

So, my question is, are there any benchmarks out there? I do understand it will vary depending on the task at hand and on factors such as where the bottlenecks are (network IO, etc.). But still, reading some benchmarks would help me form an understanding.
Darkmatter
Darkmatter•6mo ago
The language is still in flux, but generally is very performance focused right now. We're in the process of integrating a 10x performance increase over the old Mojo algorithm for UTF-8 validation. Any benchmark you take right now will probably not be valid in 3 months, since we keep pushing performance upwards.
Sia
SiaOP•6mo ago
@Owen Hilyard How does one get involved? 😛
Darkmatter
Darkmatter•6mo ago
Pick up a pet issue and either make a library or submit a PR to the standard library. Some stuff is only available to Modular employees, since they are keeping some things closed for now to prevent a "too many cooks in the kitchen" issue.
Sia
SiaOP•6mo ago
@Owen Hilyard Can I make a wish? That the next benchmark article also, if possible, looks into the energy savings? Actually, if you'd be interested, I could collaborate to profile that...
Darkmatter
Darkmatter•6mo ago
As I said, we keep pushing performance upwards, so it's hard to measure. The original "Python and JS are bad for energy usage, C++ and Rust are good" paper has been widely criticized for its methodology. There's too much that is highly specific to what algorithms you use and what hardware you have. For instance, if you don't have enough work, polling for IO is wasteful, but if you do, interrupts are wasteful. The same goes for a throughput-optimized vs. a latency-optimized implementation of an algorithm, or simply different ways to compile Python: I've seen the way you compile the runtime swing energy usage by 5-10% in some cases for Python. I'd say the best way to do it is to take your use case, get out a multimeter (or use a server with a BMC that does power measurements), and test both full-throttle Python and full-throttle Mojo.
Darkmatter
Darkmatter•6mo ago
Anything other than that will likely be invalid, especially if you are going to make an argument to someone to switch to Mojo based on energy usage. The last Mojo marathon was matmul, so you can use the winner of that and then hand-roll a pure Python matmul, since every library I know of does matmul in C or C++.
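For the multimeter/BMC approach, the readings are power (watts) over time, and the number to compare between the Python and Mojo runs is energy (joules). A minimal sketch of that conversion, assuming you have logged `(time_in_seconds, power_in_watts)` samples yourself; the function names and the idea of subtracting an idle baseline are illustrative assumptions, not something from a specific tool:

```python
def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule.

    `samples` is a hypothetical list of readings taken from a
    multimeter or BMC log, ordered by time.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: average power times elapsed time.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def net_energy_joules(samples, idle_watts):
    """Energy above an idle baseline (assumed measured separately)."""
    duration = samples[-1][0] - samples[0][0]
    return energy_joules(samples) - idle_watts * duration
```

Run the same workload at full throttle in both languages and compare the totals rather than the wattage: a run that draws more power but finishes much sooner can still use less energy overall.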
Sia
SiaOP•6mo ago
That is exactly what I intend to do... 😉
Darkmatter
Darkmatter•6mo ago
Have fun, I'll be interested to see the results.
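A hand-rolled pure Python matmul for the baseline side of that comparison could look roughly like this. It is a sketch only: it assumes plain lists of lists, does no blocking or other tuning, and `timed_matmul` is a hypothetical helper for lining the run up with your power log:

```python
import time


def matmul(a, b):
    """Naive matrix multiply on lists of lists (no NumPy, no C)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        row_a, row_out = a[i], out[i]
        for p in range(k):
            a_ip, row_b = row_a[p], b[p]
            # i-p-j loop order walks rows of b and out sequentially.
            for j in range(m):
                row_out[j] += a_ip * row_b[j]
    return out


def timed_matmul(a, b):
    """Return (result, elapsed_seconds) for one multiplication."""
    start = time.perf_counter()
    result = matmul(a, b)
    return result, time.perf_counter() - start
```

The elapsed time on its own says nothing about energy; it only tells you which slice of the power log belongs to the run.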
Sia
SiaOP•6mo ago
I mean advocating for Mojo based on energy consumption. It might look stupid now, but from where I stand, analysing the energy required to keep all AI workloads afloat, 1-2 years from now it might actually be the better argument...
Darkmatter
Darkmatter•6mo ago
We need to actually have GPGPU and support for other misc accelerators working first. Models are getting more efficient too: Llama3-8b is competitive with the full-size Llama2.
Sia
SiaOP•6mo ago
Doesn't that indicate less efficiency if Llama 3-8b is competitive with 70b Llama 2? Or am I misunderstanding?
Darkmatter
Darkmatter•6mo ago
No, it means a much smaller model that is easier to run inference on is getting the same results. Less work for the same result.
Sia
SiaOP•6mo ago
Ah, you meant the model's performance - gotcha, and agreed.
