Modular · 2mo ago
Dasor

MNIST slower on MAX than on Pytorch

Hi everyone, I'm getting started with all the promising stuff you are developing, so I took a look at the tutorial for getting started with MAX Graph: https://docs.modular.com/max/tutorials/get-started-with-max-graph/

I have been playing with the code it provides. I measured how long the Python MNIST implementation takes to run inference, which on my CPU is around 1.3 seconds, and did the same with the Mojo/MAX implementation, which takes around 5.5 seconds.

My question is: shouldn't MAX be faster than, or at least as fast as, Python? I'm not sure whether this is just a toy example that doesn't show the speedups you can get with MAX, or whether I'm misunderstanding the goal of MAX. Thanks in advance!
8 Replies
Darkmatter · 2mo ago
How are you timing it? MAX does the equivalent of a PyTorch compile up front, so it's possible you are measuring that compilation time rather than inference alone.
Ehsan M. Kermani (Modular)
There's a new driver API. The old tensor implementation is slow because it's a value-semantic tensor, so it copies a lot. That content will be updated. In the meantime, please check out the Python docs at https://docs.modular.com/max/api/python/driver and the Mojo API docs at https://docs.modular.com/max/api/mojo/driver/
Ehsan M. Kermani (Modular)
Also, please clarify how you measured the time. Note that the model compilation time needs to be separated out from the inference time.
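To illustrate what separating those two measurements can look like, here is a minimal Python sketch. It does not use the MAX API: `compile_model` and the returned callable are hypothetical stand-ins for a one-time compile/load step and a compiled model, and the `warmup` count is an arbitrary assumption.

```python
from time import perf_counter, sleep


def compile_model():
    """Hypothetical stand-in for a one-time compile/load step."""
    sleep(0.05)  # pretend compilation is expensive
    return lambda x: x * 2  # pretend this is the compiled model


def time_inference(model, inputs, warmup=3):
    """Return the seconds spent running `model` over `inputs`, after warm-up."""
    # Warm-up calls are not timed, so lazy initialization is excluded.
    for x in inputs[:warmup]:
        model(x)
    start = perf_counter()
    for x in inputs:
        model(x)
    return perf_counter() - start


# Time the one-time setup on its own.
t0 = perf_counter()
model = compile_model()
compile_seconds = perf_counter() - t0

# Time only the steady-state inference loop.
inputs = list(range(1000))
inference_seconds = time_inference(model, inputs)

print("Compile time:  ", compile_seconds, "seconds")
print("Inference time:", inference_seconds, "seconds")
```

Reporting the two numbers separately makes it clear whether a slow run comes from the one-time setup or from the per-item loop.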
Darkmatter · 2mo ago
Have you considered updating the blog post to add timers where they should go, and possibly a PyTorch equivalent that uses torch.compile? I've seen a few of these comparisons go by and many are effectively timing the one-shot latency.
Dasor (OP) · 2mo ago
Thanks for the replies, I will take a look at those docs. I'm measuring the time by adding a start variable before the loop and checking it after the loop. I don't know if that's the right approach. Here is a code sample:


start = perf_counter()

for i in range(len(test_dataset)):
    item = test_dataset[i]
    image = item[0]
    label = item[1]

    preprocessed_image = preprocess(image)

    output = model.execute("input0", preprocessed_image)
    probs = output.get[DType.float32]("output0")

    predicted = probs.argmax(axis=1)

    label_ = Tensor[DType.index](TensorShape(1), int(label))
    correct += int(predicted == label_)
    total += 1

print("Inference time:", perf_counter() - start, "seconds")


Darkmatter · 2mo ago
It looks like you are probably running into the tensors being copied in earlier versions. Once we have custom ops back, you should be able to move the entire loop inside of MAX without too much effort, which should save quite a bit of time.
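One way to check where the loop time actually goes is to time each stage separately instead of the loop as a whole. The following is a minimal Python sketch, not MAX code: `preprocess`, `execute`, and `postprocess` are hypothetical stand-ins for the steps in the loop above, and the dataset is synthetic.

```python
from collections import defaultdict
from contextlib import contextmanager
from time import perf_counter

# Accumulated seconds per named stage.
totals = defaultdict(float)


@contextmanager
def stage(name):
    """Add the time spent inside the `with` block to totals[name]."""
    start = perf_counter()
    try:
        yield
    finally:
        totals[name] += perf_counter() - start


# Hypothetical stand-ins for the real pipeline steps.
def preprocess(image):
    return [p / 255.0 for p in image]


def execute(x):
    return [sum(x), 0.0]


def postprocess(out):
    return out.index(max(out))


# Synthetic data: 200 fake "images" of 784 pixels each.
dataset = [[i % 256] * 784 for i in range(200)]

predictions = []
for image in dataset:
    with stage("preprocess"):
        x = preprocess(image)
    with stage("execute"):
        out = execute(x)
    with stage("postprocess"):
        predictions.append(postprocess(out))

for name, seconds in totals.items():
    print(name, seconds, "seconds")
```

If most of the time lands outside the execute stage, the model call itself is probably not the bottleneck, which would be consistent with the tensor-copy explanation.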
Dasor (OP) · 2mo ago
I'm checking the driver Mojo API, but I can't even run the code sample provided. Do I need the nightly build? Never mind, I was just missing the TensorShape usage shown in the tensor module example.
Darkmatter · 2mo ago
Most things in Mojo are designed for nightly because it tends to have substantial usability improvements.
