Modular•5mo ago

MNIST slower on MAX than on Pytorch

Hi everyone, I'm getting started with all this promising stuff you guys are developing, so I took a look at the tutorial to get started with MAX Graph: https://docs.modular.com/max/tutorials/get-started-with-max-graph/ I have been playing with the code it provides and measured how long inference takes. On my CPU, the Python MNIST implementation takes around 1.3 seconds, while the Mojo/MAX implementation takes around 5.5 seconds. My question is: shouldn't MAX be faster than, or at least as fast as, Python? I'm not sure whether this toy example just doesn't show the speedups you can get with MAX, or whether I'm misunderstanding the goal of MAX. Thanks in advance!


8 Replies

Darkmatter•5mo ago

How are you timing it? MAX does the equivalent of PyTorch's torch.compile, so it's possible you are measuring the compilation step.

Ehsan M. Kermani (Modular)•5mo ago

There's a new driver API, and the old tensor implementation is slow because it's a value-semantic tensor, so it copies a lot. That content will be updated. In the meantime, please check out the Python driver docs at https://docs.modular.com/max/api/python/driver and the Mojo driver docs at https://docs.modular.com/max/api/mojo/driver/


Ehsan M. Kermani (Modular)•5mo ago

Also, please clarify how you measured time. Note that model compilation time needs to be separated out.

Darkmatter•5mo ago

Have you considered updating the blog post to add timers where they should go, and possibly a pytorch equivalent that uses torch.compile? I've seen a few of these comparisons go by and many are effectively timing the one-shot latency.
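For reference, here is a minimal sketch of where such timers could go. It is plain Python with a hypothetical `compile_model` stand-in (not the MAX or PyTorch API); the point is only that the one-time compile step, a short warm-up, and the steady-state loop are timed separately.

```python
from time import perf_counter


def compile_model():
    """Hypothetical stand-in for a one-time compile step (not a real MAX/PyTorch call)."""
    # Simulate expensive one-time work by building a lookup table.
    table = [i * i for i in range(200_000)]

    def run(x):
        return table[x % len(table)]

    return run


def benchmark(inputs, warmup=10):
    # 1) Time the one-time compile step on its own.
    t0 = perf_counter()
    model = compile_model()
    compile_s = perf_counter() - t0

    # 2) Warm up so first-call overhead is not counted in the loop timing.
    for x in inputs[:warmup]:
        model(x)

    # 3) Time only the steady-state inference loop.
    t0 = perf_counter()
    outputs = [model(x) for x in inputs]
    infer_s = perf_counter() - t0

    return compile_s, infer_s, outputs


compile_s, infer_s, outputs = benchmark(list(range(1000)))
print(f"compile: {compile_s:.4f}s, inference: {infer_s:.4f}s "
      f"({infer_s / len(outputs) * 1e6:.1f} us/sample)")
```

Reporting the two numbers separately makes it clear whether a slow total comes from compilation or from the per-sample work.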

DasorOP•5mo ago

Thanks for the replies, I will take a look at those docs. I'm measuring time by setting a start variable before the loop and checking the elapsed time after it. I don't know if that's the right approach. Here is a code sample:

 
    
    start = perf_counter()

    for i in range(len(test_dataset)):
        item = test_dataset[i]
        image = item[0]
        label = item[1]

        preprocessed_image = preprocess(image)

        output = model.execute("input0", preprocessed_image)
        probs = output.get[DType.float32]("output0")

        predicted = probs.argmax(axis=1)

        label_ = Tensor[DType.index](TensorShape(1), int(label))
        correct += int(predicted == label_)
        total += 1

    print("Inference time:", perf_counter() - start, "seconds")

 
    

Darkmatter•5mo ago

It looks like you are probably running into the tensors being copied in earlier versions. Once we have custom ops back, you should be able to move the entire loop inside of MAX without too much effort, which should save quite a bit of time.
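As a rough illustration of why per-iteration copies add up, here is a small Python analogy using only the standard library. It is not the MAX tensor or driver API; the function names and the 784-element image size are assumptions for the example. Slicing an `array` copies the data each time, while slicing a `memoryview` shares the underlying buffer.

```python
from array import array


def total_by_copy(pixels, n_images, image_size):
    """Slice a new array per image: each slice copies image_size values."""
    total = 0.0
    for i in range(n_images):
        image = pixels[i * image_size:(i + 1) * image_size]  # copies the data
        total += image[0]
    return total


def total_by_view(pixels, n_images, image_size):
    """Slice a memoryview per image: each slice shares the original buffer."""
    view = memoryview(pixels)
    total = 0.0
    for i in range(n_images):
        image = view[i * image_size:(i + 1) * image_size]  # no copy
        total += image[0]
    return total


# Four fake 28x28 "images" stored back to back in one flat buffer.
pixels = array("d", (float(i) for i in range(4 * 784)))

copied = pixels[0:784]             # independent copy of the first image
shared = memoryview(pixels)[0:784] # view onto the same memory

pixels[0] = 42.0                   # mutate the original buffer
print("copy sees:", copied[0], "view sees:", shared[0])
print(total_by_copy(pixels, 4, 784) == total_by_view(pixels, 4, 784))
```

Both loops compute the same result; the difference is that the copying version allocates and fills a new buffer on every iteration, which is the kind of overhead a value-semantic tensor can incur inside a tight inference loop.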

DasorOP•5mo ago

I'm checking the driver Mojo API, but I can't even run the code sample provided. Do I need the nightly build? Never mind, I was just missing the TensorShape, as shown in the tensor module example.

Darkmatter•5mo ago

Most things in Mojo are designed for nightly because it tends to have substantial usability improvements.
