Manuel Comments - Answer Overflow

Manuel

•Created by Marvee Amasi on 7/31/2024 in #✅-code-review

How can I optimize matrix multiplication performance and reduce L3 cache misses in my C++ library?

You may also search the web for efficient matrix multiplication. This is a standard topic, so there should be plenty of articles on how to optimize it.

10 replies

DevHeads IoT Integration Server

•Created by Marvee Amasi on 7/31/2024 in #✅-code-review

How can I optimize matrix multiplication performance and reduce L3 cache misses in my C++ library?

Regarding cache locality, try to access memory consecutively. The first index is the one that gets multiplied to compute the memory position, so keep it fixed in the innermost loop to avoid large strides in memory. In your case, this means the innermost loop should iterate over j, not over k, because k is the first index in the access to other. So try:

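for (int i = 0; i < rows_; ++i) {
    // Clear row i of the result before accumulating into it.
    for (int j = 0; j < other.cols_; ++j) {
        result(i, j) = 0.0;
    }
    for (int k = 0; k < cols_; ++k) {
        // Innermost loop runs over j, so result(i, j) and other(k, j)
        // are both accessed consecutively in memory.
        for (int j = 0; j < other.cols_; ++j) {
            result(i, j) += (*this)(i, k) * other(k, j);
        }
    }
}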

This should boost performance quite a bit.
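
As a self-contained sketch of the loop order above: the `Matrix` struct and the `multiply_ikj` function below are hypothetical stand-ins, since the original class is not shown in this thread. The sketch assumes row-major storage in a flat `std::vector<double>`, so that elements sharing the same first index sit next to each other in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the matrix class discussed in the thread.
// Assumption: row-major storage, so element (r, c) lives at r * cols + c.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Multiply with i-k-j loop order: the innermost loop walks j, so both
// result(i, j) and b(k, j) are accessed with stride 1.
Matrix multiply_ikj(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);       // inner dimensions must match
    Matrix result(a.rows, b.cols);  // zero-initialized by the constructor
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t k = 0; k < a.cols; ++k) {
            const double a_ik = a(i, k);  // constant during the inner loop
            for (std::size_t j = 0; j < b.cols; ++j) {
                result(i, j) += a_ik * b(k, j);
            }
        }
    }
    return result;
}
```

How much this helps depends on the matrix sizes and the machine, so it is worth timing both loop orders on the actual target.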

10 replies

DevHeads IoT Integration Server

•Created by Marvee Amasi on 7/31/2024 in #✅-code-review

How can I optimize matrix multiplication performance and reduce L3 cache misses in my C++ library?

Where do you actually use SSE instructions? The code you provided does not contain any, as far as I can see.

10 replies

DevHeads IoT Integration Server

•Created by Manuel on 4/25/2024 in #🪲-firmware-and-baremetal

Has anyone experience with the integrated pcie block from the xilinx 7 series fpga's?

I just tried it, and now it works again without the additional delayed-reset voodoo I copied from another project. The main culprit was a wrong assignment, user_led(0 downto 1) <= some_signal(1 downto 0), which led to all kinds of strange behavior in unrelated parts of the project.

10 replies

DevHeads IoT Integration Server

•Created by nour_oud on 3/3/2024 in #✅-code-review

Coding Challenge < Python >

import numpy as np

heads = 35
legs = 94
headsPerChicken = 1
headsPerRabbit = 1
legsPerChicken = 2
legsPerRabbit = 4

# Coefficient matrix: first row counts heads, second row counts legs.
equations = np.array([[headsPerChicken, headsPerRabbit], [legsPerChicken, legsPerRabbit]])
# Right-hand side: the observed totals.
solutions = np.array([heads, legs])
chickens, rabbits = np.linalg.solve(equations, solutions)

print(f"{rabbits:g} rabbits and {chickens:g} chickens add up to {heads} heads and {legs} legs.")

4 replies

DevHeads IoT Integration Server

•Created by nour_oud on 12/11/2023 in #🪲-firmware-and-baremetal

High-Speed ADC Interface Design ?

What is your desired sampling frequency?

3 replies

DevHeads IoT Integration Server

•Created by Navadeep on 2/7/2024 in #devheads-feed

Innetra T1 - Neuromorphic Chip with Spiking Neural Network Accelerator

Nice. I learned about neural networks around 20 years ago, when I was 18. I experimented with them, implemented them in C++, and played around with pattern recognition. Later I used them to make some virtual creatures learn to survive in a small simulated environment. At the time I thought about trying it with spiking neural networks, but found very little information about them. Then my interest faded when the complexity of my projects led to training times of several days and I did not have the money to buy better hardware. But I am still excited whenever I realize how far this topic has come. 🙂

2 replies

DevHeads IoT Integration Server

•Created by nour_oud on 1/21/2024 in #✅-code-review

Challenge of the day " C++"

Two to the power of three. 😉

3 replies

DevHeads IoT Integration Server

•Created by Embedded Shiksha on 1/15/2024 in #📦-middleware-and-os

Starting Linux Kernel Series with this very basic question

Depends... Do you mean how the name Linux is used nowadays, or what it was originally used for?

3 replies

DevHeads IoT Integration Server

•Created by Chimmuanya Okere on 12/18/2023 in #🪲-firmware-and-baremetal

How is Flash memory different from ROM in microcontrollers?

@Chimmuanya Okere: Functionally, flash sits between RAM and ROM. Changing ROM is typically very slow (if possible at all) and limited to very few cycles before it breaks. Therefore it is used for data that does not change on a regular basis. Changing RAM is fast, as this is its main purpose. But it also loses its data quickly, so it has to be rewritten at short intervals and is only useful when this is possible (while the system is powered on). Flash can store data over extended periods without power, but is considerably slower than RAM at changing data. With modern flash technology, data can be stored for years, it can withstand many rewrite cycles, and its price has come down significantly. Therefore it has replaced ROM in many of its typical use cases.

9 replies
