How do I optimize memory usage for a neural network running on an ARM Cortex-M4 using CMSIS-NN?

@Middleware & OS How do I optimize memory usage for a neural network running on an ARM Cortex-M4 using CMSIS-NN? My current model runs out of memory. Here's my code:
#include "arm_nnfunctions.h"

void run_nn(const q7_t* input_data) {
    q7_t intermediate_buffer[INTERMEDIATE_SIZE];
    q7_t output_data[OUTPUT_SIZE];
    // Run the network layers
    arm_convolve_HWC_q7_basic(input_data, CONV1_WEIGHT, CONV1_BIAS, intermediate_buffer
4 Replies
Solution
wafa_ath•2w ago
Did you consider reusing buffers for the intermediate and output data?
Enthernet Code•2w ago
No, but I think I'll try it out. They're temporary, right?
wafa_ath•2w ago
Yes, exactly. You can save memory because each buffer is only needed temporarily during computation: once the next layer has consumed a layer's output, that memory can hold something else.
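To make the saving concrete, here is a minimal sketch that compares the activation memory of a network with one buffer per layer against two alternating ("ping-pong") buffers. The helper names and the layer sizes in the usage note are hypothetical, not taken from the original model, and the sketch assumes a plain feed-forward chain (no skip connections) with one byte per q7 element.

```c
#include <stddef.h>

/* Bytes needed if every layer output gets its own buffer.
 * act[i] is the output size of layer i, in bytes. */
size_t bytes_separate(const size_t *act, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += act[i];
    }
    return total;
}

/* Bytes needed with two alternating buffers: even-indexed layer
 * outputs share one buffer, odd-indexed outputs share the other.
 * Each buffer must fit the largest output assigned to it. */
size_t bytes_pingpong(const size_t *act, size_t n)
{
    size_t even = 0;
    size_t odd = 0;
    for (size_t i = 0; i < n; i++) {
        if (i % 2 == 0) {
            if (act[i] > even) even = act[i];
        } else {
            if (act[i] > odd) odd = act[i];
        }
    }
    return even + odd;
}
```

For hypothetical layer outputs of 1024, 1024, 512, and 10 bytes, separate buffers need 2570 bytes while two alternating buffers need 2048. With only two activations in total, as in the question, both approaches need the same amount, so the benefit grows with network depth.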
Enthernet Code•2w ago
Thanks, this was helpful. I tried it out 👇
#include <string.h>  // memcpy
#include "arm_nnfunctions.h"

#define INTERMEDIATE_SIZE 1024
#define OUTPUT_SIZE 512
#define MAX_BUFFER_SIZE ((INTERMEDIATE_SIZE > OUTPUT_SIZE) ? INTERMEDIATE_SIZE : OUTPUT_SIZE)

void run_nn(const q7_t* input_data) {
    // Use a single buffer for both intermediate and output data
    q7_t shared_buffer[MAX_BUFFER_SIZE];

    // Run the network layers using the shared buffer.
    // (Argument list abbreviated as posted; the real function also takes
    // dimensions, padding, stride, shifts, and scratch buffers.)
    arm_convolve_HWC_q7_basic(input_data, CONV1_WEIGHT, CONV1_BIAS, shared_buffer);

    // Continue with other layers, reusing the shared buffer
    // For example (note: input and output alias here, which is only safe
    // for kernels that support in-place operation):
    // arm_fully_connected_q7(shared_buffer, FC1_WEIGHT, FC1_BIAS, shared_buffer);

    // Copy final output to the output data buffer if needed
    q7_t output_data[OUTPUT_SIZE];
    memcpy(output_data, shared_buffer, OUTPUT_SIZE * sizeof(q7_t));
}
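One possible refinement of the code above, offered as a sketch rather than a drop-in replacement: a layer generally reads its input while writing its output, so the two should not share memory unless the kernel is documented to work in place (`arm_relu_q7`, for example, modifies its buffer in place). Alternating between two statically allocated buffers avoids that aliasing, keeps large arrays off the stack, and lets the last layer write straight into a caller-provided output so no `memcpy` or extra local array is needed. Here `layer_stub`, the `ACT*_SIZE` values, and the extra `run_nn` parameters are hypothetical stand-ins; in a real model each call would be the matching CMSIS-NN function with its full argument list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int8_t q7_t;  /* same underlying type CMSIS-NN uses for q7_t */

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical activation sizes for a four-layer network. */
#define ACT1_SIZE   1024
#define ACT2_SIZE   1024
#define ACT3_SIZE   512
#define OUTPUT_SIZE 10

/* Two static buffers, each sized for the largest activation it holds. */
static q7_t buf_a[MAX2(ACT1_SIZE, ACT3_SIZE)];
static q7_t buf_b[ACT2_SIZE];

/* Stand-in for a real layer: reads `in`, writes `out`. */
static void layer_stub(const q7_t *in, size_t n_in, q7_t *out, size_t n_out)
{
    assert(in != out);  /* input and output must not alias */
    for (size_t i = 0; i < n_out; i++) {
        out[i] = (q7_t)(in[i % n_in] / 2 + 1);
    }
}

/* Runs the network; `output` must hold OUTPUT_SIZE elements. */
void run_nn(const q7_t *input, size_t n_in, q7_t *output)
{
    layer_stub(input, n_in, buf_a, ACT1_SIZE);        /* input -> A */
    layer_stub(buf_a, ACT1_SIZE, buf_b, ACT2_SIZE);   /* A -> B     */
    layer_stub(buf_b, ACT2_SIZE, buf_a, ACT3_SIZE);   /* B -> A     */
    layer_stub(buf_a, ACT3_SIZE, output, OUTPUT_SIZE); /* A -> out  */
}
```

Static buffers make the function non-reentrant, which is usually acceptable for a single inference task on a Cortex-M4 but worth keeping in mind if `run_nn` could be called from more than one context.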