GCC Firmware Analysis: Mastering Map Files and Memory Optimization Techniques

In day-to-day embedded development, keeping firmware size and memory usage under control is a crucial skill. This article explains how to use the GCC toolchain to analyze a firmware build so that developers can see exactly how much FLASH and RAM it uses.

1. Why is Firmware Analysis Necessary?

In embedded system development, we often face the following challenges:

  • Insufficient FLASH space to add new features
  • Excessive RAM usage affecting system stability
  • Difficulty in locating performance bottlenecks
  • Lack of data support for code optimization

Through firmware analysis, we can:

  • Precisely understand memory usage
  • Identify optimization opportunities
  • Improve system performance
  • Reduce power consumption

2. Overview of the GCC Compilation Process

Compilation Stages

  1. Preprocessing – Handling macro definitions and header files
  2. Compilation – Converting C code to assembly
  3. Assembly – Generating object files (.o)
  4. Linking – Generating the final executable file

Key Compilation Options

# Optimization options
-O0  # No optimization, suitable for debugging
-Os  # Optimize for size, recommended for release
-O2  # Balanced optimization
-O3  # Aggressive speed optimization, often increases code size

# Debug information
-g   # Generate debug information
-gdwarf-2  # DWARF-2 format debug information

3. Detailed Memory Layout

Typical Embedded Memory Layout

FLASH (0x08000000)
├── .text section    # Program code
├── .rodata section  # Read-only data
└── .data initialized values  # Stored in FLASH

RAM (0x20000000)
├── .data section    # Initialized global variables
├── .bss section     # Uninitialized global variables
├── Heap space       # Dynamic memory allocation
└── Stack space      # Function calls and local variables

Memory Usage Calculation Formulas

FLASH usage = text section + data section
RAM usage = data section + bss section (heap and stack are additional)
dec column of size = text section + data section + bss section (not the size of the firmware image, since bss occupies no FLASH)
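
As a worked instance of these formulas, the sketch below computes FLASH and RAM usage from the three totals that size prints. The struct and function names are illustrative and not part of any toolchain.

```c
#include <stdint.h>

/* Section totals as reported by `size` (Berkeley format), in bytes. */
typedef struct {
    uint32_t text;  /* code plus read-only data */
    uint32_t data;  /* initialized variables: stored in FLASH, copied to RAM */
    uint32_t bss;   /* zero-initialized variables: RAM only */
} SectionSizes;

/* FLASH holds the code and the initial values of .data. */
uint32_t flash_usage(const SectionSizes *s) {
    return s->text + s->data;
}

/* RAM holds .data and .bss; heap and stack are not counted here. */
uint32_t ram_usage(const SectionSizes *s) {
    return s->data + s->bss;
}
```

With text = 0x1234, data = 0x200 and bss = 0x400, this gives 0x1434 bytes of FLASH and 0x600 bytes of RAM.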

4. In-Depth Analysis of Map Files

The map file is a valuable resource generated by the linker, containing the following key information:

1. Memory Configuration

Memory Configuration
Name Origin Length Attributes
FLASH 0x08000000 0x00020000 xr
RAM   0x20000000 0x00008000 xw

2. Section Layout Information

Linker script and memory map
.text 0x08000000 0x234
 .text 0x08000000 0x100 main.o
 .text 0x08000100 0x134 startup.o

.data 0x20000000 0x200
 .data 0x20000000 0x200 global_var.o

.bss 0x20000200 0x400
 .bss 0x20000200 0x400 buffer.o
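
The per-object lines in this part of the map file are regular enough to parse with a few lines of C. The sketch below assumes the simple four-field shape shown in the listing (section, address, size, object file). Real map files sometimes wrap long section names onto a second line or append archive member names, which this minimal parser does not handle. MapEntry and parse_map_line are hypothetical names.

```c
#include <stdio.h>

typedef struct {
    char section[64];
    unsigned long address;
    unsigned long size;
    char object[128];
} MapEntry;

/* Parses one input-section line such as " .text 0x08000000 0x100 main.o".
   Returns 1 on success, 0 if the line does not have all four fields
   (for example an output-section summary line without an object file). */
int parse_map_line(const char *line, MapEntry *out) {
    int fields = sscanf(line, " %63s %lx %lx %127s",
                        out->section, &out->address, &out->size, out->object);
    return fields == 4;
}
```

Summing the size field per object file or per section then shows which translation units dominate FLASH or RAM.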

3. Symbol Table Analysis

Symbol Table
08000000 T main
08000100 T Reset_Handler
20000000 D global_counter
20000200 B rx_buffer

Symbol type descriptions:

  • T – Code section symbol
  • D – Initialized data
  • B – Uninitialized data
  • A – Absolute symbol
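
To connect these letters to source code, the sketch below shows declarations that would typically produce each symbol type when compiled with GCC and inspected with nm. The identifiers are illustrative. Two details go beyond the list above: R marks read-only data, and a lowercase letter marks a file-local (static) symbol. In object files built with -fcommon (the default before GCC 10), an uninitialized global may show up as C (common) until the final link.

```c
#include <stdint.h>

int global_counter = 5;         /* D: initialized data (.data) */
uint8_t rx_buffer[64];          /* B: zero-initialized data (.bss) */
const char banner[] = "boot";   /* R: read-only data (.rodata) */
static uint16_t local_errors;   /* b: lowercase means file-local */

/* T: global function in the code section (.text) */
uint32_t count_event(void) {
    rx_buffer[0] = (uint8_t)global_counter;
    local_errors++;
    return (uint32_t)global_counter + local_errors;
}
```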

5. Practical Tools in the GCC Toolchain

1. size Command

View the size of each section in an object file or ELF image:

# Basic usage
arm-none-eabi-size firmware.elf
# Output columns: text data bss dec hex filename

# Show detailed information of all sections
arm-none-eabi-size -A firmware.elf

# Display in hexadecimal
arm-none-eabi-size -x firmware.elf

2. nm Command

List symbol table information:

# Basic usage
arm-none-eabi-nm firmware.elf

# Sort by size
arm-none-eabi-nm --size-sort firmware.elf

# Show source file information
arm-none-eabi-nm -l firmware.elf

3. objdump Command

Disassemble and analyze object files:

# Disassemble code section
arm-none-eabi-objdump -d firmware.elf

# Mix source code and assembly
arm-none-eabi-objdump -S firmware.elf

# Show section header information
arm-none-eabi-objdump -h firmware.elf

6. Real Optimization Cases

Case 1: FLASH Space Optimization

Problem: Firmware size 132KB, exceeding the 128KB limit by 4KB

Solution:

  1. Enable -Os optimization option
# Modify compilation options in Makefile
CFLAGS += -Os
CXXFLAGS += -Os
  2. Remove or replace heavyweight library functions
// Use a lighter implementation
// Avoid using printf, use lightweight serial output instead
void uart_send_string(const char* str) {
    while (*str) {
        UART_SendChar(*str++);
    }
}

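// Replace sprintf with simple number conversion
// buffer must hold at least 12 bytes for a 32-bit int (sign, 10 digits, terminator)
void int_to_string(int num, char* buffer) {
    int i = 0;
    int is_negative = (num < 0);
    // Convert the magnitude as unsigned so that INT_MIN does not overflow
    unsigned int value = is_negative ? 0u - (unsigned int)num : (unsigned int)num;
    
    do {
        buffer[i++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value > 0u);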
    
    if (is_negative) {
        buffer[i++] = '-';
    }
    
    // Reverse the string
    for (int j = 0; j < i/2; j++) {
        char temp = buffer[j];
        buffer[j] = buffer[i-1-j];
        buffer[i-1-j] = temp;
    }
    buffer[i] = '\0';
}
  3. Optimize string storage methods
// Use const modifier to place strings in .rodata section
const char error_msg[] = "Error occurred";
const char success_msg[] = "Operation successful";

// Use string array instead of multiple independent strings
const char* const messages[] = {
    "OK",
    "Error",
    "Timeout",
    "Busy"
};
  4. Use smaller data types
// Before optimization: each int occupies 4 bytes
int status_flag = 0;
int counter = 0;

// After optimization: choose type based on actual needs (requires <stdint.h>)
// Note: this mainly saves RAM; FLASH shrinks only for variables with non-zero initial values
uint8_t status_flag = 0;  // 1 byte is sufficient
uint16_t counter = 0;     // 2 bytes is sufficient, range 0-65535

// Use bit fields to compress further
struct SystemStatus {
    uint8_t power_on : 1;
    uint8_t connected : 1;
    uint8_t error_flag : 1;
    uint8_t reserved : 5;
};

Effect: Firmware size reduced to 118KB, saving 10.6% space
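
The const message table from step 3 is usually paired with an index type and a bounds-checked lookup. The following is a minimal sketch of that pattern. MessageId, MSG_COUNT and message_text are hypothetical names introduced for illustration.

```c
typedef enum { MSG_OK, MSG_ERROR, MSG_TIMEOUT, MSG_BUSY, MSG_COUNT } MessageId;

/* Both the pointer table and the string literals are const, so the
   linker can keep all of it in FLASH (.rodata) instead of RAM. */
static const char *const messages[MSG_COUNT] = {
    "OK", "Error", "Timeout", "Busy"
};

/* Returns the text for a message ID, or "Unknown" for out-of-range IDs. */
const char *message_text(MessageId id) {
    if ((unsigned)id >= (unsigned)MSG_COUNT) {
        return "Unknown";
    }
    return messages[id];
}
```

Callers pass a small enum value instead of holding their own copy of each string, for example uart_send_string(message_text(MSG_TIMEOUT)).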

Case 2: RAM Usage Optimization

Problem: Excessive RAM usage affecting system stability

Optimization Strategy:

  1. Reduce global buffer size
// Before optimization: overly large buffers
uint8_t rx_buffer[4096];
uint8_t tx_buffer[4096];

// After optimization: adjust size based on actual needs
#define BUFFER_SIZE 1024  // Adjust based on actual communication rate
uint8_t rx_buffer[BUFFER_SIZE];
uint8_t tx_buffer[BUFFER_SIZE];

// Or use circular buffer
#define CBUFFER_SIZE 256
typedef struct {
    uint8_t buffer[CBUFFER_SIZE];
    volatile uint16_t head;
    volatile uint16_t tail;
} CircularBuffer_t;
  2. Use dynamic memory allocation
// Before optimization: statically allocate maximum possible memory
#define MAX_CONNECTIONS 10
Connection_t connections[MAX_CONNECTIONS];

// After optimization: allocate dynamically as needed
Connection_t* create_connection(void) {
    Connection_t* conn = malloc(sizeof(Connection_t));
    if (conn) {
        memset(conn, 0, sizeof(Connection_t));
        conn->status = CONN_IDLE;
    }
    return conn;
}

void destroy_connection(Connection_t* conn) {
    if (conn) {
        free(conn);
    }
}
  3. Optimize data structure alignment
// Before optimization: alignment holes exist
struct SensorData {
    uint8_t id;        // 1 byte
    uint32_t value;    // 4 bytes (3 bytes hole before)
    uint8_t status;    // 1 byte
    // Total size: 12 bytes (4 bytes alignment)
};

// After optimization: arrange fields reasonably
struct SensorData {
    uint32_t value;    // 4 bytes
    uint8_t id;        // 1 byte
    uint8_t status;    // 1 byte
    uint8_t reserved[2]; // 2 bytes padding
    // Total size: 8 bytes (4 bytes alignment)
};
  4. Reuse temporary buffers
// Before optimization: multiple independent buffers
uint8_t calc_buffer[256];
uint8_t parse_buffer[256];
uint8_t format_buffer[256];

// After optimization: shared buffer
#define WORK_BUFFER_SIZE 256
static uint8_t work_buffer[WORK_BUFFER_SIZE];

void perform_calculation(void) {
    // Use work_buffer for calculations
    memset(work_buffer, 0, WORK_BUFFER_SIZE);
    // Calculation logic...
}

void parse_data(void) {
    // Reuse work_buffer for parsing
    memset(work_buffer, 0, WORK_BUFFER_SIZE);
    // Parsing logic...
}

Effect: RAM usage reduced from 53KB to 42KB, saving about 20.8% space
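
The CircularBuffer_t structure from step 1 needs put and get operations to be useful. The sketch below shows one common way to write them, assuming a single producer and a single consumer (for example an interrupt handler writing and the main loop reading). cbuf_put and cbuf_get are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

#define CBUFFER_SIZE 256

typedef struct {
    uint8_t buffer[CBUFFER_SIZE];
    volatile uint16_t head;  /* next write position */
    volatile uint16_t tail;  /* next read position */
} CircularBuffer_t;

/* Stores one byte; returns false when the buffer is full.
   One slot stays unused so that head == tail always means "empty". */
bool cbuf_put(CircularBuffer_t *cb, uint8_t byte) {
    uint16_t next = (uint16_t)((cb->head + 1u) % CBUFFER_SIZE);
    if (next == cb->tail) {
        return false;
    }
    cb->buffer[cb->head] = byte;
    cb->head = next;
    return true;
}

/* Fetches the oldest byte; returns false when the buffer is empty. */
bool cbuf_get(CircularBuffer_t *cb, uint8_t *byte) {
    if (cb->head == cb->tail) {
        return false;
    }
    *byte = cb->buffer[cb->tail];
    cb->tail = (uint16_t)((cb->tail + 1u) % CBUFFER_SIZE);
    return true;
}
```

Because one slot is sacrificed to distinguish full from empty, the usable capacity is CBUFFER_SIZE - 1 bytes.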

Case 3: Performance Tuning

Optimization Measures:

  1. Use DMA to reduce CPU load
// DMA transfer configuration
// Field and constant names are kept from the original listing and differ between
// vendor libraries. src_addr, dst_addr and length must also be written to the
// address and transfer-count fields of your library's init structure.
void setup_dma_transfer(uint32_t src_addr, uint32_t dst_addr, uint32_t length) {
    DMA_InitTypeDef dma_init;
    
    dma_init.Direction = DMA_DIR_PeripheralSRC;
    dma_init.PeripheralInc = DMA_PINC_DISABLE;
    dma_init.MemoryInc = DMA_MINC_ENABLE;
    dma_init.PeripheralDataAlignment = DMA_PDATAALIGN_BYTE;
    dma_init.MemoryDataAlignment = DMA_MDATAALIGN_BYTE;
    dma_init.Mode = DMA_NORMAL;
    dma_init.Priority = DMA_PRIORITY_HIGH;
    dma_init.Channel = DMA_CHANNEL_0;
    
    DMA_Init(DMA1_Stream0, &dma_init);
    DMA_Cmd(DMA1_Stream0, ENABLE);
}
  2. Replace floating-point numbers with fixed-point numbers
// Before optimization: using floating-point numbers
// (the f suffix keeps the math in single precision; an unsuffixed literal is a double)
float temperature = 36.6f;
float setpoint = 40.0f;
float error = temperature - setpoint;
float output = error * 2.5f;

// After optimization: using fixed-point numbers
#define FIXED_POINT_SCALE 100
int16_t temperature = 3660;  // 36.60 * 100
int16_t setpoint = 4000;     // 40.00 * 100
int16_t error = temperature - setpoint;
int16_t output = (error * 250) / 100;  // 2.5 * 100

// Fixed-point multiplication function
int16_t fixed_mul(int16_t a, int16_t b) {
    return ((int32_t)a * b) / FIXED_POINT_SCALE;
}
  3. Optimize algorithm complexity
// Before optimization: O(n²) bubble sort
void bubble_sort(int arr[], int n) {
    for (int i = 0; i < n-1; i++) {
        for (int j = 0; j < n-i-1; j++) {
            if (arr[j] > arr[j+1]) {
                swap(&arr[j], &arr[j+1]);
            }
        }
    }
}

// After optimization: quick sort, O(n log n) on average (worst case O(n²); recursion uses stack)
void quick_sort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quick_sort(arr, low, pi - 1);
        quick_sort(arr, pi + 1, high);
    }
}
  4. Enable hardware acceleration features
// Use hardware CRC calculation
// Note: data must be 4-byte aligned, and trailing bytes beyond a multiple of 4 are ignored
uint32_t calculate_crc_hw(uint8_t* data, uint32_t length) {
    CRC_ResetDR();
    return CRC_CalcBlockCRC((uint32_t*)data, length / 4);
}

// Use hardware multiplier
int32_t hardware_multiply(int32_t a, int32_t b) {
    return a * b;  // Compiler will automatically use hardware multiplication instruction
}

Effect: Execution efficiency improved by 60%, power consumption reduced by 25%
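
The fixed-point conversion in step 2 depends on forming products in a wider type before the scale is divided back out. The sketch below illustrates this with the same scale of 100. control_output is a hypothetical helper, and 32-bit results are used here so that large errors do not overflow a 16-bit value.

```c
#include <stdint.h>

#define FIXED_POINT_SCALE 100  /* two decimal places: 36.60 is stored as 3660 */

/* Multiplies two scaled values. The product is formed in 32 bits, then
   divided by the scale; integer division truncates toward zero. */
int32_t fixed_mul(int16_t a, int16_t b) {
    return ((int32_t)a * (int32_t)b) / FIXED_POINT_SCALE;
}

/* Proportional step: output = gain * (temperature - setpoint),
   with all three arguments scaled by FIXED_POINT_SCALE. */
int32_t control_output(int16_t temperature, int16_t setpoint, int16_t gain) {
    int32_t error = (int32_t)temperature - (int32_t)setpoint;
    return (error * gain) / FIXED_POINT_SCALE;
}
```

With temperature = 3660, setpoint = 4000 and gain = 250 (2.50), the error is -340 and the output is -850, which represents -8.50.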

7. Best Practice Recommendations

Development Stage Recommendations

  1. Regular Analysis – Check firmware size after each build
  2. Optimization Options – Use -Os to balance size and performance
  3. Data Planning – Design data structures reasonably to avoid alignment waste
  4. Tool Integration – Integrate analysis tools into the build process
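
One way to keep alignment waste from creeping back in is to let the compiler check structure layout at build time. The sketch below uses C11 _Static_assert and offsetof on the SensorData layouts from Case 2. It assumes a target where uint32_t is 4-byte aligned (true for Cortex-M and most 32-bit and 64-bit platforms, but not for every 8-bit MCU). The struct and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Original field order: padding follows both id and status. */
struct SensorDataLoose {
    uint8_t  id;
    uint32_t value;
    uint8_t  status;
};

/* Largest field first: the single bytes share one 4-byte slot. */
struct SensorDataOrdered {
    uint32_t value;
    uint8_t  id;
    uint8_t  status;
    uint8_t  reserved[2];
};

/* Fail the build if a later edit reintroduces padding. */
_Static_assert(sizeof(struct SensorDataOrdered) == 8, "unexpected padding");
_Static_assert(offsetof(struct SensorDataLoose, value) == 4, "expected 3-byte hole");

/* Bytes saved per instance by reordering the fields. */
size_t wasted_bytes(void) {
    return sizeof(struct SensorDataLoose) - sizeof(struct SensorDataOrdered);
}
```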

Considerations

  1. Stack Space – Needs to be budgeted separately; whether it appears in the bss figure depends on the linker script
  2. Version Differences – Optimization effects may vary between different compiler versions
  3. Startup Files – Pay attention to the size of the interrupt vector table and startup files
  4. Memory Fragmentation – Dynamic allocation needs to consider fragmentation issues
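
Since stack usage does not show up reliably in the section totals, a common way to measure it at runtime is stack painting: fill the stack region with a known pattern at startup, then count how much of the pattern survives. The sketch below works on any word array so it can be tried on a host machine. On a real target the base address and length would come from linker-script symbols, whose names vary between projects. It assumes a descending stack, as on Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu

/* Fills a stack region with a known pattern (done once, before use). */
void stack_paint(uint32_t *base, size_t words) {
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_FILL_PATTERN;
    }
}

/* Counts words at the low end that still hold the pattern. On a
   descending stack those words were never reached, so the result is
   the remaining headroom in words. */
size_t stack_unused_words(const uint32_t *base, size_t words) {
    size_t unused = 0;
    while (unused < words && base[unused] == STACK_FILL_PATTERN) {
        unused++;
    }
    return unused;
}
```

GCC's -fstack-usage option complements this by reporting the static stack frame size of each function at compile time.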

Automation Recommendations

  1. Build Scripts – Integrate analysis tools into automated builds
  2. Report Generation – Automatically generate size analysis reports
  3. Threshold Alerts – Set up size limit check mechanisms
  4. Version Comparison – Regularly compare size changes between different versions
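
A threshold check can be as small as the function below, which compares the section totals against the region lengths from the example memory configuration in section 4. It is a sketch: check_budget and the percentage limit are assumptions, and in a real build the three totals would be read from the output of size.

```c
#include <stdint.h>
#include <stdio.h>

/* Region lengths taken from the example memory configuration. */
#define FLASH_LENGTH 0x00020000u  /* 128 KB */
#define RAM_LENGTH   0x00008000u  /* 32 KB */

/* Returns 0 when FLASH and RAM usage are both within limit_percent of
   their region length; otherwise prints a warning and returns 1. */
int check_budget(uint32_t text, uint32_t data, uint32_t bss,
                 uint32_t limit_percent) {
    uint32_t flash_used = text + data;
    uint32_t ram_used = data + bss;
    int over = 0;

    if ((uint64_t)flash_used * 100u > (uint64_t)FLASH_LENGTH * limit_percent) {
        fprintf(stderr, "FLASH over budget: %lu of %lu bytes\n",
                (unsigned long)flash_used, (unsigned long)FLASH_LENGTH);
        over = 1;
    }
    if ((uint64_t)ram_used * 100u > (uint64_t)RAM_LENGTH * limit_percent) {
        fprintf(stderr, "RAM over budget: %lu of %lu bytes\n",
                (unsigned long)ram_used, (unsigned long)RAM_LENGTH);
        over = 1;
    }
    return over;
}
```

Returning a non-zero value lets a build script fail the build when the limit is exceeded.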

8. Frequently Asked Questions

Q1: Why does the size command show the text section larger than the actual code?

A: The text section not only contains program code but also includes read-only data (.rodata), constant strings, jump tables, etc. Use objdump -h to view detailed section distribution.

Q2: How to find the functions that occupy the most space?

A: Use the nm command:

arm-none-eabi-nm --size-sort -S firmware.elf | tail -10

Q3: When is the data in the bss section initialized?

A: The startup code zeroes the bss section before main runs. The section occupies no FLASH space, but it does occupy RAM at runtime.
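
The work done by the startup code can be sketched as two small loops. In a real startup file the pointers come from linker-script symbols (names such as _sidata, _sdata, _edata, _sbss and _ebss are common in vendor scripts, but they are an assumption here and vary between projects). The functions below take the boundaries as parameters so the logic can be tried on a host machine.

```c
#include <stdint.h>

/* Copies the initial values of .data from their load address in FLASH
   to their run address in RAM, one word at a time. */
void copy_data_section(uint32_t *dst, const uint32_t *dst_end,
                       const uint32_t *src) {
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Clears .bss so that uninitialized globals start at zero. */
void zero_bss_section(uint32_t *start, const uint32_t *end) {
    while (start < end) {
        *start++ = 0u;
    }
}
```

This is also why .data counts toward both FLASH and RAM while .bss counts toward RAM only: .data needs a stored copy of its initial values, whereas .bss only needs to be cleared.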

Q4: How to verify the optimization effect?

A: Compare before and after optimization using the size command, use nm to check function size changes, and analyze key functions through objdump.

9. Conclusion

Mastering GCC firmware analysis techniques is crucial for embedded developers. Through the methods and tools introduced in this article, developers can:

  1. Precisely Control – Accurately understand the composition of firmware size
  2. Effectively Optimize – Make optimization decisions based on data
  3. Ensure Quality – Ensure that optimizations do not affect functional correctness
  4. Improve Performance – Systematically enhance system performance

Firmware analysis is not only a technical skill but also a way of engineering thinking. Through continuous analysis and optimization, we can develop more efficient and reliable embedded systems.
