Practical Optimization Techniques in Embedded Development

Source | Embedded Miscellaneous

In embedded development, resources are always scarce. Insufficient memory, slow execution speed, high power consumption… do these issues often trouble you?

Today, I will share several code optimization techniques that have been validated in practice!

1. Time Efficiency Optimization

Avoid Floating Point Operations

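// Slow version: floating-point arithmetic (emulated in software on MCUs without an FPU)
float calculate_voltage(int adc_value) {
    return adc_value * 3.3f / 4096.0f;  // 12-bit ADC, 3.3 V reference, result in volts
}

// Fast version: fixed-point arithmetic, result in millivolts
int32_t calculate_voltage_fast(int32_t adc_value) {
    // 32-bit operands: 4095 * 3300 overflows a 16-bit int
    return (adc_value * 3300) >> 12;  // Replace division by 4096 with right shift (valid for non-negative values)
}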
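A slightly more complete fixed-point sketch, assuming the same 12-bit ADC and 3.3 V reference. It widens to 32 bits before multiplying, and the second variant adds half of 4096 before shifting so the result rounds to the nearest millivolt instead of truncating. The names adc_to_mv and adc_to_mv_rounded are illustrative.

```c
#include <stdint.h>

/* Assumed: 12-bit ADC (0..4095), 3.3 V reference, result in millivolts. */
#define VREF_MV   3300u
#define ADC_SHIFT 12u            /* 2^12 = 4096 */

/* Truncating version: same arithmetic as calculate_voltage_fast(). */
static inline uint16_t adc_to_mv(uint16_t adc)
{
    /* Widen first: 4095 * 3300 does not fit in 16 bits. */
    return (uint16_t)(((uint32_t)adc * VREF_MV) >> ADC_SHIFT);
}

/* Rounding version: add half an output step (2048) before the shift. */
static inline uint16_t adc_to_mv_rounded(uint16_t adc)
{
    return (uint16_t)(((uint32_t)adc * VREF_MV + (1u << (ADC_SHIFT - 1))) >> ADC_SHIFT);
}
```

For example, adc_to_mv(1) gives 0 while adc_to_mv_rounded(1) gives 1 (the exact value is about 0.81 mV). The extra addition costs one instruction and halves the worst-case error.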

Reduce Function Calls

// Slow version: frequent function calls
for(int i = 0; i < 1000; i++) {
    set_led_state(i % 2);
}

// Fast version: call body expanded inline by hand (compilers often do this automatically for small static functions)
for(int i = 0; i < 1000; i++) {
    if(i % 2) {
        GPIO_SetBits(GPIOA, GPIO_Pin_5);
    } else {
        GPIO_ResetBits(GPIOA, GPIO_Pin_5);
    }
}

2. Space Efficiency Optimization

Select Appropriate Data Types

// Memory-wasting approach
struct sensor_data {
    int temperature;     // Only needs -40 to 125
    int humidity;        // Only needs 0 to 100
    int pressure;        // Only needs 300 to 1100
};

// Memory-saving approach
struct sensor_data_optimized {
    int8_t temperature;   // -128 to 127, sufficient
    uint8_t humidity;     // 0 to 255, sufficient
    uint16_t pressure;    // 0 to 65535, sufficient
};
// Memory saving: reduced from 12 bytes to 4 bytes (assuming a 32-bit int)
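Narrow types only save memory safely if out-of-range readings cannot wrap around. A minimal sketch, assuming the ranges in the comments above (and hPa as the pressure unit), that clamps each value before narrowing it and uses a C11 _Static_assert so the build fails if padding ever makes the struct larger than 4 bytes. clamp_int and pack_sensor are hypothetical helpers.

```c
#include <stdint.h>

struct sensor_data_optimized {
    int8_t   temperature;   /* -40..125 degC */
    uint8_t  humidity;      /* 0..100 %RH */
    uint16_t pressure;      /* 300..1100, assumed hPa */
};

/* Compile-time guard: fails the build if the layout grows. */
_Static_assert(sizeof(struct sensor_data_optimized) == 4,
               "sensor_data_optimized should be 4 bytes");

/* Clamp a wide value into [lo, hi] before narrowing it. */
static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static struct sensor_data_optimized pack_sensor(int temp, int hum, int press)
{
    struct sensor_data_optimized s;
    s.temperature = (int8_t)clamp_int(temp, -40, 125);
    s.humidity    = (uint8_t)clamp_int(hum, 0, 100);
    s.pressure    = (uint16_t)clamp_int(press, 300, 1100);
    return s;
}
```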

Use Unions to Save Space

// Communication protocol data packet
typedef union {
    struct {
        uint8_t header[4];
        uint8_t cmd;
        uint8_t data[32];
        uint8_t checksum;
    } packet;
    uint8_t raw_data[38];
} comm_frame_t;

// The same 38 bytes can be accessed field by field or as a raw byte array
comm_frame_t frame;
frame.packet.cmd = 0x01;                              // Structured access
send_data(frame.raw_data, sizeof(frame.raw_data));    // Byte array access
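A sketch of how the two views work together: the frame is filled through the structured view and the checksum is computed over the byte view. The original does not specify the protocol, so the sync bytes and the simple 8-bit additive checksum below are assumptions, and frame_build and frame_checksum are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef union {
    struct {
        uint8_t header[4];
        uint8_t cmd;
        uint8_t data[32];
        uint8_t checksum;
    } packet;
    uint8_t raw_data[38];
} comm_frame_t;

/* All members are uint8_t, so no padding is expected; verify it. */
_Static_assert(sizeof(comm_frame_t) == 38, "unexpected padding in comm_frame_t");

/* Hypothetical checksum: 8-bit sum of every byte before the checksum field. */
static uint8_t frame_checksum(const comm_frame_t *f)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < sizeof(f->raw_data) - 1; i++) {
        sum += f->raw_data[i];                              /* byte-array view */
    }
    return sum;
}

static void frame_build(comm_frame_t *f, uint8_t cmd, const uint8_t *payload, size_t len)
{
    static const uint8_t hdr[4] = {0xAA, 0x55, 0xAA, 0x55};  /* hypothetical sync bytes */
    memset(f, 0, sizeof *f);
    memcpy(f->packet.header, hdr, sizeof hdr);
    f->packet.cmd = cmd;                                    /* structured view */
    if (len > sizeof f->packet.data) {
        len = sizeof f->packet.data;                        /* truncate oversized payloads */
    }
    memcpy(f->packet.data, payload, len);
    f->packet.checksum = frame_checksum(f);
}
```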

3. Trade Space for Time

Scenario

Suppose you need to count how many 1 bits a 4-bit value (0x0 to 0xF) contains. The traditional method loops over each bit.

Regular Approach

int count_ones_slow(unsigned char data) {
    int cnt = 0;
    unsigned char temp = data & 0xf;
    
    for (int i = 0; i < 4; i++) {
        if (temp & 0x01) {
            cnt++;
        }
        temp >>= 1;
    }
    return cnt;
}

Optimized Approach

// Pre-computed lookup table (const, so on most toolchains it can stay in flash instead of RAM)
static const uint8_t ones_table[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

int count_ones_fast(unsigned char data) {
    return ones_table[data & 0xf];
}
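The same 16-entry table also handles a full byte without growing to 256 entries: split the byte into two nibbles and add two lookups. A sketch; count_ones_byte is an illustrative name.

```c
#include <stdint.h>

static const uint8_t ones_table[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Count 1 bits in a byte: look up the low and high nibbles separately. */
static int count_ones_byte(uint8_t data)
{
    return ones_table[data & 0x0F] + ones_table[data >> 4];
}
```

This trades one extra lookup and an addition for keeping the table at 16 bytes instead of 256, a reasonable middle ground when flash is tight.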

Performance Comparison:

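Traditional method: 4 loop iterations, each with a bit test, a conditional increment and a shift
Lookup method: one mask and one array read, at the cost of a 16-byte table

Applicable Scenarios: costly calculations over a small, fixed input range, such as trigonometric functions, CRC checks, etc.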
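CRC checks are a classic case for this trade. As an illustration, assuming the common CRC-8 variant with polynomial 0x07, initial value 0x00, no reflection and no final XOR, the bitwise version below does eight shift/XOR steps per input byte, while the table version does a single lookup per byte after 256 entries have been precomputed. The function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed parameters: poly 0x07, init 0x00, no reflection, no final XOR. */
#define CRC8_POLY 0x07u

static uint8_t crc8_table[256];

/* Bitwise reference: 8 shift/XOR steps per input byte. */
static uint8_t crc8_bitwise(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ CRC8_POLY) : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Precompute the CRC of every possible byte once (256 bytes of RAM here;
   the table could also be generated offline and stored as const data). */
static void crc8_init_table(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t byte = (uint8_t)v;
        crc8_table[v] = crc8_bitwise(&byte, 1);
    }
}

/* Table-driven version: one lookup per input byte. */
static uint8_t crc8_fast(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; i++) {
        crc = crc8_table[crc ^ buf[i]];
    }
    return crc;
}
```

Call crc8_init_table() once at startup; after that, crc8_fast and crc8_bitwise return identical results, and the fast version spends 256 bytes to skip the inner 8-step loop.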

4. Use Flexible Arrays

Problems with Traditional Pointer Method

typedef struct {
    uint16_t head;
    uint8_t id;
    uint8_t type;
    uint8_t length;
    uint8_t *value;  // Pointer method
} protocol_old_t;

Problems:

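Requires two memory allocations (one for the struct, one for the payload)
Memory is not contiguous: every payload access goes through an extra pointer, and the pointer itself costs 4 to 8 bytes
Memory release is error-prone: both blocks must be freed, payload first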

The Elegance of Flexible Arrays

typedef struct {
    uint16_t head;
    uint8_t id;
    uint8_t type;
    uint8_t length;
    uint8_t value[];  // Flexible array
} protocol_new_t;

Advantages:

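Single allocation, contiguous memory
Faster access: the payload sits right after the header, no pointer to follow
Simpler memory management: one free() releases everything
Lower memory leak risk, since there is no second block to forget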

Usage Example:

// Allocate struct + data space in one block (flexible array members require C99)
protocol_new_t *p = malloc(sizeof(protocol_new_t) + data_len);
if (p == NULL) {
    // Handle allocation failure
}
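A minimal sketch of the full create/free cycle, assuming the protocol_new_t layout above. protocol_create and the 0xA55A header value are hypothetical; the point is that one malloc covers both header and payload, and one free releases them.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint16_t head;
    uint8_t id;
    uint8_t type;
    uint8_t length;
    uint8_t value[];   /* C99 flexible array member */
} protocol_new_t;

/* Hypothetical constructor: a single allocation for header + payload. */
static protocol_new_t *protocol_create(uint8_t id, uint8_t type,
                                       const uint8_t *data, uint8_t data_len)
{
    protocol_new_t *p = malloc(sizeof *p + data_len);
    if (p == NULL) {
        return NULL;              /* allocation failed: caller must handle it */
    }
    p->head = 0xA55A;             /* hypothetical frame marker */
    p->id = id;
    p->type = type;
    p->length = data_len;
    memcpy(p->value, data, data_len);
    return p;
}
```

Usage: `protocol_new_t *p = protocol_create(1, 2, buf, len);` and later a single `free(p);`. Note that sizeof *p may include trailing padding, so the allocation can be a byte or so larger than strictly needed; that is harmless.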

5. Use Bit Manipulation

Bit Fields: A Memory-Saving Trick

How would you manage 8 flag bits?

Memory-Wasting Approach:

struct flags_waste {
    unsigned char flag1;  
    unsigned char flag2;
    unsigned char flag3;
    unsigned char flag4;
    unsigned char flag5;
    unsigned char flag6;
    unsigned char flag7;
    unsigned char flag8;  // Total 8 bytes!
};

Memory-Efficient Approach:

struct flags_smart {
    unsigned char flag1:1;  // Only uses 1 bit
    unsigned char flag2:1;
    unsigned char flag3:1;
    unsigned char flag4:1;
    unsigned char flag5:1;
    unsigned char flag6:1;
    unsigned char flag7:1;
    unsigned char flag8:1;  // Total 1 byte! (unsigned char bit-fields are implementation-defined; GCC accepts them)
} flags;

Memory Saving: reduced from 8 bytes to 1 byte!
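Bit-field layout (bit order, and whether unsigned char is even accepted as the base type) is implementation-defined, which matters when the flags map onto a hardware register or travel over a link. A common portable alternative, sketched here with hypothetical flag names, keeps all eight flags in one uint8_t and manipulates them with masks; storage is the same 1 byte.

```c
#include <stdint.h>

/* Hypothetical flag names; each one owns a fixed bit. */
#define FLAG_RX_READY   (1u << 0)
#define FLAG_TX_BUSY    (1u << 1)
#define FLAG_ERROR      (1u << 7)

static uint8_t flags;   /* all 8 flags in one byte */

static inline void flag_set(uint8_t mask)   { flags |= mask; }
static inline void flag_clear(uint8_t mask) { flags &= (uint8_t)~mask; }
static inline int  flag_test(uint8_t mask)  { return (flags & mask) != 0; }
```

Both this and the bit-field version update flags with a read-modify-write sequence, so flags shared with an interrupt handler still need protection (for example, briefly disabling interrupts around the update).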

Bit Operations: Replacing Multiplication and Division

Slow Version:

uint32_t val = 1024;
uint32_t doubled = val * 2;    // May compile to a multiply instruction
uint32_t halved = val / 2;     // May compile to a divide instruction (optimizing compilers usually emit a shift already)

Fast Version:

uint32_t val = 1024;
uint32_t doubled = val << 1;   // Left shift 1 bit = multiply by 2
uint32_t halved = val >> 1;    // Right shift 1 bit = divide by 2 (for unsigned values)
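The shift is only a drop-in replacement for unsigned operands, and for those, most compilers already generate the shift themselves. A small sketch showing where the two stop being equivalent; the helper names are illustrative. Right-shifting a negative signed value is implementation-defined in C; GCC, for example, documents that it sign-extends.

```c
#include <stdint.h>

/* Unsigned: x / 2 and x >> 1 always agree. */
static uint32_t half_u(uint32_t x)       { return x / 2; }
static uint32_t half_u_shift(uint32_t x) { return x >> 1; }

/* Signed: division truncates toward zero, while an arithmetic right
   shift rounds toward negative infinity, so negative odd values differ. */
static int32_t half_s(int32_t x)       { return x / 2; }
static int32_t half_s_shift(int32_t x) { return x >> 1; }
```

With x = -3, half_s returns -1 but half_s_shift returns -2 on a sign-extending compiler, which is why hand-replacing signed divisions with shifts can introduce subtle bugs.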

6. Loop Unrolling – Reducing Jumps

Overhead of Traditional Loops

// Each iteration pays for a compare and a conditional branch
for (int i = 0; i < 4; i++) {
    process(array[i]);
}

Efficient Version After Unrolling

// Straight-line execution: no loop branches, at the cost of larger code
process(array[0]);
process(array[1]);
process(array[2]);
process(array[3]);

Advanced Unrolling: Parallel Computation

Regular Version:

long calc_sum_slow(int *arr0, int *arr1) {
    long sum = 0;
    for (int i = 0; i < 1000; i++) {
        sum += arr0[i] * arr1[i];  // Serial: each addition waits for the previous one
    }
    return sum;
}

Optimized Version:

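long calc_sum_fast(int *arr0, int *arr1) {
    long sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;

    for (int i = 0; i < 1000; i += 4) {  // 1000 is a multiple of 4, so no elements are left over
        sum0 += arr0[i+0] * arr1[i+0];  // Four independent accumulators:
        sum1 += arr0[i+1] * arr1[i+1];  // no sum depends on another, so a
        sum2 += arr0[i+2] * arr1[i+2];  // pipelined core can overlap the
        sum3 += arr0[i+3] * arr1[i+3];  // multiply-adds
    }

    return sum0 + sum1 + sum2 + sum3;
}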
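Hard-coding the element count only works when it is a multiple of the unroll factor. A sketch for arbitrary lengths, with a short tail loop for the 0 to 3 leftover elements; dot_simple and dot_unrolled are illustrative names.

```c
#include <stddef.h>

/* Plain reference loop. */
static long dot_simple(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

/* Unrolled by 4 with independent accumulators, plus a tail loop so n
   does not have to be a multiple of 4. */
static long dot_unrolled(const int *a, const int *b, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {        /* main body: 4 elements per iteration */
        s0 += a[i + 0] * b[i + 0];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < n; i++) {                /* tail: at most 3 leftover elements */
        s0 += a[i] * b[i];
    }
    return s0 + s1 + s2 + s3;
}
```

Keep in mind that unrolling grows code size, itself a scarce resource on small MCUs, and that many compilers can unroll on their own at higher optimization levels (GCC, for example, has -funroll-loops), so check the generated code before unrolling by hand.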

7. Inline Functions – Eliminating Function Call Overhead

The Hidden Cost of Function Calls

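Each function call can carry overhead (the exact cost depends on the CPU and calling convention):

Passing parameters (in registers or on the stack)
Call and return jump instructions
Stack frame management
Return address saving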

The Magic of Inline Functions

static inline void toggle_led(uint8_t pin) {
    PORT ^= 1 << pin;   // PORT stands for the MCU's GPIO output register
}

// Usually expanded in place by the compiler, so no call overhead (inline is a hint, not a guarantee)
toggle_led(LED_PIN);

Applicable Scenarios:

Frequently called small functions
Functions on critical paths
Simple utility functions
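A runnable variant of toggle_led, with a plain volatile variable standing in for the PORT register (hypothetical; on real hardware this would be the MCU's output register). Because the helper is static inline and tiny, an optimizing compiler will usually reduce each call to a single XOR on the port.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware output register such as PORT. */
static volatile uint8_t sim_port;

/* Small, hot helper: a good inline candidate. */
static inline void toggle_pin(volatile uint8_t *port, uint8_t pin)
{
    *port ^= (uint8_t)(1u << pin);   /* flip only the selected bit */
}
```

Passing the register as a pointer keeps the helper reusable across ports while still inlining to the same code as writing the XOR by hand.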

8. Data Type Optimization

The Art of Loop Variables

Inefficient Approach:

char i;  // On 32-bit cores the compiler must truncate i to 8 bits after each increment; if N exceeds CHAR_MAX the loop may never end
for (i = 0; i < N; i++) {
    // ...
}

Efficient Approach:

int i;   // Native register width on 32-bit cores: no truncation needed
for (i = 0; i < N; i++) {
    // ...
}

Data Type Selection Principles

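Loop Index: Prefer int on 32-bit cores (on 8-bit MCUs, where int is 16 bits, an 8-bit unsigned counter can be cheaper)
Storage Optimization: Use the smallest type that holds the range (e.g. char or int8_t instead of int)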
Compute Intensive: Avoid unnecessary floating-point operations
Type Conversion: Reduce implicit type conversions
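When the same code has to run on both 8-bit and 32-bit MCUs, the "fast" types from <stdint.h> let the toolchain decide: uint_fast8_t is at least 8 bits wide, and the implementation picks whichever width it considers fastest, which need not be 8 bits. A sketch; sum_first is an illustrative name.

```c
#include <stdint.h>

/* uint_fast8_t: at least 8 bits, width chosen by the implementation for speed. */
static uint32_t sum_first(const uint8_t *buf, uint_fast8_t n)
{
    uint32_t sum = 0;
    for (uint_fast8_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}
```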

9. Loop Optimization Strategies

The Art of Nested Loop Arrangement

Inefficient Arrangement (long loop on the outer layer):

for (row = 0; row < 100; row++) {      // Outer loop 100 times
    for (col = 0; col < 5; col++) {    // Inner loop 5 times
        sum += a[row][col];
    }
}
// The inner loop is entered (and set up) 100 times; the body runs 500 times

Efficient Arrangement (long loop on the inner layer):

for (col = 0; col < 5; col++) {       // Outer loop 5 times
    for (row = 0; row < 100; row++) {  // Inner loop 100 times
        sum += a[row][col];
    }
}
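// The inner loop is entered only 5 times; the body still runs 500 times

Caveat: C stores a[row][col] row by row, so the column-outer order above walks memory with a stride of one full row. On cores with a data cache, that loss of locality can outweigh the saved loop overhead, so measure before swapping loops.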

Early Exit Strategy

Inefficient Method (execute complete loop):

bool found = false;
for (int i = 0; i < 10000; i++) {
    if (list[i] == target) {
        found = true;  // Continue looping even after found!
    }
}

Efficient Method (exit immediately upon finding):

bool found = false;
for (int i = 0; i < 10000; i++) {
    if (list[i] == target) {
        found = true;
        break;  // Exit immediately
    }
}
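Wrapping the search in a function turns the break into a return and also yields the position of the match, not just a flag. A sketch with the hypothetical name find_index.

```c
#include <stddef.h>

/* Return the index of the first match, or -1 if there is none. */
static long find_index(const int *list, size_t n, int target)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == target) {
            return (long)i;   /* early exit at the first hit */
        }
    }
    return -1;                /* scanned everything, no match */
}
```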

10. Structure Memory Alignment Optimization

The Impact of Memory Alignment

Unoptimized Version:

struct waste_memory {
    char a;      // 1 byte
    short b;     // 2 bytes, needs alignment
    char c;      // 1 byte  
    int d;       // 4 bytes, needs alignment
    char e;      // 1 byte
};
// Actual size: 16 bytes on a typical 32-bit ABI (a, 1 pad, b, c, 3 pad, d, e, 3 pad)

Optimized Version:

struct save_memory {
    char a;      // 1 byte
    char c;      // 1 byte
    short b;     // 2 bytes, perfectly aligned
    int d;       // 4 bytes, perfectly aligned
    char e;      // 1 byte
};
// Actual size: 12 bytes (a, c, b, d, e, 3 pad), saving 25%!
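The exact numbers depend on the ABI, so it is worth checking them on your own toolchain with sizeof and offsetof. A sketch; print_layout is an illustrative name, and the values mentioned below assume a 2-byte short and a 4-byte int, each aligned to its size.

```c
#include <stddef.h>
#include <stdio.h>

struct waste_memory { char a; short b; char c; int d; char e; };
struct save_memory  { char a; char c; short b; int d; char e; };

/* Print size and member offsets so the padding becomes visible. */
static void print_layout(void)
{
    printf("waste_memory: size=%zu a=%zu b=%zu c=%zu d=%zu e=%zu\n",
           sizeof(struct waste_memory),
           offsetof(struct waste_memory, a), offsetof(struct waste_memory, b),
           offsetof(struct waste_memory, c), offsetof(struct waste_memory, d),
           offsetof(struct waste_memory, e));
    printf("save_memory:  size=%zu a=%zu c=%zu b=%zu d=%zu e=%zu\n",
           sizeof(struct save_memory),
           offsetof(struct save_memory, a), offsetof(struct save_memory, c),
           offsetof(struct save_memory, b), offsetof(struct save_memory, d),
           offsetof(struct save_memory, e));
}
```

Under those assumptions, waste_memory has d at offset 8 and size 16, while save_memory has e at offset 8 and size 12. A simple rule of thumb that reaches the same 12 bytes here is to order members from largest to smallest (int, short, then the chars).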

Notes

Optimization Principles

Do not optimize for the sake of optimization; do not sacrifice code readability.

Measure first, then optimize: Use tools to find the real bottlenecks
Trade-offs: Performance vs readability vs maintainability
Incremental optimization: Optimize hotspot code first
Validate results: Test correctness after optimization

Good optimization is not about showing off skills, but finding the best balance under constraints.

What other optimization techniques have you used in embedded development?

———— END ————
