How to Ensure Flawless Embedded OTA Updates

Over-The-Air (OTA) firmware updates are a standard feature of modern embedded systems. How can we ensure the security of the update process? How do we handle exceptions such as network interruptions and power outages? How can we implement an efficient update mechanism on resource-constrained embedded devices?

OTA Update Architecture

System Architecture Design Principles

When designing an OTA update system, the following core principles must be followed:

  1. Security Principle: Ensure that the firmware source is trustworthy to prevent malicious firmware from being installed.
  2. Reliability Principle: Ensure the atomicity of the update process to avoid bricking the device.
  3. Efficiency Principle: Minimize data transfer volume to reduce update time and costs.
  4. Fault Tolerance Principle: Be able to recover from various exceptions, including network interruptions and power outages.
  5. Maintainability Principle: The system design should be clear and easy to test and debug.

Overall System Architecture

The system is organized into four layers:

  • Cloud Service Layer: OTA Management Server, Firmware Repository, Differential Engine, Signature Service, Device Management Platform.
  • Network Transmission Layer: HTTPS/TLS encrypted channel, resumable download (resume from breakpoint), differential transmission.
  • Device Architecture: Bootloader, Application (App Slot A), Application (App Slot B), Updater, Shared Memory Area.
  • Storage Partition: Bootloader Partition, App Slot A Partition, App Slot B Partition, Updater Partition, Configuration Storage Area.

Differential Update Technology

Differential Update Principle

A differential update (delta update) compares the old and new firmware versions and transmits only the parts that changed, rather than the entire firmware file. This can significantly reduce data transfer volume, saving bandwidth and time.

Basic Principle:

  1. Server Side: Use differential algorithms (such as bsdiff, xdelta, etc.) to generate a differential package (patch file) from the old version to the new version.
  2. Device Side: After receiving the differential package, merge it with the current firmware (apply patch) to generate the new version of the firmware.
  3. Verification: Perform integrity checks on the newly generated firmware to ensure the update is correct.

Differential Algorithm Selection

Commonly used differential algorithms include:

  • bsdiff/bspatch: A binary differential algorithm suitable for any binary file, with high compression rates.
  • xdelta3: An open-source differential algorithm with excellent performance, suitable for large files.
  • Custom Algorithms: Specialized algorithms optimized for specific firmware formats.

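For resource-constrained embedded devices, bsdiff/bspatch is a common starting point because of its small patches. Note that the stock bspatch tool holds the whole old and new images in memory, so an embedded port usually has to be adapted to read the old firmware from flash and write the new firmware in chunks.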

Differential Update Process

The process involves four participants (Server, Device Side, Bootloader, Flash Storage) and runs in the following phases:

  1. Generate Differential Package (server): read the old version firmware, read the new version firmware, execute the differential algorithm (bsdiff), generate the differential package (.patch), and calculate the differential package signature.
  2. Request Upgrade: the device reports its current firmware version, and the server returns the differential package information (size/checksum).
  3. Download Differential Package: the device requests the download, the server transfers the differential package data (with support for resuming from a breakpoint), and the device stores it in a temporary area and verifies the differential package integrity.
  4. Apply Differential: the device requests to enter upgrade mode, the Bootloader reads the current firmware (old version), applies the differential algorithm (bspatch), writes the new firmware to the backup partition (Slot B), and verifies the new firmware integrity with a digital signature check.
  5. Verification Passed: switch the active partition to Slot B, update the partition flag, and reboot the system.
  6. Verification Failed: clear the backup partition data and return an upgrade failure.

Advantages and Challenges of Differential Updates

Advantages:

  1. Bandwidth Savings: Typically reduces data transfer volume by 50%-90%, which is especially important for large-scale deployments.
  2. Reduced Update Time: Significantly shortens data transfer time, enhancing user experience.
  3. Lower Update Costs: In pay-per-usage scenarios, it can significantly reduce costs.
  4. Increased Success Rate: Smaller data transfer volume reduces the probability of transmission errors.

Challenges:

  1. Memory Requirements: A naive implementation loads the old firmware, the differential package, and the new firmware at the same time, which leads to high RAM requirements.
  2. Computational Overhead: Differential merging requires a certain level of CPU computational power.
  3. Version Management: Maintaining the differential relationships between multiple versions increases management complexity.
  4. Error Recovery: Recovery strategies after differential merging failures need to be specially designed.

Key Points for Implementing Differential Updates

Server-Side Implementation:

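Version management and package generation:

  1. Firmware Version Management: check the version path.
  2. Choose the path: the direct path uses a single-step differential (V1->V2); the jump path uses a multi-step differential (V1->V1.1->V2).
  3. Generate the differential package.
  4. Calculate the differential package checksum.
  5. Apply the digital signature.
  6. Store the package in the repository.

Handling a device request:

  1. The device request reports the current version.
  2. The server finds the optimal differential path.
  3. The server returns the differential package information.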
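
One way to picture "find the optimal differential path" is as a shortest-path search over firmware versions, where each stored patch is an edge weighted by its size. The following is a minimal sketch under that assumption; the table layout and the names patch_table_t, find_patch_path, and NO_PATCH are hypothetical and not taken from any particular OTA server.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_VERSIONS 8
#define NO_PATCH UINT32_MAX

/* patch_size[i][j] is the size in bytes of the stored patch from version i
 * to version j, or NO_PATCH if the repository has no such patch. */
typedef struct {
    size_t   count;
    uint32_t patch_size[MAX_VERSIONS][MAX_VERSIONS];
} patch_table_t;

/* Finds the chain of patches with the smallest total download size
 * (Dijkstra on a dense graph). Writes the version sequence, including both
 * ends, into path[] and returns its length, or 0 if `to` is unreachable. */
size_t find_patch_path(const patch_table_t *t, size_t from, size_t to,
                       size_t path[MAX_VERSIONS], uint64_t *total_bytes)
{
    uint64_t dist[MAX_VERSIONS];
    size_t prev[MAX_VERSIONS];
    int done[MAX_VERSIONS] = {0};

    if (t->count > MAX_VERSIONS || from >= t->count || to >= t->count)
        return 0;
    for (size_t i = 0; i < t->count; i++) {
        dist[i] = UINT64_MAX;
        prev[i] = i;
    }
    dist[from] = 0;

    for (size_t iter = 0; iter < t->count; iter++) {
        /* Pick the closest version that has not been finalized yet. */
        size_t u = t->count;
        for (size_t i = 0; i < t->count; i++) {
            if (!done[i] && dist[i] != UINT64_MAX &&
                (u == t->count || dist[i] < dist[u]))
                u = i;
        }
        if (u == t->count)
            break;
        done[u] = 1;
        for (size_t v = 0; v < t->count; v++) {
            uint32_t w = t->patch_size[u][v];
            if (u == v || w == NO_PATCH)
                continue;
            if (dist[u] + w < dist[v]) {
                dist[v] = dist[u] + w;
                prev[v] = u;
            }
        }
    }
    if (dist[to] == UINT64_MAX)
        return 0;

    /* Walk prev[] backwards from the target, then reverse into path[]. */
    size_t n = 0, rev[MAX_VERSIONS];
    for (size_t v = to; ; v = prev[v]) {
        rev[n++] = v;
        if (v == from)
            break;
    }
    for (size_t i = 0; i < n; i++)
        path[i] = rev[n - 1 - i];
    *total_bytes = dist[to];
    return n;
}
```

Minimizing bytes alone is a simplification: every extra hop means another patch application on the device, so a real server would probably also cap the number of hops or add a per-hop cost.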

Device-Side Implementation:

  1. Memory Management: Use streaming processing, reading old firmware and applying differentials in chunks to avoid loading the entire firmware at once.
  2. Resume from Breakpoint: Support resuming differential package downloads to improve transmission reliability.
  3. Double Verification: Verify the differential package itself and then verify the new firmware after merging.
  4. Rollback Mechanism: Ensure the ability to roll back in case of merge failures, guaranteeing system recoverability.
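
The streaming idea in point 1 can be sketched with a deliberately simple patch format. This is not bsdiff: it is a hypothetical two-opcode format (copy a range from the old firmware, or insert literal bytes) chosen only to show how a fixed working buffer keeps RAM use independent of image size. All names here (apply_delta, mem_sink_t, OP_COPY, OP_INSERT, CHUNK_SIZE) are assumptions made for the illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 64u  /* working buffer; RAM use does not grow with image size */

/* Hypothetical patch format (little-endian):
 *   OP_COPY   : 0x01, u32 offset in old firmware, u32 length
 *   OP_INSERT : 0x02, u32 length, then `length` literal bytes */
enum { OP_COPY = 0x01, OP_INSERT = 0x02 };

/* Output sink; on a device this would be a flash write to the inactive slot. */
typedef int (*sink_write_fn)(void *ctx, const uint8_t *data, size_t len);

/* In-memory sink used here in place of real flash. */
typedef struct { uint8_t *dst; size_t cap; size_t len; } mem_sink_t;

int mem_sink_write(void *ctx, const uint8_t *data, size_t len)
{
    mem_sink_t *s = (mem_sink_t *)ctx;
    if (len > s->cap - s->len)
        return -1;
    memcpy(s->dst + s->len, data, len);
    s->len += len;
    return 0;
}

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Applies the patch chunk by chunk. Returns 0 on success, -1 on a malformed
 * patch, an out-of-range copy, or a sink error. */
int apply_delta(const uint8_t *old_fw, size_t old_len,
                const uint8_t *patch, size_t patch_len,
                sink_write_fn write, void *ctx)
{
    uint8_t buf[CHUNK_SIZE];
    size_t pos = 0;

    while (pos < patch_len) {
        uint8_t op = patch[pos++];
        if (op == OP_COPY) {
            if (patch_len - pos < 8)
                return -1;
            uint32_t off = read_u32le(patch + pos);
            uint32_t len = read_u32le(patch + pos + 4);
            pos += 8;
            if (off > old_len || len > old_len - off)
                return -1;
            while (len > 0) {
                size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
                memcpy(buf, old_fw + off, n);  /* stands in for a flash read */
                if (write(ctx, buf, n) != 0)
                    return -1;
                off += (uint32_t)n;
                len -= (uint32_t)n;
            }
        } else if (op == OP_INSERT) {
            if (patch_len - pos < 4)
                return -1;
            uint32_t len = read_u32le(patch + pos);
            pos += 4;
            if (len > patch_len - pos)
                return -1;
            while (len > 0) {
                size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
                if (write(ctx, patch + pos, n) != 0)
                    return -1;
                pos += n;
                len -= (uint32_t)n;
            }
        } else {
            return -1;  /* unknown opcode */
        }
    }
    return 0;
}
```

For brevity the patch is passed as an in-memory array; on a device it would more likely be read from the temporary download area in the same chunked fashion. The output still needs the second verification step (point 3) before the new slot is activated.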

A/B Partition Mechanism

A/B Partition Principle

The A/B partition mechanism (also known as Dual-Bank or Redundant Boot) is a redundant firmware storage solution that maintains two independent firmware partitions (Partition A and Partition B) within the device, ensuring that at least one usable firmware version is always available during the upgrade process. The core idea of this mechanism is: when upgrading to new firmware, do not disturb the currently running firmware.

Working Principle:

  1. Dual Partition Storage: The device has two independent firmware partitions that can store two different firmware versions.
  2. Active Partition Switching: The Bootloader maintains a flag indicating which partition should be booted from.
  3. Atomic Upgrade: New firmware is written to the inactive partition, and the active partition flag is only switched after verification.
  4. Automatic Rollback: If the new firmware fails to boot, the Bootloader automatically switches back to the old partition.

A/B Partition Architecture Design

Flash Storage Partition:

  • Bootloader Partition: not updatable; responsible for booting and partition management.
  • App Slot A: Version V1.0, Status: Stable.
  • App Slot B: Version V2.0, Status: Pending Verification.

Shared Memory Area:

  • Active Partition Flag: active_slot = A
  • Boot Counter: boot_counter = 0
  • Upgrade Status Flag: update_in_progress = false

Bootloader Logic:

  1. Read the active partition flag.
  2. If the active partition is A, verify the Slot A firmware. If verification passes, boot Slot A. If it fails, try Slot B: boot Slot B if it is valid.
  3. If the active partition is B, verify the Slot B firmware and boot Slot B if verification passes.
  4. If no valid slot can be booted, enter Safe Mode.
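
The Bootloader decision above can be condensed into a small selection function. The sketch below is an illustration only: the type and function names are hypothetical, the field names mirror the shared-area flags listed above, and it assumes the fallback is symmetric (a failed Slot B falls back to a valid Slot A), which the original flow does not spell out.

```c
#include <stdbool.h>

typedef enum { SLOT_A = 0, SLOT_B = 1, SLOT_NONE = 2 } slot_t;

/* Mirrors the flags kept in the shared area. */
typedef struct {
    slot_t   active_slot;
    unsigned boot_counter;
    bool     update_in_progress;
} boot_state_t;

/* slot_valid[i] is the result of verifying the image in slot i (hash and
 * signature check, done elsewhere). Returns the slot to boot, or SLOT_NONE
 * when the Bootloader should enter Safe Mode. */
slot_t select_boot_slot(const boot_state_t *st, const bool slot_valid[2])
{
    slot_t first = (st->active_slot == SLOT_B) ? SLOT_B : SLOT_A;
    slot_t other = (first == SLOT_A) ? SLOT_B : SLOT_A;

    if (slot_valid[first])
        return first;   /* normal case: boot the active slot */
    if (slot_valid[other])
        return other;   /* active slot is damaged: fall back (assumed symmetric) */
    return SLOT_NONE;   /* nothing bootable: Safe Mode */
}
```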

A/B Partition Upgrade Process

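The upgrade process can be described as a state machine:

  1. System Boot -> AppSlotA_Running
  2. AppSlotA_Running -> DownloadToSlotB: Receive Upgrade Request
  3. DownloadToSlotB -> VerifyingSlotB: Download Complete
  4. VerifyingSlotB -> InstallingSlotB: Verification Passed
  5. VerifyingSlotB -> DownloadToSlotB: Verification Failed, Redownload
  6. InstallingSlotB -> SwitchToSlotB: Installation Complete, Atomic Switch
  7. SwitchToSlotB -> AppSlotB_Running: Reboot System
  8. AppSlotB_Running -> VerifySlotB: Boot Check
  9. VerifySlotB -> AppSlotB_Stable: Running Normally, Mark Stable
  10. VerifySlotB -> RollbackToSlotA: Boot Failed
  11. RollbackToSlotA -> AppSlotA_Running: Automatic Rollback
  12. AppSlotB_Stable -> DownloadToSlotA: Next Upgrade Write to Slot A
  13. DownloadToSlotA -> VerifyingSlotA: Download Complete
  14. VerifyingSlotA -> InstallingSlotA: Verification Passed
  15. InstallingSlotA -> SwitchToSlotA: Installation Complete
  16. SwitchToSlotA -> AppSlotA_Running: Reboot System

Key Point: The atomic switch ensures that partition flags and firmware states stay consistent during updates. The automatic rollback mechanism prevents devices from becoming bricked.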
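
The "Boot Check", "Mark Stable", and "Automatic Rollback" transitions are commonly driven by a boot counter. The sketch below shows one possible arrangement; the structure, the function names, and the limit of three attempts are assumptions made for illustration, and on a real device the state would be persisted in flash rather than held in RAM.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BOOT_ATTEMPTS 3  /* assumed limit before rolling back */

typedef struct {
    char    active_slot;     /* 'A' or 'B' */
    char    fallback_slot;   /* last slot that was marked stable */
    bool    pending_verify;  /* true from the switch until the app confirms */
    uint8_t boot_counter;    /* boots attempted since the switch */
} trial_state_t;

/* Called by the Bootloader on every boot. Returns the slot to start. */
char bootloader_pick_slot(trial_state_t *st)
{
    if (st->pending_verify) {
        if (st->boot_counter >= MAX_BOOT_ATTEMPTS) {
            /* The new firmware never confirmed itself: roll back. */
            st->active_slot = st->fallback_slot;
            st->pending_verify = false;
            st->boot_counter = 0;
        } else {
            st->boot_counter++;
        }
    }
    return st->active_slot;
}

/* Called by the application once it considers itself healthy, for example
 * after it has reached the server again. */
void app_mark_stable(trial_state_t *st)
{
    st->pending_verify = false;
    st->boot_counter = 0;
    st->fallback_slot = st->active_slot;
}
```

With this arrangement a firmware image that crashes or hangs before calling app_mark_stable() is abandoned after the configured number of attempts, which is where a watchdog reset feeds back into the rollback path.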

Guaranteeing Atomicity of Partition Switching

Partition switching is a critical operation that must guarantee atomicity; otherwise, it may lead to inconsistent system states. Here are several methods to ensure atomicity:

Using Hardware-Supported Atomic Operations

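  1. Prepare the switch.
  2. Disable interrupts.
  3. Write the new partition flag.
  4. Synchronize the flash write.
  5. Enable interrupts.
  6. Reboot the system.

These critical steps must be completed as an uninterruptible sequence so that the flag update and the system reboot stay consistent. Disabling interrupts prevents preemption but not power loss, so the flag record should also carry a checksum that lets the Bootloader detect a torn write at the next boot.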
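
A common software complement to this sequence is to keep two copies of the flag record, each with a sequence number and a CRC, and to always overwrite the older copy. The sketch below assumes that scheme; the record layout and function names are hypothetical, and the two copies live in a plain array here instead of two flash sectors.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t seq;          /* incremented on every update */
    uint32_t active_slot;  /* 0 = Slot A, 1 = Slot B */
    uint32_t crc;          /* CRC-32 over seq and active_slot */
} flag_record_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t flag_crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Serializes the protected fields explicitly so the CRC does not depend on
 * struct padding or host endianness. */
static uint32_t record_crc(const flag_record_t *r)
{
    uint8_t b[8];
    for (int i = 0; i < 4; i++) {
        b[i]     = (uint8_t)(r->seq >> (8 * i));
        b[4 + i] = (uint8_t)(r->active_slot >> (8 * i));
    }
    return flag_crc32(b, sizeof b);
}

/* Returns the valid record with the highest sequence number, or NULL if
 * neither copy passes its CRC. */
const flag_record_t *flags_read(const flag_record_t copies[2])
{
    const flag_record_t *best = NULL;
    for (int i = 0; i < 2; i++) {
        if (record_crc(&copies[i]) != copies[i].crc)
            continue;  /* torn or never written */
        if (best == NULL || copies[i].seq > best->seq)
            best = &copies[i];
    }
    return best;
}

/* Writes the new flag into the copy that is NOT current, so the current
 * record stays intact until the new one is completely written. */
void flags_write(flag_record_t copies[2], uint32_t active_slot)
{
    const flag_record_t *cur = flags_read(copies);
    flag_record_t next = { cur ? cur->seq + 1u : 1u, active_slot, 0u };
    next.crc = record_crc(&next);
    flag_record_t *dst = (cur == &copies[0]) ? &copies[1] : &copies[0];
    *dst = next;
}
```

If power is lost in the middle of flags_write(), the half-written copy fails its CRC and flags_read() returns the previous record, so the device boots the old slot. Sequence-number wraparound is ignored in this sketch.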

Advantages and Limitations of A/B Partitions

Advantages:

  1. Near-Zero Downtime Upgrades: The device keeps running while the new firmware is downloaded and installed; the only interruption is the reboot into the new partition.
  2. Automatic Rollback: Automatically rolls back if the new firmware fails to boot, preventing the device from becoming bricked.
  3. Fast Recovery: Rollback operations are very quick, requiring only a switch of the partition flag.
  4. Reduced Risk: Upgrade failures do not affect the currently running firmware.

Limitations:

  1. Storage Space Requirements: Requires double the storage space, increasing costs.
  2. Increased Complexity: Partition management and state synchronization increase system complexity.
  3. Boot Time: Requires additional verification and selection logic, which may slightly increase boot time.
  4. Version Management: Requires maintaining version information for both partitions.

Digital Signature Verification

Digital Signature Principle

Digital signatures are a core technology for ensuring firmware security, using cryptographic methods to ensure the integrity (not tampered with) and authenticity (trustworthy source) of the firmware. They are based on asymmetric cryptography: a private key is used for signing and the corresponding public key for verification.

Basic Principle:

  1. Server Side (Signing Process):
    • Calculate the hash value of the firmware (e.g., SHA-256).
    • Sign the hash value with the private key to generate the digital signature.
    • Attach the signature to the firmware file.
  2. Device Side (Verification Process):
    • Calculate the hash value of the received firmware.
    • Run the signature verification algorithm with the pre-installed public key over that hash and the attached signature.
    • Accept the firmware only if verification succeeds. With textbook RSA this amounts to recovering the signed hash from the signature and comparing the two hash values; ECDSA checks the signature mathematically without recovering a hash.

Digital Signature Algorithm Selection

Common digital signature algorithms include:

  • RSA: The most widely used algorithm, with high security, but larger signatures (2048-bit key produces a 256-byte signature).
  • ECDSA: Elliptic curve algorithm, smaller signatures (256-bit key produces a 64-byte signature), suitable for resource-constrained devices.
  • Ed25519: A modern elliptic curve algorithm with excellent performance and high security.

For embedded devices, it is recommended to use ECDSA P-256, as it strikes a good balance between security and resource consumption.

Digital Signature Verification Process

The process involves four participants (Server, Device Side, Bootloader, Cryptographic Module):

  1. Signature Generation (server): calculate the firmware SHA-256 hash, sign it with the private key, and attach the signature to the firmware header.
  2. Receive Firmware (device): receive the complete firmware (with signature), parse the firmware header, and extract the digital signature.
  3. Verify Signature: the device requests firmware verification; the Bootloader reads the firmware header and extracts the firmware data and signature; the cryptographic module calculates the firmware data hash (SHA-256), verifies the signature with the public key, and returns the verification result.
  4. Verification Passed: mark the firmware as trusted and allow installation.
  5. Verification Failed: reject the installation, return an error code, and log a security event.

Firmware Signature Format Design

Firmware Image Structure:

  • Firmware Header (metadata, 64 bytes in total):
    • Magic Number: 4 bytes
    • Version Information: 16 bytes
    • Firmware Size: 4 bytes
    • Data CRC32: 4 bytes
    • Data SHA256: 32 bytes
    • Header CRC32: 4 bytes
  • Firmware Data: binary code
  • Digital Signature: RSA-2048 signature (256 bytes) or ECDSA signature (64-72 bytes)
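
The header layout can be parsed field by field from raw bytes, which avoids relying on compiler struct packing. The sketch below makes several assumptions that the layout itself does not state: little-endian fields, a Header CRC32 that covers the first 60 bytes, and a made-up magic value. The names fw_header_t, fw_header_parse, and FW_MAGIC are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FW_HEADER_SIZE 64u
#define FW_MAGIC 0x4F544131u  /* hypothetical magic value */

typedef struct {
    uint32_t magic;
    char     version[16];
    uint32_t fw_size;
    uint32_t data_crc32;
    uint8_t  data_sha256[32];
    uint32_t header_crc32;
} fw_header_t;

uint32_t fw_rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t fw_crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Parses a 64-byte header. Returns 0 on success, -1 on a bad magic number,
 * -2 on a header CRC mismatch. */
int fw_header_parse(const uint8_t raw[FW_HEADER_SIZE], fw_header_t *h)
{
    h->magic = fw_rd32le(raw + 0);             /* offset 0,  4 bytes  */
    memcpy(h->version, raw + 4, 16);           /* offset 4,  16 bytes */
    h->fw_size = fw_rd32le(raw + 20);          /* offset 20, 4 bytes  */
    h->data_crc32 = fw_rd32le(raw + 24);       /* offset 24, 4 bytes  */
    memcpy(h->data_sha256, raw + 28, 32);      /* offset 28, 32 bytes */
    h->header_crc32 = fw_rd32le(raw + 60);     /* offset 60, 4 bytes  */

    if (h->magic != FW_MAGIC)
        return -1;
    if (fw_crc32(raw, 60) != h->header_crc32)  /* assumed coverage: bytes 0..59 */
        return -2;
    return 0;
}
```

The CRC fields only catch accidental corruption. Authenticity still depends on verifying the digital signature over the header and data with a real cryptographic library, which is outside the scope of this sketch.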
    

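Public Key Management Strategies

Secure storage of public keys is crucial for the digital signature mechanism. The public key is not secret, but it must be protected against replacement: an attacker who can swap it can make the device accept firmware signed with the attacker's own key. Here are several public key management strategies:

Strategy 1: Hardware Security Module (HSM)

  • Public Key Storage: the key is kept in the Hardware Security Module (HSM), which provides tamper-proof storage and hardware cryptographic acceleration.
  • Firmware Verification: the signature is verified in hardware, and the module returns the verification result.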
    

Strategy 2: Software Storage + Integrity Protection

Provisioning:

  1. Calculate the hash of the public key.
  2. Store the public key in flash.
  3. Store the hash in a secure area.

At boot time:

  1. Read the public key.
  2. Calculate its current hash.
  3. Compare it with the stored hash.
  4. If the hash matches, use the public key. If it does not match, tampering is detected and the device refuses to boot.
    

Strategy 3: Public Key Rotation Mechanism

To address long-term deployment scenarios, support for public key rotation is needed:

  1. Key1_Active: initial state, use Key Pair 1.
  2. Key1_Active -> Key2_Provisioned: provision Key Pair 2 via a secure channel.
  3. Key2_Provisioned -> Both_Keys_Valid: both keys are usable.
  4. Both_Keys_Valid -> Key2_Only: new firmware is signed only with Key Pair 2.
  5. Key2_Only -> Key2_Active: all devices upgraded.

During the transition, both key sets are supported for smooth switching.
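
On the device, the rotation states reduce to a question of which key IDs are accepted at each point. The sketch below encodes that as a lookup; the numeric key IDs and function names are hypothetical, and it assumes that Key Pair 2 is stored but not yet trusted while in Key2_Provisioned.

```c
#include <stdbool.h>

typedef enum {
    KEY1_ACTIVE,
    KEY2_PROVISIONED,
    BOTH_KEYS_VALID,
    KEY2_ONLY,
    KEY2_ACTIVE
} key_state_t;

/* Returns true if firmware signed with key `key_id` (1 or 2) may be
 * accepted in the given rotation state. */
bool key_accepted(key_state_t s, int key_id)
{
    switch (s) {
    case KEY1_ACTIVE:
    case KEY2_PROVISIONED:  /* key 2 is stored but not trusted yet (assumption) */
        return key_id == 1;
    case BOTH_KEYS_VALID:
        return key_id == 1 || key_id == 2;
    case KEY2_ONLY:
    case KEY2_ACTIVE:
        return key_id == 2;
    }
    return false;
}

/* Rotation only moves forward; the final state is terminal. */
key_state_t key_state_next(key_state_t s)
{
    return (s == KEY2_ACTIVE) ? s : (key_state_t)(s + 1);
}
```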
    

Power Failure Protection Mechanism

Power Failure Risk Analysis

During the OTA update process, devices may suddenly lose power or reset for various reasons:

  • Power Failure
  • Battery Exhaustion
  • User Accidental Power Off
  • Network Interruptions Leading to Watchdog Resets

Potential issues caused by power failure:

  • Power failure during update -> incomplete flash write -> firmware data corruption -> firmware boot failure -> risk of bricking the device.
  • Power failure during update -> inconsistent partition flags -> incorrect partition selection -> risk of bricking the device.
    

Power Failure Protection Strategies

The core idea of the power failure protection mechanism is: ensure that at any time, at least one complete and bootable firmware version is available.

Three Principles of Power Failure Protection:

  1. Atomic Writes: either fully write or do not write at all. Implementation: transactional writes.
  2. State Flag Protection: use redundant flags and checksums. Implementation: multiple flag verification.
  3. Boot-Time Verification: detect and repair inconsistent states. Implementation: automatic recovery mechanism.
    

Bootloader Power Failure Recovery at Startup

The Bootloader must check the integrity of the system and perform necessary recovery operations at each startup:

  1. Bootloader startup: read the status flag.
  2. If the flag is not valid, initialize the default state before continuing.
  3. Check the upgrade status flag. If no upgrade is in progress, go to the normal startup process (step 7).
  4. If an upgrade is in progress, check the write progress.
  5. If the progress is incomplete, a power failure is detected: clear the incomplete writes, restore the pre-upgrade state, and boot with the original firmware.
  6. If the progress is complete, verify the written firmware. If it is valid, mark the upgrade complete, switch to the new firmware, and boot it. If it is invalid, clear the incomplete writes, restore the pre-upgrade state, and boot with the original firmware.
  7. Normal startup process: verify the active partition firmware. If verification passes, boot the active partition.
  8. If verification fails, try the backup partition. If it is valid, switch to the backup partition and boot; otherwise, enter Safe Mode.
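
The recovery flow is easier to test when the decision is separated from the side effects. The following sketch expresses it as a pure function over the results of the individual checks; the structure, enum, and function names are hypothetical, and treating an invalid status flag as "no upgrade in progress" is an assumption.

```c
#include <stdbool.h>

typedef enum {
    ACT_BOOT_ACTIVE,       /* normal startup */
    ACT_BOOT_BACKUP,       /* active partition invalid, backup valid */
    ACT_SWITCH_TO_NEW,     /* interrupted upgrade was actually complete */
    ACT_RESTORE_ORIGINAL,  /* interrupted upgrade is discarded */
    ACT_SAFE_MODE          /* nothing bootable */
} boot_action_t;

/* Results of the checks the Bootloader performs at startup. */
typedef struct {
    bool flags_valid;         /* status flag passed its checksum */
    bool update_in_progress;  /* upgrade status flag */
    bool write_complete;      /* write progress reached the end */
    bool new_fw_valid;        /* written firmware passed verification */
    bool active_fw_valid;     /* active partition passed verification */
    bool backup_fw_valid;     /* backup partition passed verification */
} boot_inputs_t;

boot_action_t decide_boot_action(const boot_inputs_t *in)
{
    if (in->flags_valid && in->update_in_progress) {
        if (in->write_complete && in->new_fw_valid)
            return ACT_SWITCH_TO_NEW;
        /* Incomplete or invalid write: clear it and keep the old firmware. */
        return ACT_RESTORE_ORIGINAL;
    }

    /* Invalid flags are treated as the default state (assumption), so the
     * normal startup checks apply. */
    if (in->active_fw_valid)
        return ACT_BOOT_ACTIVE;
    if (in->backup_fw_valid)
        return ACT_BOOT_BACKUP;
    return ACT_SAFE_MODE;
}
```

Keeping the decision free of flash access means every branch can be exercised on a development machine before the Bootloader is ever flashed.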
    

Watchdog Protection Mechanism

The Watchdog Timer is an important mechanism to prevent the system from deadlocking under abnormal conditions. The interaction involves the Application, the Watchdog Timer, and the Bootloader.

Normal upgrade process:

  1. Start the upgrade and configure the watchdog (60 seconds).
  2. Loop every 5 seconds: feed the watchdog (reset the timer) and continue the upgrade operation.
  3. When the upgrade is complete, turn off the watchdog and request a reboot.

Abnormal condition handling:

  1. Start the upgrade and configure the watchdog (60 seconds).
  2. A deadlock occurs during the upgrade.
  3. After 60 seconds without feeding, the watchdog triggers a system reset.
  4. The Bootloader detects the watchdog reset and executes the power failure recovery process.
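
The timing relationship between the 5-second feed interval and the 60-second timeout can be checked with a small software model. This is a simulation, not a hardware driver: sim_wdt_t and its functions are hypothetical stand-ins for whatever watchdog peripheral the target MCU provides.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t timeout_s;  /* configured timeout */
    uint32_t elapsed_s;  /* time since the last feed */
    bool     enabled;
    bool     fired;      /* true once the watchdog would have reset the system */
} sim_wdt_t;

void wdt_start(sim_wdt_t *w, uint32_t timeout_s)
{
    w->timeout_s = timeout_s;
    w->elapsed_s = 0;
    w->enabled = true;
    w->fired = false;
}

/* Feeding resets the countdown. */
void wdt_feed(sim_wdt_t *w)
{
    w->elapsed_s = 0;
}

void wdt_stop(sim_wdt_t *w)
{
    w->enabled = false;
}

/* Advances simulated time. Returns true if the watchdog has expired. */
bool wdt_tick(sim_wdt_t *w, uint32_t seconds)
{
    if (!w->enabled || w->fired)
        return w->fired;
    w->elapsed_s += seconds;
    if (w->elapsed_s >= w->timeout_s)
        w->fired = true;
    return w->fired;
}
```

In the normal path each 5-second step is followed by wdt_feed(), so the counter never approaches 60. In the deadlock path the feeds stop and the twelfth 5-second tick expires the timer. On some microcontrollers the hardware watchdog cannot be disabled once started; in that case the "turn off" step is replaced by simply rebooting before the timeout.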

Conclusion

A reliable OTA update system rests on four techniques:

  1. Differential Updates: Transmitting only the differences between firmware versions significantly reduces data transfer volume, saving bandwidth and time.
  2. A/B Partition Mechanism: Maintaining two independent firmware partitions enables automatic rollback when an upgrade fails, preventing devices from becoming bricked.
  3. Digital Signature Verification: Cryptographic methods ensure firmware integrity and authenticity, preventing malicious firmware from being installed.
  4. Power Failure Protection Mechanism: Atomic writes, state flag protection, and boot-time verification allow the device to recover from an interrupted update.

In practical applications, these technologies need to be appropriately adjusted and optimized based on specific hardware platforms, resource constraints, and application scenarios.
