Over-The-Air (OTA) firmware updates are a standard feature of modern embedded systems. How can we ensure the security of the update process? How do we handle exceptions such as network interruptions and power outages? How can we implement an efficient update mechanism on resource-constrained embedded devices?
OTA Update Architecture
System Architecture Design Principles
When designing an OTA update system, the following core principles must be followed:
- 1. Security Principle: Ensure that the firmware source is trustworthy to prevent malicious firmware from being installed.
- 2. Reliability Principle: Ensure the atomicity of the update process to avoid bricking the device.
- 3. Efficiency Principle: Minimize data transfer volume to reduce update time and costs.
- 4. Fault Tolerance Principle: Be able to recover from various exceptions, including network interruptions and power outages.
- 5. Maintainability Principle: The system design should be clear and easy to test and debug.
Overall System Architecture
The architecture is organized into four layers:
- • Cloud Service Layer: OTA Management Server, Firmware Repository, Differential Engine, Signature Service, and Device Management Platform.
- • Network Transmission Layer: HTTPS/TLS encrypted channel, resumable downloads (resume from breakpoint), and differential transmission.
- • Device Architecture: Bootloader, Application (App Slot A), Application (App Slot B), Updater, and Shared Memory Area.
- • Storage Partition: Bootloader Partition, App Slot A Partition, App Slot B Partition, Updater Partition, and Configuration Storage Area.
Differential Update Technology
Differential Update Principle
Differential updates (Delta Update) are an efficient firmware update method that only transmits the changed parts by comparing the differences between the old and new firmware versions, rather than the entire firmware file. This method can significantly reduce data transfer volume, saving bandwidth and time.
Basic Principle:
- 1. Server Side: Use differential algorithms (such as bsdiff, xdelta, etc.) to generate a differential package (patch file) from the old version to the new version.
- 2. Device Side: After receiving the differential package, merge it with the current firmware (apply patch) to generate the new version of the firmware.
- 3. Verification: Perform integrity checks on the newly generated firmware to ensure the update is correct.
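The three steps above can be sketched with a toy delta format. This is a minimal illustration only: it uses Python's `difflib.SequenceMatcher` instead of bsdiff or xdelta, and the `("copy", offset, length)` / `("insert", data)` operation format and the function names are assumptions made for this example, not a real patch format.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Server side: describe `new` as copies from `old` plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present on the device: send only offset and length.
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # "replace" or "insert": these literal bytes must be transmitted.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no entry: the old bytes are simply never copied.
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Device side: rebuild the new firmware from the old one and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


def delta_payload_size(ops: list) -> int:
    """Number of literal bytes carried by the delta."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


old_fw = b"HEADER" + bytes(range(200)) + b"version=1.0"
new_fw = b"HEADER" + bytes(range(200)) + b"version=2.0"
delta = make_delta(old_fw, new_fw)
rebuilt = apply_delta(old_fw, delta)
```

After rebuilding, the device would still run the integrity check from step 3 on `rebuilt` before using it.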
Differential Algorithm Selection
Commonly used differential algorithms include:
- • bsdiff/bspatch: A binary differential algorithm suitable for any binary file, with high compression rates.
- • xdelta3: An open-source differential algorithm with excellent performance, suitable for large files.
- • Custom Algorithms: Specialized algorithms optimized for specific firmware formats.
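For resource-constrained embedded devices, bsdiff/bspatch is a common choice because it produces small patches at a moderate computational cost. Note that the reference bspatch implementation relies on bzip2 decompression and holds both the old and the new image in memory, so an embedded port may need a lighter compressor and a streaming patch routine.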
Differential Update Process
The process involves four participants: the Server, the Device Side application, the Bootloader, and Flash Storage.
- 1. Generate Differential Package (Server): read the old and new firmware versions, execute the differential algorithm (bsdiff), generate the differential package (.patch), and calculate the differential package signature.
- 2. Request Upgrade: the device reports its current firmware version, and the server returns the differential package information (size and checksum).
- 3. Download Differential Package: the device requests the download, the server transfers the package data (with support for resuming from a breakpoint), and the device stores the package in a temporary area and verifies its integrity.
- 4. Apply Differential: the device requests upgrade mode; the Bootloader reads the current (old) firmware, applies the differential algorithm (bspatch), writes the new firmware to the backup partition (Slot B), then verifies the new firmware's integrity and checks its digital signature.
- 5. Verification Passed: switch the active partition to Slot B, update the partition flag, and reboot the system.
- 6. Verification Failed: clear the backup partition data and return an upgrade failure.
Advantages and Challenges of Differential Updates
Advantages:
- 1. Bandwidth Savings: Typically reduces data transfer volume by 50%-90%, which is especially important for large-scale deployments.
- 2. Reduced Update Time: Significantly shortens data transfer time, enhancing user experience.
- 3. Lower Update Costs: In pay-per-usage scenarios, it can significantly reduce costs.
- 4. Increased Success Rate: Smaller data transfer volume reduces the probability of transmission errors.
Challenges:
- 1. Memory Requirements: A naive implementation loads the old firmware, the differential package, and the new firmware at the same time, which demands more RAM than many microcontrollers have.
- 2. Computational Overhead: Differential merging requires a certain level of CPU computational power.
- 3. Version Management: Maintaining the differential relationships between multiple versions increases management complexity.
- 4. Error Recovery: Recovery strategies after differential merging failures need to be specially designed.
Key Points for Implementing Differential Updates
Server-Side Implementation:
The server handles two flows:
- • Package generation: Firmware Version Management -> Generate Differential Package -> Calculate Differential Package Checksum -> Digital Signature -> Store in Repository.
- • Device request: Device Request -> Current Version -> Check Version Path -> Find Optimal Differential Path -> Return Differential Package Information.
The version path check has two outcomes:
- • Direct Path: a single-step differential (V1->V2).
- • Jump Path: a multi-step differential (V1->V1.1->V2).
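One possible way to find the optimal differential path is to treat versions as graph nodes and available patches as edges weighted by patch size, then run a shortest-path search. The sketch below uses Dijkstra's algorithm; the `patches` table, its sizes, and the function name are hypothetical values chosen for illustration.

```python
import heapq


def find_patch_path(patches, current, target):
    """Return (total_bytes, [versions]) for the cheapest patch chain, or None.

    patches maps (from_version, to_version) -> patch size in bytes.
    """
    graph = {}
    for (src, dst), size in patches.items():
        graph.setdefault(src, []).append((dst, size))

    queue = [(0, current, [current])]
    best = {}  # cheapest known cost at which each version was expanded
    while queue:
        cost, version, path = heapq.heappop(queue)
        if version == target:
            return cost, path
        if best.get(version, float("inf")) <= cost:
            continue  # already expanded this version at an equal or lower cost
        best[version] = cost
        for nxt, size in graph.get(version, []):
            heapq.heappush(queue, (cost + size, nxt, path + [nxt]))
    return None  # no chain of patches leads to the target


# Hypothetical repository contents: the direct patch is larger than the two-step chain.
patches = {
    ("V1", "V2"): 120_000,
    ("V1", "V1.1"): 30_000,
    ("V1.1", "V2"): 40_000,
}
best_path = find_patch_path(patches, "V1", "V2")
```

With these sizes the multi-step chain wins. A real server might also weigh the device-side cost of applying several patches in a row, which this sketch ignores.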
Device-Side Implementation:
- 1. Memory Management: Use streaming processing, reading old firmware and applying differentials in chunks to avoid loading the entire firmware at once.
- 2. Resume from Breakpoint: Support resuming differential package downloads to improve transmission reliability.
- 3. Double Verification: Verify the differential package itself and then verify the new firmware after merging.
- 4. Rollback Mechanism: Ensure the ability to roll back in case of merge failures, guaranteeing system recoverability.
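As an illustration of resuming from a breakpoint combined with package verification, the sketch below downloads a patch in chunks and keeps the current offset in a state object, so a retry continues where the previous attempt stopped. The `fetch_range` callback stands in for an HTTP Range request, and the in-memory `state` dict stands in for progress persisted in flash; both are assumptions for this example.

```python
import hashlib


def download_with_resume(fetch_range, total_size, expected_sha256, state, chunk_size=1024):
    """Download [0, total_size) in chunks, resuming from state["offset"]."""
    while state["offset"] < total_size:
        end = min(state["offset"] + chunk_size, total_size)
        chunk = fetch_range(state["offset"], end)  # may raise ConnectionError
        if not chunk:
            raise ConnectionError("empty response")
        state["data"] += chunk
        state["offset"] += len(chunk)  # progress survives a failed attempt
    # First verification: the differential package itself.
    if hashlib.sha256(state["data"]).hexdigest() != expected_sha256:
        raise ValueError("patch integrity check failed")
    return bytes(state["data"])


# Simulated server whose third request fails once.
patch = bytes(range(256)) * 20  # 5120 bytes
calls = {"count": 0, "bytes_served": 0}


def flaky_fetch(start, end):
    calls["count"] += 1
    if calls["count"] == 3:
        raise ConnectionError("link dropped")
    calls["bytes_served"] += end - start
    return patch[start:end]


state = {"offset": 0, "data": bytearray()}
digest = hashlib.sha256(patch).hexdigest()
result = None
for attempt in range(3):
    try:
        result = download_with_resume(flaky_fetch, len(patch), digest, state)
        break
    except ConnectionError:
        continue  # retry; already-downloaded bytes are not requested again
```

In this run the second attempt starts at offset 2048, so no byte is transferred twice. The second verification (of the merged firmware) would follow after the patch is applied.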
A/B Partition Mechanism
A/B Partition Principle
The A/B partition mechanism (also known as Dual-Bank or Redundant Boot) is a redundant firmware storage solution that maintains two independent firmware partitions (Partition A and Partition B) within the device, ensuring that at least one usable firmware version is always available during the upgrade process. The core idea of this mechanism is: when upgrading to new firmware, do not disrupt the currently running firmware.
Working Principle:
- 1. Dual Partition Storage: The device has two independent firmware partitions that can store two different firmware versions.
- 2. Active Partition Switching: The Bootloader maintains a flag indicating which partition should be booted from.
- 3. Atomic Upgrade: New firmware is written to the inactive partition, and the active partition flag is only switched after verification.
- 4. Automatic Rollback: If the new firmware fails to boot, the Bootloader automatically switches back to the old partition.
A/B Partition Architecture Design
Flash Storage Partition:
- • Bootloader Partition: not updatable; responsible for booting and partition management.
- • App Slot A: version V1.0, status Stable.
- • App Slot B: version V2.0, status Pending Verification.
Shared Memory Area:
- • Active Partition Flag: active_slot = A
- • Boot Counter: boot_counter = 0
- • Upgrade Status Flag: update_in_progress = false
Bootloader Logic:
- 1. Read the active partition flag.
- 2. If the active partition is A, verify the Slot A firmware. If verification passes, boot Slot A; otherwise, try Slot B.
- 3. If the active partition is B, verify the Slot B firmware. If verification passes, boot Slot B.
- 4. When falling back to Slot B, check whether Slot B is valid. If it is, boot Slot B; otherwise, enter Safe Mode.
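The bootloader's slot selection can be modeled in a few lines. The following is a sketch under assumptions, written in Python for readability rather than as bootloader code: the field names follow the `active_slot`, `boot_counter`, and `update_in_progress` flags of this section, while `MAX_BOOT_ATTEMPTS`, the symmetric fallback from B to A, and the `slot_is_valid` callback are hypothetical additions.

```python
from dataclasses import dataclass

MAX_BOOT_ATTEMPTS = 3  # assumed limit before the bootloader gives up on a slot


@dataclass
class BootState:
    active_slot: str = "A"
    boot_counter: int = 0
    update_in_progress: bool = False


def select_boot_slot(state: BootState, slot_is_valid) -> str:
    """Return "A", "B", or "SAFE_MODE", updating the state in place."""
    other = "B" if state.active_slot == "A" else "A"
    too_many_failures = state.boot_counter >= MAX_BOOT_ATTEMPTS
    if too_many_failures or not slot_is_valid(state.active_slot):
        if not slot_is_valid(other):
            return "SAFE_MODE"  # neither slot can be booted
        state.active_slot = other  # fall back (or roll back) to the other slot
        state.boot_counter = 0
    # Counted as an attempt; the application is expected to reset the
    # counter to 0 once it has started successfully and marked itself stable.
    state.boot_counter += 1
    return state.active_slot


normal = select_boot_slot(BootState(), lambda slot: True)
fallback = select_boot_slot(BootState(active_slot="B"), lambda slot: slot == "A")
bricked = select_boot_slot(BootState(), lambda slot: False)
rolled_back = BootState(active_slot="B", boot_counter=3)
rollback_choice = select_boot_slot(rolled_back, lambda slot: True)
```

The boot counter is what turns a firmware that verifies correctly but crashes on startup into an automatic rollback: each failed boot leaves the counter one higher until the limit is reached.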
A/B Partition Upgrade Process
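The upgrade follows this state sequence:
- 1. AppSlotA_Running: entered at system boot.
- 2. DownloadToSlotB: entered when an upgrade request is received.
- 3. VerifyingSlotB: entered when the download completes. If verification fails, return to DownloadToSlotB and redownload.
- 4. InstallingSlotB: entered when verification passes.
- 5. SwitchToSlotB: entered when installation completes; the switch is atomic.
- 6. AppSlotB_Running: entered after the system reboots.
- 7. VerifySlotB: the boot check. If the firmware runs normally, mark it stable and enter AppSlotB_Stable. If the boot fails, enter RollbackToSlotA, which automatically rolls back to AppSlotA_Running.
- 8. From AppSlotB_Stable, the next upgrade writes to Slot A and mirrors the sequence: DownloadToSlotA, VerifyingSlotA, InstallingSlotA, SwitchToSlotA, and a reboot into AppSlotA_Running.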
Key Point: Atomic switch ensures partition flags and firmware states are consistent during updates.
Automatic rollback mechanism prevents devices from becoming bricked.
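The upgrade flow lends itself to a table-driven state machine, which makes illegal transitions easy to reject and to test. In the sketch below the state names come from this section, while the event names are hypothetical labels introduced for the example; only the Slot B half of the cycle is shown.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("AppSlotA_Running", "upgrade_requested"): "DownloadToSlotB",
    ("DownloadToSlotB", "download_complete"): "VerifyingSlotB",
    ("VerifyingSlotB", "verification_passed"): "InstallingSlotB",
    ("VerifyingSlotB", "verification_failed"): "DownloadToSlotB",  # redownload
    ("InstallingSlotB", "installation_complete"): "SwitchToSlotB",
    ("SwitchToSlotB", "reboot"): "AppSlotB_Running",
    ("AppSlotB_Running", "boot_check"): "VerifySlotB",
    ("VerifySlotB", "running_normally"): "AppSlotB_Stable",
    ("VerifySlotB", "boot_failed"): "RollbackToSlotA",
    ("RollbackToSlotA", "rollback_done"): "AppSlotA_Running",
}


def step(state: str, event: str) -> str:
    """Apply one event; reject anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} in state {state!r}")


def run(state: str, events) -> str:
    for event in events:
        state = step(state, event)
    return state


happy_path = run("AppSlotA_Running", [
    "upgrade_requested", "download_complete", "verification_passed",
    "installation_complete", "reboot", "boot_check", "running_normally",
])
failed_boot = run("AppSlotB_Running", ["boot_check", "boot_failed", "rollback_done"])
```

Keeping the transitions in one table means the partition flag is only ever changed from a state in which the new firmware has already been verified.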
Guaranteeing Atomicity of Partition Switching
Partition switching is a critical operation that must guarantee atomicity; otherwise, it may lead to inconsistent system states. Here are several methods to ensure atomicity:
Using Hardware-Supported Atomic Operations
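- 1. Prepare Switch
- 2. Disable Interrupts
- 3. Write New Partition Flag
- 4. Synchronize Flash Write
- 5. Enable Interrupts
- 6. System Reboot
Steps 2 through 5 must run as an uninterruptible sequence so that the flag update completes before the system reboots. Disabling interrupts only protects against software preemption; it cannot prevent a power loss in the middle of the flash write, so the flag itself should also be protected with redundant copies and checksums, as described under Power Failure Protection Strategies.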
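A common way to make the flag update survive a torn write is to keep two flag records, each with a sequence number and a CRC32, and always overwrite the older one. The sketch below models this with two byte strings standing in for two flash sectors; the 9-byte record layout and the function names are assumptions for illustration, and sequence-number wraparound is ignored.

```python
import struct
import zlib

RECORD = struct.Struct("<IBI")  # sequence number, slot (0 = A, 1 = B), CRC32 of the first 5 bytes


def encode_record(seq: int, slot: int) -> bytes:
    body = struct.pack("<IB", seq, slot)
    return body + struct.pack("<I", zlib.crc32(body))


def decode_record(raw: bytes):
    """Return (seq, slot), or None if the record is erased, torn, or corrupt."""
    if len(raw) != RECORD.size:
        return None
    seq, slot, crc = RECORD.unpack(raw)
    if slot not in (0, 1) or zlib.crc32(raw[:5]) != crc:
        return None
    return seq, slot


def read_active_slot(copies, default=0) -> int:
    valid = [rec for rec in map(decode_record, copies) if rec is not None]
    return max(valid)[1] if valid else default  # newest valid record wins


def commit_active_slot(copies, slot: int) -> None:
    """Overwrite the invalid or older copy, leaving the newest one untouched."""
    decoded = [decode_record(c) for c in copies]
    if decoded[0] is None:
        target = 0
    elif decoded[1] is None:
        target = 1
    else:
        target = 0 if decoded[0][0] < decoded[1][0] else 1
    newest_seq = max((d[0] for d in decoded if d is not None), default=0)
    copies[target] = encode_record(newest_seq + 1, slot)


copies = [b"\xff" * 9, b"\xff" * 9]  # two erased flash records
commit_active_slot(copies, 0)        # boot from A
commit_active_slot(copies, 1)        # switch to B
slot_after_switch = read_active_slot(copies)

# Simulate power loss halfway through the next write: the older copy is torn.
copies[0] = encode_record(3, 0)[:4] + b"\xff" * 5
slot_after_power_loss = read_active_slot(copies)
```

Because the newest valid record is never the one being overwritten, an interrupted write leaves the previous selection intact, so the switch either happens completely or not at all.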
Advantages and Limitations of A/B Partitions
Advantages:
- 1. Zero Downtime Upgrades: The device can continue running during the upgrade process, switching only after completion.
- 2. Automatic Rollback: Automatically rolls back if the new firmware fails to boot, preventing the device from becoming bricked.
- 3. Fast Recovery: Rollback operations are very quick, requiring only a switch of the partition flag.
- 4. Reduced Risk: Upgrade failures do not affect the currently running firmware.
Limitations:
- 1. Storage Space Requirements: Requires double the storage space, increasing costs.
- 2. Increased Complexity: Partition management and state synchronization increase system complexity.
- 3. Boot Time: Requires additional verification and selection logic, which may slightly increase boot time.
- 4. Version Management: Requires maintaining version information for both partitions.
Digital Signature Verification
Digital Signature Principle
Digital signatures are a core technology for firmware security: they use cryptographic methods to ensure the integrity (the firmware has not been tampered with) and authenticity (the firmware comes from a trusted source) of the firmware. Digital signatures are based on asymmetric (public-key) cryptography, using a private key for signing and a public key for verification.
Basic Principle:
- 1. Server Side (Signing Process):
- • Calculate the hash value of the firmware (e.g., SHA-256).
- • Sign the hash value with the private key to generate the digital signature. With RSA this is often described as encrypting the hash with the private key; ECDSA and Ed25519 signing does not work that way.
- • Attach the signature to the firmware file.
- 2. Device Side (Verification Process):
- • Calculate the hash value of the received firmware.
- • Use the pre-installed public key to check the signature against that hash value. With RSA, this amounts to recovering the original hash value from the signature and comparing the two.
- • If they match, the verification is successful.
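The hash-then-sign flow can be demonstrated with textbook RSA using only the standard library. This is a toy: the key is built from two small, publicly known Mersenne primes and no padding is applied, so it offers no real security. All names are hypothetical; a production system would use a vetted cryptographic library with a proper signature scheme.

```python
import hashlib

# Toy key material: far too small and unpadded for real use.
P, Q = 2**61 - 1, 2**89 - 1          # two known Mersenne primes
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (requires Python 3.8+)


def digest_as_int(firmware: bytes) -> int:
    """SHA-256 of the firmware, reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(firmware).digest(), "big") % N


def sign(firmware: bytes) -> int:
    """Server side: only the holder of D can produce this value."""
    return pow(digest_as_int(firmware), D, N)


def verify(firmware: bytes, signature: int) -> bool:
    """Device side: needs only the public values N and E."""
    return pow(signature, E, N) == digest_as_int(firmware)


firmware = b"\x7fFIRMWARE-IMAGE-V2.0" * 100
signature = sign(firmware)
genuine_ok = verify(firmware, signature)
tampered_ok = verify(firmware + b"\x00", signature)
```

The device never holds the private exponent, so compromising a device does not allow an attacker to sign new firmware.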
Digital Signature Algorithm Selection
Common digital signature algorithms include:
- • RSA: The most widely used algorithm, with high security, but larger signatures (2048-bit key produces a 256-byte signature).
- • ECDSA: Elliptic curve algorithm, smaller signatures (256-bit key produces a 64-byte signature), suitable for resource-constrained devices.
- • Ed25519: A modern elliptic curve algorithm with excellent performance and high security.
For embedded devices, it is recommended to use ECDSA P-256, as it strikes a good balance between security and resource consumption.
Digital Signature Verification Process
The verification involves four participants: the Server, the Device Side application, the Bootloader, and the Cryptographic Module.
- 1. Signature Generation (Server): calculate the firmware SHA-256 hash, sign it with the private key, and attach the signature to the firmware header.
- 2. Receive Firmware (Device Side): receive the complete firmware (with signature), parse the firmware header, and extract the digital signature.
- 3. Verify Signature: the device requests firmware verification; the Bootloader reads the firmware header and extracts the firmware data and signature; the Cryptographic Module calculates the firmware data hash (SHA-256), verifies the signature with the public key, and returns the verification result.
- 4. Verification Passed: mark the firmware as trusted and allow installation.
- 5. Verification Failed: reject installation, return an error code, and log a security event.
Firmware Signature Format Design
Firmware Image Structure:
- 1. Firmware Header (metadata):
- • Magic Number: 4 bytes
- • Version Information: 16 bytes
- • Firmware Size: 4 bytes
- • Data CRC32: 4 bytes
- • Data SHA256: 32 bytes
- • Header CRC32: 4 bytes
- 2. Firmware Data (binary code)
- 3. Digital Signature: 256 bytes for RSA-2048, or 64-72 bytes for ECDSA
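The header fields listed above add up to 64 bytes and map directly onto a fixed-size binary layout. The sketch below packs and parses such a header with Python's `struct` module. The field order and sizes follow this section; the little-endian byte order, the magic value `b"OTAF"`, the choice to compute the header CRC over the first 60 bytes, and the function names are assumptions. Signature handling is left out.

```python
import hashlib
import struct
import zlib

# magic, version, firmware size, data CRC32, data SHA-256, header CRC32
HEADER = struct.Struct("<4s16sII32sI")  # 64 bytes, no padding
MAGIC = b"OTAF"  # hypothetical magic number


def build_header(version: str, data: bytes) -> bytes:
    body = struct.pack(
        "<4s16sII32s",
        MAGIC,
        version.encode("ascii"),  # zero-padded to 16 bytes by struct
        len(data),
        zlib.crc32(data),
        hashlib.sha256(data).digest(),
    )
    return body + struct.pack("<I", zlib.crc32(body))


def parse_image(image: bytes):
    """Validate header and payload; return (version, firmware_data)."""
    if len(image) < HEADER.size:
        raise ValueError("image shorter than header")
    magic, version, size, data_crc, data_sha, header_crc = HEADER.unpack(image[:HEADER.size])
    if magic != MAGIC:
        raise ValueError("bad magic number")
    if zlib.crc32(image[:HEADER.size - 4]) != header_crc:
        raise ValueError("header CRC mismatch")
    data = image[HEADER.size:HEADER.size + size]
    if len(data) != size or zlib.crc32(data) != data_crc:
        raise ValueError("data CRC mismatch")
    if hashlib.sha256(data).digest() != data_sha:
        raise ValueError("data SHA-256 mismatch")
    return version.rstrip(b"\x00").decode("ascii"), data


payload = bytes(range(256)) * 4
image = build_header("2.0.0", payload) + payload
parsed = parse_image(image)
```

Checking the cheap CRC32 values before the SHA-256 lets the device reject a truncated or corrupted download early; the signature check remains the only step that establishes authenticity.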
Public Key Management Strategies
Secure storage of public keys is crucial for the digital signature mechanism. Here are several public key management strategies:
Strategy 1: Hardware Security Module (HSM)
- • Public Key Storage: the public key is kept inside the Hardware Security Module (HSM), which provides tamper-proof storage and hardware encryption acceleration.
- • Firmware Verification: the HSM performs hardware signature verification and returns the verification result.
Strategy 2: Software Storage + Integrity Protection
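When the public key is provisioned:
- 1. Calculate the public key hash.
- 2. Store the public key to flash.
- 3. Store the hash to a secure area.
At boot time:
- 1. Read the public key.
- 2. Calculate its current hash.
- 3. Compare it with the stored hash.
- 4. If the hashes match, use the public key. If they do not, tampering is detected and the device refuses to boot.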
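Strategy 2 can be sketched as follows. The `storage` dict with its `"flash"` and `"secure_area"` entries is a hypothetical stand-in for the two physical storage locations, and the function names are illustrative.

```python
import hashlib
import hmac


def provision_public_key(public_key: bytes) -> dict:
    """Factory step: store the key in flash and its hash in the secure area."""
    return {
        "flash": public_key,
        "secure_area": hashlib.sha256(public_key).digest(),
    }


def load_public_key(storage: dict) -> bytes:
    """Boot-time step: recompute the hash and refuse to continue on mismatch."""
    current_hash = hashlib.sha256(storage["flash"]).digest()
    if not hmac.compare_digest(current_hash, storage["secure_area"]):
        raise RuntimeError("public key tampering detected, refusing to boot")
    return storage["flash"]


storage = provision_public_key(b"-----hypothetical public key bytes-----")
trusted_key = load_public_key(storage)

# An attacker who can rewrite flash but not the secure area is detected.
storage["flash"] = b"-----attacker-controlled key bytes-----"
try:
    load_public_key(storage)
    tamper_detected = False
except RuntimeError:
    tamper_detected = True
```

This scheme is only as strong as the secure area: if the stored hash can be rewritten together with the key, the check provides no protection.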
Strategy 3: Public Key Rotation Mechanism
To address long-term deployment scenarios, support for public key rotation is needed:
- 1. Key1_Active: the initial state; key pair 1 is in use.
- 2. Key2_Provisioned: key pair 2 has been provisioned via a secure channel.
- 3. Both_Keys_Valid: both keys are usable.
- 4. Key2_Only: new firmware is signed only with key pair 2.
- 5. Key2_Active: all devices have been upgraded.
During the transition, both key sets are supported so that the switch is smooth.
Power Failure Protection Mechanism
Power Failure Risk Analysis
During the OTA update process, devices may suddenly lose power for various reasons:
- • Power Failure
- • Battery Exhaustion
- • User Accidental Power Off
- • Network Interruptions Leading to Watchdog Resets
Potential issues caused by power failure:
A power failure during an update can cause:
- • Incomplete flash writes
- • Inconsistent partition flags
- • Firmware data corruption
These in turn lead to firmware boot failure or incorrect partition selection, and ultimately to the risk of bricking the device.
Power Failure Protection Strategies
The core idea of the power failure protection mechanism is: ensure that at any time, at least one complete and bootable firmware version is available.
Three Principles of Power Failure Protection:
- 1. Atomic Writes: either write fully or do not write at all. Implementation: transactional writes.
- 2. State Flag Protection: use redundant flags and checksums. Implementation: multiple flag verification.
- 3. Boot-Time Verification: detect and repair inconsistent states. Implementation: automatic recovery mechanism.
Together, these three mechanisms form the power failure protection system.
Bootloader Power Failure Recovery at Startup
The Bootloader must check the integrity of the system and perform necessary recovery operations at each startup:
- 1. Bootloader startup: read the status flag.
- 2. If the flag is not valid, initialize the default state. If it is valid, check the upgrade status flag.
- 3. If an upgrade is in progress, check the write progress:
- • If the progress is complete, verify the written firmware. If the firmware is valid, mark the upgrade complete, switch to the new firmware, and boot it.
- • If the progress is incomplete (a power failure is detected) or the written firmware is not valid, clear the incomplete writes, restore the pre-upgrade state, and boot with the original firmware.
- 4. If no upgrade is in progress, follow the normal startup process and verify the active partition firmware:
- • If verification passes, boot the active partition.
- • If verification fails, try the backup partition. If the backup partition is valid, switch to it and boot; otherwise, enter Safe Mode.
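The startup recovery decision can be written as a pure function, which makes every branch easy to test off-target. In this sketch the `flags` dictionary keys (`update_in_progress`, `bytes_written`, `image_size`), the returned action names, and the `firmware_valid` callback are hypothetical; `flags` is `None` when the stored status flag fails its own validity check.

```python
def recovery_action(flags, firmware_valid) -> str:
    """Decide what the bootloader should do at startup."""
    if flags is None:
        return "initialize_default_state"  # status flag unreadable or corrupt

    if flags["update_in_progress"]:
        if flags["bytes_written"] < flags["image_size"]:
            # Power was lost while the new image was still being written.
            return "restore_and_boot_original"
        if firmware_valid("new"):
            return "switch_and_boot_new"    # write finished, only the switch was missing
        return "restore_and_boot_original"  # fully written but fails verification

    # Normal startup: no upgrade pending.
    if firmware_valid("active"):
        return "boot_active"
    if firmware_valid("backup"):
        return "switch_to_backup_and_boot"
    return "enter_safe_mode"


interrupted = {"update_in_progress": True, "bytes_written": 40_960, "image_size": 131_072}
finished = {"update_in_progress": True, "bytes_written": 131_072, "image_size": 131_072}
idle = {"update_in_progress": False, "bytes_written": 0, "image_size": 0}

action_interrupted = recovery_action(interrupted, lambda which: True)
action_finished = recovery_action(finished, lambda which: True)
action_idle_bad_active = recovery_action(idle, lambda which: which == "backup")
```

Every path ends in either a bootable image or safe mode, which is the property the power failure protection mechanism is meant to guarantee.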
Watchdog Protection Mechanism
The Watchdog Timer is an important mechanism to prevent the system from deadlocking under abnormal conditions:
Normal upgrade process:
- 1. The application starts the upgrade and configures the watchdog (60 seconds).
- 2. Every 5 seconds, the application feeds the watchdog (resets the timer) and continues the upgrade operation.
- 3. When the upgrade completes, the application turns off the watchdog and requests a reboot.
Abnormal condition handling:
- 1. The application starts the upgrade and configures the watchdog (60 seconds).
- 2. A deadlock occurs during the upgrade.
- 3. After 60 seconds without being fed, the watchdog triggers a system reset.
- 4. The Bootloader detects the watchdog reset and executes the power failure recovery process.
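The two scenarios can be reproduced with a small simulation that advances time in one-second steps. The `Watchdog` class and `run_upgrade` function are hypothetical models, not a hardware API; the 60-second timeout and 5-second feed interval are the values used in this section.

```python
class Watchdog:
    """Software model of a watchdog timer driven by an explicit clock."""

    def __init__(self, timeout_s: int):
        self.timeout_s = timeout_s
        self.last_fed = 0
        self.enabled = False

    def start(self, now: int) -> None:
        self.enabled = True
        self.last_fed = now

    def feed(self, now: int) -> None:
        self.last_fed = now  # reset the countdown

    def stop(self) -> None:
        self.enabled = False

    def expired(self, now: int) -> bool:
        return self.enabled and now - self.last_fed >= self.timeout_s


def run_upgrade(duration_s, hang_at_s=None, timeout_s=60, feed_interval_s=5):
    """Simulate an upgrade; return (outcome, time_in_seconds)."""
    watchdog = Watchdog(timeout_s)
    watchdog.start(0)
    for now in range(1, duration_s + timeout_s + 1):
        hung = hang_at_s is not None and now >= hang_at_s
        if not hung and now % feed_interval_s == 0:
            watchdog.feed(now)  # the upgrade loop is still alive
        if watchdog.expired(now):
            return ("watchdog_reset", now)  # bootloader then runs recovery
        if not hung and now >= duration_s:
            watchdog.stop()
            return ("upgrade_complete", now)
    raise RuntimeError("simulation ended without an outcome")


normal_run = run_upgrade(120)                 # ("upgrade_complete", 120)
hung_run = run_upgrade(120, hang_at_s=32)     # ("watchdog_reset", 90)
```

In the hung run the last feed happens at 30 seconds, so the reset fires at 90 seconds, 60 seconds after the final feed rather than 60 seconds after the deadlock itself.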
Conclusion
A reliable OTA update system combines four techniques:
- 1. Differential Updates: Transmitting only the differences between firmware versions significantly reduces data transfer volume, saving bandwidth and time.
- 2. A/B Partition Mechanism: Maintaining two independent firmware partitions allows automatic rollback when an upgrade fails, preventing devices from becoming bricked.
- 3. Digital Signature Verification: Cryptographic methods ensure firmware integrity and authenticity, preventing malicious firmware from being installed.
- 4. Power Failure Protection Mechanism: Atomic writes, state flag protection, and boot-time verification allow the device to recover after a power loss at any point in the update.
In practical applications, these technologies need to be appropriately adjusted and optimized based on specific hardware platforms, resource constraints, and application scenarios.