1. Introduction
Embedded software is a distinct branch of software design, and its many characteristics shape the choices of system architects. At the same time, some of its issues are general enough to extend to other fields.
When it comes to embedded software design, the traditional impression is one of microcontrollers, assembly language, and a heavy dependence on hardware. Traditional embedded software developers often focus only on implementing the functionality itself, neglecting factors such as code reuse, separation of data and interface, and testability. As a result, the quality of embedded software depends heavily on the individual developer, with success or failure resting on one person. With the rapid development of embedded hardware and software, today's embedded systems have grown greatly in functionality, scale, and complexity. For example, the maximum clock speed of Marvell's PXA3xx series has reached 800 MHz, with built-in USB, Wi-Fi, 2D graphics acceleration, and 32-bit DDR memory. In hardware terms, today's embedded systems have reached or even surpassed the PC platforms of a few years ago. On the software side, mature operating systems such as Symbian, Linux, and WinCE have become established. On top of these operating systems, applications such as word processing, graphics, video, audio, games, and web browsing have emerged, comparable in functionality and complexity to PC software. Some commercial device companies that originally relied on dedicated hardware and systems have also begun to change their thinking, replacing proprietary hardware with software-based solutions built on capable, inexpensive hardware and complete operating systems, achieving lower cost together with greater flexibility and maintainability.
2. Factors Influencing Architecture and Its Impact
Architecture is not an isolated technical product; it is influenced by many factors. At the same time, an architecture also affects many aspects of software development.
Here is a specific example.
The engine of a motorcycle must pass a series of tests before leaving the factory. On the production line, the engine is sent to each workstation, where workers perform tests on aspects such as speed, noise, and vibration. The requirement is to implement an embedded device with the following basic functions:
- Installed at a workstation; workers switch it on and log in before starting work.
- Automatically collects test data through sensors and displays it on the screen.
- Records all test results and provides statistics, such as defect rates.
If you are the architect of this device, what issues should you focus on when designing the architecture?
2.1. Common Misunderstandings
2.1.1. Small Systems Do Not Need Architecture
Many embedded systems are relatively small and are generally designed for specific purposes. Influenced by the engineers' understanding, the size of the customer base, and project schedules, architecture design is often neglected and coding starts directly toward implementing the functionality. This superficially meets the needs of schedule, cost, and functionality, but in the long run the cost of extension and maintenance far exceeds the initial savings. If the system's original developer stays with the organization and remains responsible for the project, everything may go smoothly; once they leave, successors may introduce more errors through insufficient understanding of the system's details. It is important to note that the cost of change in embedded systems is much higher than in general software systems. A good software architecture describes the system at both the macro and micro levels and isolates its parts from one another, making later feature additions and maintenance relatively simple.
Take the example of a subway ticket machine, mentioned in previous courses. A simple subway ticket machine only needs to implement a handful of basic functions: accept payment, issue a ticket, and give change.
A while loop is sufficient to implement this system, and coding and debugging can begin directly. But from an architect’s perspective, are there parts worth abstracting and isolating here?
- Billing system. Billing must be abstracted, for example to move from flat-fare billing to distance-based billing.
- Sensor system. Sensors include magnetic card readers, coin acceptors, and so on; these devices may be replaced.
- Error handling and recovery. Given the high reliability and short failure-recovery time required, this part needs to be designed separately.
Future potential changes in requirements:
- Operating interface. Is there a need to abstract a dedicated model so that new views can be implemented later?
- Data statistics. Is there a need to introduce a relational database?
If coding is done directly from a simple control flow like this, how much of the code can be reused when such changes occur?
However, do not fall into the trap of over-design. Architecture should meet current needs while considering reuse and changes appropriately.
2.1.2. Agile Development Does Not Require Architecture
The emergence of extreme programming and agile development has led some to mistakenly believe that software development no longer requires architecture. This is a significant misunderstanding. Agile development was proposed as a solution after the obvious shortcomings of the traditional waterfall development process emerged, so it inherently has a higher starting point and stricter requirements for development. It does not regress to the stone age. In fact, architecture is a part of agile development, but in form, agile development recommends using more efficient and simpler methods for design. For example, drawing UML diagrams on a whiteboard and then photographing them with a digital camera; using user stories instead of use cases, etc. Test-driven agile development further forces engineers to design the functionality and interfaces of components before writing actual code, rather than starting to write code directly. Some characteristics of agile development include:
- Targeting larger systems than traditional development processes do.
- Acknowledging change and iterating on the architecture.
- Being concise but not chaotic.
- Emphasizing testing and refactoring.
3. Embedded Software Design Characteristics
To discuss embedded software architecture, one must first understand the characteristics of embedded software design.
3.1. Closely Related to Hardware
Embedded software generally has considerable dependence on hardware. This is reflected in several aspects:
- Some functions can only be realized through hardware; software operates and drives that hardware.
- Differences or changes in hardware can have a significant impact on the software.
- Without hardware, or with incomplete hardware, the software cannot run, or cannot run completely.
These characteristics lead to several consequences:
- Software engineers' understanding of and proficiency with the hardware largely determine non-functional qualities such as performance and stability; this work is relatively complex and requires experienced engineers to ensure quality.
- The software's heavy dependence on hardware undermines its stability, maintainability, and reusability.
- Software cannot be tested and verified independently of the hardware; it often has to be verified together with the hardware, which delays progress and widens the scope of error localization.
To address these issues, several solutions can be considered:
- Implement hardware functions in software. Choose a more powerful processor and realize some hardware functions in software. This reduces dependence on hardware, helps in responding to change, and avoids reliance on specific models and manufacturers; it has become a trend in some industries. The PC platform went through the same process, for example with early Chinese character cards.
- Isolate hardware dependence in a hardware abstraction layer, making the other parts of the software as hardware-independent as possible and able to run without the hardware. This confines the risk of hardware changes or replacement to a limited scope and improves the testability of the software (see the sketch below).
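As an illustration of the second point, here is a minimal sketch of what such a hardware abstraction layer interface might look like; the names (ISensor, CFakeSensor) are hypothetical, not from the original text:

// Upper layers depend only on this interface, never on a concrete chip,
// so they can be built and unit-tested without the real hardware.
class ISensor
{
public:
    virtual ~ISensor() {}
    virtual bool Init() = 0;
    virtual bool Read(int& value) = 0; // returns false on hardware error
};

// Pure-software stand-in, used on the PC or while hardware is incomplete.
class CFakeSensor : public ISensor
{
public:
    virtual bool Init() { return true; }
    virtual bool Read(int& value) { value = 42; return true; } // canned data
};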
3.2. High Stability Requirements
Most embedded software must run stably for long periods. For example, mobile phones often run for months, communication equipment requires 24/7 operation, and even test equipment for communication networks must run normally for at least 8 hours at a time. To achieve such stability, some commonly used design techniques include:
- Distribute different tasks across independent processes. Good modular design is key.
- Watchdog timers, heartbeats, and restarting failed processes (a minimal supervisor sketch follows this list).
- A complete and unified logging system for quick problem localization. Embedded devices generally lack powerful debuggers, which makes the logging system particularly important.
- Isolate errors within the smallest possible scope to avoid error propagation and chain reactions. Core code must be thoroughly verified, while non-core code can run under monitoring or in a sandbox so that it cannot bring down the whole system.
For example, GPRS access on Symbian is affected by hardware differences and operating system versions, and is not very stable; in one OS version the system would crash when closing the GPRS connection, a known issue. By isolating GPRS connections, HTTP protocol handling, file downloads, and related operations in a separate process, the user is unaffected even if that process crashes after an operation.
- Double backup techniques, although these are rarely adopted.
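A minimal sketch of the restart-on-failure idea, assuming a POSIX-style system; the function names and structure are illustrative, not from the original text:

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

void RunWorker() { /* the isolated, crash-prone task runs here */ }

int main()
{
    for (;;)
    {
        pid_t pid = fork();
        if (pid == 0)             // child: do the work, then exit
        {
            RunWorker();
            _exit(0);
        }
        int status = 0;
        waitpid(pid, &status, 0); // block until the worker exits or crashes
        // the worker died: log 'status' here, then loop to restart it
        sleep(1);                 // avoid a tight restart loop
    }
}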
3.3. Insufficient Memory
Although today's embedded systems have far more memory than the era when it was measured in KB, memory shortages still plague system architects as software grows. Architects can refer to several principles when making design decisions:
3.3.1. Virtual Memory Technology
Some embedded devices need to handle enormous amounts of data, and it is impossible to load all of this data into memory at once. Some embedded operating systems do not provide virtual memory; for example, WinCE 4.2 allows each program to use at most 32MB of memory. For such applications, architects should design their own virtual memory mechanism. Its core is to move data that is unlikely to be used soon out of memory. This involves several technical points (a sketch follows the list):
- Reference counting: data currently in use cannot be moved out.
- Prediction: estimate the likelihood of data being used in the next phase, and evict or preload data accordingly.
- Placeholder data/objects.
- Caching: cache frequently used data within complex data structures for direct access.
- Fast persistence and loading.
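A minimal sketch of such an application-level paging scheme, combining reference counting, placeholders, and persistence; all names (Node, NodeCache) are hypothetical:

#include <map>
#include <string>

struct Node
{
    Node() : refCount(0), loaded(false) {}
    int refCount;        // reference counting: in-use data cannot be evicted
    bool loaded;         // false => only a cheap placeholder is in memory
    std::string payload; // the real data, possibly large
};

class NodeCache
{
public:
    Node* Acquire(int id)
    {
        Node& n = nodes_[id];
        if (!n.loaded)
            LoadFromDisk(id, n);          // fast loading on demand
        ++n.refCount;
        return &n;
    }
    void Release(int id) { --nodes_[id].refCount; }

    // Called when memory runs low: evict everything not currently in use.
    void EvictIdle()
    {
        for (std::map<int, Node>::iterator it = nodes_.begin();
             it != nodes_.end(); ++it)
        {
            Node& n = it->second;
            if (n.loaded && n.refCount == 0)
            {
                SaveToDisk(it->first, n); // fast persistence
                n.payload.clear();        // only the placeholder remains
                n.loaded = false;
            }
        }
    }

private:
    void LoadFromDisk(int id, Node& n) { n.payload = "..."; n.loaded = true; }
    void SaveToDisk(int /*id*/, const Node& /*n*/) { /* persist payload */ }
    std::map<int, Node> nodes_;
};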
(Figure omitted: interface of a nationwide telecom machine-room management system.)
Each node has a large amount of data that needs to be loaded, and the above techniques can be used to minimize memory usage.
3.3.2. Two-Phase Construction
In systems with limited memory, handling object construction failure is an issue that must be addressed, and its most common cause is insufficient memory. (The same requirement exists on PC platforms, but it is often overlooked in practice because memory is cheap.) Two-phase construction is a common and effective design. Consider the following example:
class CMySimpleClass
{
public:
    CMySimpleClass();
    ~CMySimpleClass();
    ...
private:
    int SomeData;
};

class CMyCompoundClass
{
public:
    CMyCompoundClass();
    ~CMyCompoundClass();
    ...
private:
    CMySimpleClass* iSimpleClass;
};

// CMyCompoundClass's constructor initializes the iSimpleClass member:
CMyCompoundClass::CMyCompoundClass()
{
    iSimpleClass = new CMySimpleClass;
}
What happens when creating CMyCompoundClass?
CMyCompoundClass* myCompoundClass = new CMyCompoundClass;
1. Memory is allocated for the CMyCompoundClass object.
2. The constructor of CMyCompoundClass is called.
3. Inside the constructor, an instance of CMySimpleClass is created.
4. The constructor finishes and returns.
Everything seems straightforward, but what if an out-of-memory error occurs during the third step when creating the CMySimpleClass object? The constructor cannot return any error information to indicate that the construction was unsuccessful. The caller receives a pointer to CMyCompoundClass, but this object is not fully constructed.
What if an exception is thrown in the constructor? This is a well-known nightmare because the destructor will not be called, and if resources were allocated before creating the CMySimpleClass object, they will leak. It could take an hour to discuss throwing exceptions in constructors, but one suggestion is to avoid throwing exceptions in constructors as much as possible.
Therefore, using the two-phase construction method is a better choice. In simple terms, avoid any actions that may produce errors, such as memory allocation, in the constructor, and move those actions to another function called after construction is complete. For example:
AddressBook* book = new AddressBook();
if (!book->Construct())
{
    delete book;
    book = NULL;
}
This ensures that when Construct fails, any already allocated resources are released.
The two-phase construction method is used extensively in the well-known Symbian mobile operating system.
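A simplified sketch of the idiom, assuming a non-throwing allocator as is common on embedded toolchains (in Symbian the factory is conventionally named NewL() and the second phase ConstructL(); the shape below is only an approximation):

#include <new> // std::nothrow

class CMyCompoundClass
{
public:
    static CMyCompoundClass* New()
    {
        CMyCompoundClass* self = new (std::nothrow) CMyCompoundClass(); // phase 1
        if (self && !self->Construct()) // phase 2: everything that can fail
        {
            delete self;                // safe: members are initialized
            self = 0;
        }
        return self;                    // fully built object, or null
    }
    ~CMyCompoundClass() { delete iSimpleClass; }

private:
    CMyCompoundClass() : iSimpleClass(0) {} // phase 1: no allocation here
    bool Construct()
    {
        iSimpleClass = new (std::nothrow) CMySimpleClass();
        return iSimpleClass != 0;
    }
    CMySimpleClass* iSimpleClass;
};

Because the factory runs both phases, a half-constructed object can never leak out to the caller.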
3.3.3. Memory Allocators
Different systems have different characteristics of memory allocation. Some require allocating many small memory blocks, while others need to frequently grow already allocated memory. A good memory allocator can significantly impact the performance of embedded software. The system design should ensure that the entire system uses a unified memory allocator that can be replaced at any time.
3.3.4. Memory Leaks
Memory leaks are very serious for embedded systems with limited memory. By routing the whole system through its own memory allocator, it becomes easy to track allocations and frees, and thus to detect memory leaks; a minimal sketch follows.
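A minimal sketch of a unified, replaceable allocator interface with allocation counting, serving both of the points above; the names (IAllocator, CountingMalloc) are hypothetical:

#include <cstddef>
#include <cstdlib>

// One allocator interface for the entire system: the implementation can be
// swapped (pool, arena, tracking, ...) without touching client code.
class IAllocator
{
public:
    virtual ~IAllocator() {}
    virtual void* Alloc(size_t size) = 0;
    virtual void Free(void* p) = 0;
};

// Simple implementation that counts live blocks for leak detection.
class CountingMalloc : public IAllocator
{
public:
    CountingMalloc() : live_(0) {}
    virtual void* Alloc(size_t size)
    {
        void* p = std::malloc(size);
        if (p) ++live_;
        return p;
    }
    virtual void Free(void* p)
    {
        if (p) { --live_; std::free(p); }
    }
    long LiveBlocks() const { return live_; } // nonzero at shutdown => leak
private:
    long live_;
};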
3.4. Limited Processor Capability, High Performance Requirements
This section does not discuss real-time systems, as that is a large specialized topic. For general embedded systems, due to limited processor capabilities, performance issues must be given special attention. Some excellent architectural designs have ultimately led to project failures because they could not meet performance requirements.
3.4.1. Resisting the Temptation of New Technologies
Architects must understand that new technologies often bring complexity and lower performance. Even where this is not absolute, the limited hardware of embedded systems leaves little flexibility: once a new technology turns out to differ from initial expectations, adapting through modification becomes very difficult. For instance, GWT is a Google-developed Ajax toolkit that lets programmers build web Ajax applications as if they were desktop applications, which makes it tempting to implement remote and local user interfaces from a single codebase on an embedded system. However, running browser/server-style applications on embedded devices poses significant performance challenges, browser compatibility issues are severe, and the current version of GWT is not mature enough.
Experience shows that remote-control solutions for embedded systems still need to adopt ActiveX, VNC, or similar approaches.
3.4.2. Avoiding Too Many Layers
Layered structures help clarify system responsibilities and achieve decoupling, but each additional layer incurs a performance cost, especially when large amounts of data must be passed between layers. When an embedded system adopts a layered structure, it is essential to control the number of layers and to avoid moving large volumes of data across layers, above all between layers in different processes. If data must be transmitted, avoid extensive format conversions, such as XML to binary or C++ structures to Python structures.
Embedded systems have limited capabilities, so it is crucial to focus those limited capabilities on the core functionality of the system.
3.5. Storage Devices Are Prone to Damage and Slow
Due to size and cost constraints, most embedded devices use storage such as Compact Flash, SD, mini SD, and MMC cards. These devices are free from mechanical damage, but their lifespan is relatively short: CF cards can typically endure about one million writes, and SD cards even fewer, about 100,000. For an application like a digital camera this may suffice, but for applications that write to disk frequently, such as a historical database, wear will quickly become apparent. For instance, suppose an application writes a 16MB file to a CF card every day, on a FAT16 file system with a cluster size of 2K. Writing the 16MB file allocates 8192 clusters, so the file allocation table is written 8192 times. A card rated for a million writes will therefore last only 1000000/8192 = 122 days, while the vast majority of other blocks on the card will have barely been used.
Beyond hot spots like the file allocation table, which are rewritten far more often than other blocks, some embedded devices also face sudden power loss, which can leave incomplete data on the storage device.
3.5.1. Wear Leveling
The basic idea of wear leveling is to spread writes evenly across the device's storage blocks. A table must be maintained that tracks the state of each block: its offset on the device, whether it is currently free, and how many times it has been erased. When a new write request arrives, a block is selected according to the following principles:
- Prefer contiguous blocks.
- Prefer the block with the lowest erase count.
Even when updating existing data, the above principles will be used to allocate new blocks. Similarly, the location of this table must not be fixed; otherwise, the block occupied by this table will wear out first. When updating this table, the same allocation principles will apply.
If there is a large amount of static data on the storage device, the above algorithm is only effective for the remaining space; an additional algorithm is then needed to relocate the static data. Such static leveling reduces write performance and increases complexity, however, so generally only dynamic leveling is used (a simplified block-selection sketch follows).
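A simplified sketch of the dynamic block-selection rule, ignoring the contiguity preference for brevity; the structures are hypothetical:

#include <vector>

struct BlockInfo
{
    unsigned offset;     // where the block lives on the device
    bool free;           // current availability
    unsigned eraseCount; // how many times it has been erased
};

// Pick the free block with the lowest erase count. Note that the table
// itself must also be stored through this same mechanism, or the block
// holding it would wear out first.
int PickBlock(const std::vector<BlockInfo>& table)
{
    int best = -1;
    for (size_t i = 0; i < table.size(); ++i)
    {
        if (!table[i].free)
            continue;
        if (best < 0 || table[i].eraseCount < table[best].eraseCount)
            best = static_cast<int>(i);
    }
    return best; // -1 means no free block is available
}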
Currently, mature wear leveling file systems include JFFS2 and YAFFS. Another approach involves implementing wear leveling on traditional file systems like FAT16, as long as a sufficiently large file is pre-allocated, and the wear leveling algorithm is implemented within that file. However, this requires modifying the FAT16 code to disable updates to the last modified time.
Today, some CF and SD cards have already implemented wear leveling internally, so no software implementation is required.
3.5.2. Error Recovery
If a power loss occurs while writing data to the storage device, the data in the written area will be in an unknown state. In some applications, this can lead to incomplete files, while in others, it can cause system failures. Therefore, recovering from such errors is also a crucial consideration in embedded software design. Common strategies include two types:
- Log-based (journaling) file systems. Such a file system does not write data in place directly; it first records the change in a log, so after a power loss it can always recover to the previous consistent state. ext3 is an example.
- Double backup (see the sketch below).
The double backup approach is simpler: all data is written twice, and at any moment one of the two copies is the valid one. The file allocation table must also be double-backed. Suppose there is a data block A with backup block A1, and initially A1's contents match A; in the allocation table, entry F points to data block A, and F1 is F's backup. When modifying the file, the content of backup block A1 is modified first. If power is lost at this point, A1's content is wrong, but F still points to the intact A, so the data is undamaged. Once A1 has been written successfully, F1 is updated. If power is lost at this point, F is still intact, so there is again no problem.
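A sketch of this write order in code. The text above describes the backup half of the update; bringing the primary pair back in sync afterwards, shown below, is an assumed completion of the scheme, and all names are hypothetical:

#include <string>

struct Device
{
    void Write(int block, const std::string& data) { /* write and verify */ }
    void Flush() { /* force the write out of any cache before continuing */ }
};

enum { F = 0, F1 = 1, A = 2, A1 = 3 }; // table entries and data blocks

void UpdateRecord(Device& dev, const std::string& newData)
{
    dev.Write(A1, newData); // 1. backup data block first
    dev.Flush();            //    power loss here: F still points to intact A
    dev.Write(F1, "->A1");  // 2. then the backup table entry
    dev.Flush();            //    power loss here: primary pair F/A still intact
    dev.Write(A, newData);  // 3. finally bring the primary pair up to date
    dev.Flush();
    dev.Write(F, "->A");
    dev.Flush();
}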
Today, some Flash devices have built-in error detection and correction technologies that can ensure data integrity during power loss. Some also include automatic dynamic/static wear leveling algorithms and bad block processing, requiring no additional treatment by higher-level software and can be used like a hard disk. Therefore, as hardware becomes more advanced, software will become more reliable, and continuous technological progress will allow us to focus more on software functionality itself, which is the trend of development.
3.6. High Cost of Failures
Embedded products are sold to users as a combination of hardware and software, which brings a problem that purely software products do not face: when a product fails, if it needs to be returned for repair, the cost is very high. Common failures in embedded devices include:
a) Data failures. Data cannot be read or is inconsistent due to certain reasons, such as database errors caused by power loss.
b) Software failures. Defects in the software itself that need to be corrected through patches or new versions of the software.
c) System failures. For example, if a user downloads the wrong system kernel, the system fails to start.
d) Hardware failures. This type of failure requires returning to the factory and is not within our discussion scope.
To address the first three types of failures, efforts should be made to ensure that customers or on-site technicians can resolve the issues themselves. From the architectural perspective, the following principles can be considered:
a) Use data management designs with error recovery capabilities. When data errors occur, acceptable processing for users is as follows:
i. Errors are corrected, and all data is valid.
ii. Data that was present at the time of the error (which may be incomplete) is lost, but previous data remains valid.
iii. All data is lost.
iv. The data engine crashes and cannot continue working.
Generally speaking, meeting the second condition is sufficient. (Logging, transactions, backups, error identification)
b) Separate applications from the system. Applications should be placed on pluggable Flash cards that can be upgraded through file copying using a card reader. Avoid using proprietary application software for upgrading applications unless absolutely necessary.
c) Implement a “safe mode.” This means that when the main system is damaged, the device can still boot and upgrade the system again. Commonly used U-Boot can ensure this; when the system is damaged, it can enter U-Boot and upgrade through TFTP.
4. Software Frameworks
Frameworks are commonly used in desktop and network systems; ACE, MFC, and Ruby on Rails are famous examples. In embedded systems, however, frameworks are rarely used. The underlying reason is the perception that embedded systems are simple and one-off, and that development should focus narrowly on implementing functionality and optimizing performance. As mentioned in the introduction, the current trend in embedded systems is toward greater complexity, larger scale, and product-series development, so designing software frameworks for embedded systems is both necessary and valuable.
4.1. Problems Faced by Embedded Software Architecture
Previously, we discussed some issues faced by embedded system software architecture, one of which is the dependence on hardware and the complexity of hardware-related software. There are also stringent requirements for embedded software in terms of stability and memory usage. If everyone in the team is an expert in these areas, it may be possible to develop high-quality software. However, the reality is that a team may only have one or two experienced personnel, while the majority are junior engineers. If everyone engages with hardware and is responsible for stability, performance, and other metrics, it is difficult to guarantee the final product’s quality. If the component team consists entirely of talents proficient in hardware and other low-level technologies, it becomes challenging to design software that excels in usability and scalability. Specialization is essential, and the architect’s choice determines the team’s composition.
At the same time, although embedded software development is complex, there is also a lot of potential for reuse. How to reuse and how to respond to future changes?
Thus, how to shield complexity from most people, how to separate concerns, and how to ensure the key non-functional indicators of the system is what embedded software architects should solve. One possible solution is a software framework.
4.2. What is a Framework?
A framework is a semi-finished software product designed to reuse and respond to future demand changes within a given problem domain. Frameworks emphasize the abstraction of specific domains, incorporating a wealth of domain knowledge to shorten software development cycles and improve software quality. Secondary developers using the framework implement specific functionality by rewriting subclasses or assembling objects.
4.2.1. Levels of Software Reuse
Reuse is a frequent topic of discussion, and the adage “Don’t reinvent the wheel” is also well-known. However, there are many levels of understanding regarding reuse.
The most basic form of reuse is copy-pasting. A certain functionality has been implemented before, and when needed again, it is copied and modified for use. Experienced programmers usually have their own code libraries, allowing them to implement faster than new programmers. The downside of copy-pasting is that the code has not been abstracted, often making it not fully applicable, requiring modifications. After multiple reuses, the code becomes chaotic and difficult to understand. Many companies face this issue, where a product’s code is copied from another product and modified slightly, sometimes even without changing class names or variable names. According to the standard that “only code designed for reuse can truly be reused,” this does not count as reuse, or is considered low-level reuse.
A higher level of reuse involves libraries. This requires abstracting frequently used functionality, extracting the constant parts, and providing them to secondary developers in the form of a library. Designing a library places high demands on the designer, as the library must accommodate the many potential uses of secondary developers. This is the most widely used form of reuse; the standard C library and the STL are examples. One of the significant advantages of the popular Python language is its extensive library support, while C++ has long lacked robust, unified library support, which has become a shortcoming. Summarizing commonly used functionality into internal libraries is valuable in in-house development; the downside is that upgrading a library may affect many products, so caution is needed.
Frameworks represent another form of reuse. Like libraries, frameworks abstract and implement the unchanging parts of a system, leaving the changing parts to secondary developers. The most significant difference between a typical framework and a library is that a library is static and called by secondary developers, while a framework is active; it controls the flow, and the code of secondary developers must comply with the framework’s design, determining when to call them.
For example, a network application always involves establishing connections, sending and receiving data, and closing connections. A library-based approach would look like this:
conn = connect(host, port);
if (conn.isvalid())
{
    data = conn.recv();
    printf(data);
    conn.close();
}
A framework approach would look like this:
// mycomm derives from the framework-supplied base class 'connect'
class mycomm : public connect
{
public:
    host();
    port();
    onconnected();
    ondataarrived(unsigned char* data, int len);
    onclose();
};
The framework will create mycomm objects at the “appropriate” time, query host and port, and establish connections. After the connection is established, it calls the onconnected() interface, providing secondary developers with opportunities to process. When data arrives, it calls the ondataarrived interface to let secondary developers handle it. This follows the Hollywood principle: “Don’t call us, we’ll call you.”
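To make the inversion of control concrete, here is a minimal sketch of what the framework side might look like; all names (IConnHandler, FrameworkConnect, FrameworkRecv) are hypothetical:

// Declarations assumed to be provided elsewhere by the framework:
int FrameworkConnect(const char* host, int port);
int FrameworkRecv(int fd, unsigned char* buf, int maxlen); // <= 0 on close/error

class IConnHandler // mirrors the 'connect' class sketched above
{
public:
    virtual ~IConnHandler() {}
    virtual const char* host() = 0;
    virtual int port() = 0;
    virtual void onconnected() = 0;
    virtual void ondataarrived(unsigned char* data, int len) = 0;
    virtual void onclose() = 0;
};

void FrameworkRun(IConnHandler& h)
{
    int fd = FrameworkConnect(h.host(), h.port()); // the framework does the I/O
    h.onconnected();                               // "we'll call you"
    unsigned char buf[1024];
    int len;
    while ((len = FrameworkRecv(fd, buf, sizeof(buf))) > 0)
        h.ondataarrived(buf, len);                 // user code sees only the data
    h.onclose();
}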
Of course, a complete framework usually also provides various libraries for secondary developers to use. For example, MFC provides many libraries, such as CString, but essentially, it is a framework. For instance, implementing the OnInitDialog interface for a dialog box is dictated by the framework.
4.2.2. Abstraction Targeting Highly Specific Domains
Compared to libraries, frameworks are more targeted abstractions of specific domains. Libraries, such as the C library, are aimed at all applications. In contrast, frameworks are relatively narrower. For example, the framework provided by MFC is only suitable for Windows platform desktop application development, ACE is a framework for network application development, and Ruby On Rails is designed for rapid web site development.
The more specific the domain, the stronger the abstraction can be, and the simpler the secondary development can be, as there are more commonalities. For example, the various characteristics of embedded system software development discussed above represent the commonalities of specific domains that can be abstracted. When it comes to actual embedded applications, there will be even more commonalities to abstract.
The purpose of framework design is to summarize the commonalities of specific domains, implement them in the form of a framework, and specify the implementation methods for secondary developers, thereby simplifying development. Accordingly, a framework developed for one domain cannot serve another.
For enterprises, frameworks are an excellent means of accumulating knowledge and reducing costs.
4.2.3. Decoupling and Responding to Changes
A significant goal of framework design is to respond to changes. The essence of responding to changes is decoupling. From the architect’s perspective, decoupling can be divided into three types:
- Logical decoupling. Abstracting and separating logically distinct modules, such as decoupling data from the user interface. This is one of the most common forms of decoupling.
- Knowledge decoupling. Designing interfaces so that people with different knowledge can work together. A typical example is the domain knowledge of test engineers versus the programming knowledge of development engineers. Traditional test scripts mix the two, forcing test engineers to know how to program. With suitable interfaces, test engineers can express their test cases in the simplest possible form, while developers write the program code that executes those cases.
- Decoupling change from stability. This is a defining feature of frameworks: analyze the domain knowledge, fix the common, unchanging parts in the framework, and leave the potentially changing parts to secondary developers to implement.
4.2.4. Frameworks Can Implement and Enforce Non-Functional Requirements
Non-functional requirements refer to characteristics such as performance, reliability, testability, and portability. These traits can be implemented through frameworks. Below are examples of each:
Performance. The worst approach to performance optimization is indiscriminate, across-the-board optimization; a system's performance usually hinges on a few specific points. For example, in embedded systems access to storage devices is relatively slow. If developers pay no attention to this and read and write the storage device casually, performance suffers. If instead the framework owns all read and write access to the storage device and secondary developers act only as data providers and consumers, the framework can adjust the read/write frequency and optimize performance in one place. Because frameworks are developed separately and used widely, their critical performance points can be optimized thoroughly.
Reliability. Taking the network communication program as an example, since the framework is responsible for creating and managing connections, as well as handling various possible network errors, specific implementers do not need to understand or implement error handling code in this area, ensuring the system’s reliability in network communications. The primary advantage of designing reliability in the form of a framework is that the code developed by secondary developers runs under the framework’s control. On one hand, the framework can implement parts prone to errors, and on the other, it can also capture and handle errors generated by secondary developer code. Libraries cannot replace users in handling errors.
Testability. Testability is an essential aspect that software architecture needs to consider. The next chapters will discuss how good design ensures software testability. On one hand, frameworks dictate secondary developers’ interfaces, thus forcing them to develop code that is easy to unit test. On the other hand, frameworks can also provide designs that make it easy to implement automated testing and regression testing at the system testing level, such as a unified TL1 interface.
Portability. If software portability is a design goal, framework designers can ensure this during the design phase. One way is to shield system differences through cross-platform libraries; another possibility is to base secondary development on scripting. Configuration software is an example of this, where a project configured on a PC can also run on embedded devices.
4.3. An Example of Framework Design
4.3.1. Basic Architecture
(Architecture diagram of the product series omitted.)
4.3.2. Functional Features
The above is an architectural diagram for a product series, characterized by modular hardware that can be plugged and unplugged at any time. Different hardware is used for different communication testing scenarios, such as optical communication testing, xDSL testing, Cable Modem testing, etc. Different firmware and software need to be developed for different hardware. The firmware layer mainly receives commands from the software through the USB interface, reads and writes the corresponding hardware interfaces, performs some calculations, and returns the results to the software. The software runs on the WinCE platform, providing a touch-based graphical interface while also offering XML (SOAP) and TL1 interfaces. To achieve automated testing, it also provides a Lua-based scripting language interface. The entire product series has dozens of different hardware modules, requiring the development of dozens of software sets. Although these software serve different hardware, they share a high degree of similarity. Therefore, choosing to develop a framework first and then develop specific module software based on that framework became the optimal choice.
4.3.3. Analysis
The structure of the software part is as follows:
The system is divided into three main parts: software, firmware, and hardware. Software and firmware run on two independent boards, each with its own processor and operating system. Hardware is plugged into the board where the firmware is located and is replaceable.
Both software and firmware are essentially software, and we will analyze them separately.
Software
The primary task of the software is to provide various user interfaces, including local graphical interfaces, SOAP access interfaces, and TL1 access interfaces.
The entire software part is divided into five major components:
- Communication layer
- Protocol layer
- Graphical interface
- SOAP server
- TL1 server
The communication layer shields its users from the specific communication medium and protocol: whether the link is USB or a socket has no impact on the upper layers. The communication layer is responsible for providing reliable communication services and appropriate error handling, and the communication layer in use can be changed through configuration files.
The protocol layer’s purpose is to encode and decode data. The output of encoding is a stream that can be sent over the communication layer. Based on the characteristics of embedded software, we choose binary as the format for the stream. The output of decoding is diverse; it can be a C Struct for the interface, XML data, or Lua data structure (table). If needed, JSON, TL1, Python data, TCL data, and so on can also be produced. This layer is automatically generated within the framework, which we will discuss later.
The in-memory database, SOAP Server, and TL1 Server are all users of the protocol layer. The graphical interface interacts with the in-memory database and the underlying communication.
The graphical interface is one of the key focuses of framework design, as this area has the most work and the most repetitive tasks.
Let’s analyze what the primary tasks are in graphical interface development.
- Collect user input data and commands.
- Send data and commands to the underlying layer.
- Receive feedback from the underlying layer.
- Display data on the interface.
At the same time, there are some libraries to further simplify development:
This is a simplified example, but it well illustrates the characteristics of a framework:
- Client code must implement the interfaces the framework specifies.
- The framework calls the client-implemented interfaces at the appropriate times.
- Each interface is designed to perform only a single, specific function.
- Wiring the various steps together is the framework's job; secondary developers do not need to know how.
- There are usually accompanying libraries.
Firmware
The primary task of the firmware is to accept commands from the software, drive hardware operations, acquire the hardware’s status, perform some calculations, and return them to the software. Early firmware was a thin layer since most work was done by hardware, with firmware only serving as a communication intermediary. However, as time has progressed, firmware has begun to take on more and more tasks originally performed by hardware.
The entire firmware part is divided into five major components:
- Hardware abstraction layer, providing access interfaces to the hardware.
- Independent task groups.
- Task/message dispatcher.
- Protocol layer.
- Communication layer.
For different devices, the workload is concentrated in the hardware abstraction layer and task groups. The hardware abstraction layer is provided as a library, implemented by engineers who are most familiar with the hardware. The task group consists of a series of tasks representing different business applications, such as measuring the error rate. This part is implemented by relatively inexperienced engineers, whose primary task is to implement specified interfaces according to standardized documentation.
Tasks define the following interfaces, which are to be implemented by specific developers:
OnInit();
OnRegisterMessage();
OnMessageArrive();
Run();
OnResultReport();
The code flow of the framework is as follows (pseudo-code):
CTask* task = new CBertTask();
task->OnInit();
task->OnRegisterMessage();
while (TRUE)
{
    task->OnMessageArrive();
    task->Run();
    task->OnResultReport();
}
delete task;
task = NULL;
This way, the implementer of specific tasks only needs to focus on implementing these interfaces. Other tasks, such as hardware initialization, message sending and receiving, encoding and decoding, and result reporting, are handled by the framework. This avoids requiring every engineer to deal with all aspects from top to bottom. Furthermore, such task code has high reusability; for example, implementing the algorithm for PING on both Ethernet and Cable Modem would be the same.
4.3.4. Actual Effects
In actual projects, the framework significantly reduces development difficulty. This is particularly evident in the software part, where even interns can complete high-quality interface development, shortening the development cycle by over 50%. Product quality has improved significantly. The contribution of the framework to the firmware part lies in reducing the need for engineers proficient in low-level hardware; general engineers familiar with measurement algorithms can suffice. At the same time, the existence of the framework ensures elements such as performance, stability, and testability.
4.4. Common Patterns in Framework Design
4.4.1. Template Method Pattern
The template method pattern is the most commonly used design pattern in frameworks. The fundamental idea is to fix the algorithm in the framework while allowing secondary developers to implement specific operations within that algorithm. For example, the logic for initializing a device can be coded in the framework as follows:
TBool CBaseDevice::Init()
{
    if (DownloadFPGA() != KErrNone)
    {
        LOG(LOG_ERROR, _L("Download FPGA fail"));
        return EFalse;
    }
    if (InitKeyPad() != KErrNone)
    {
        LOG(LOG_ERROR, _L("Initialize keypad fail"));
        return EFalse;
    }
    return ETrue;
}
DownloadFPGA and InitKeyPad are virtual functions defined by CBaseDevice, and secondary developers create subclasses that inherit from CBaseDevice to implement these two interfaces. The framework defines the order of calls and error handling, and secondary developers do not need to concern themselves with these aspects.
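A sketch of what such a concrete subclass might look like; the class name and internals are illustrative, not from the original design:

class CMyDevice : public CBaseDevice
{
protected:
    virtual TInt DownloadFPGA()
    {
        // device-specific: push the FPGA bitstream over the local bus
        return KErrNone;
    }
    virtual TInt InitKeyPad()
    {
        // device-specific: configure GPIO lines and the key-scan timer
        return KErrNone;
    }
};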
4.4.2. Creational Patterns
Since frameworks typically involve creating various subclasses, creational patterns are frequently used. For instance, in a drawing software framework, a base class defines the interface for graphical objects, and subclasses such as ellipses, rectangles, and lines can be derived from it. When a user draws a shape, the framework must instantiate that subclass. Factory methods, prototype methods, etc., can be used here.
class CDrawObj
{
public:
    virtual int DrawObjTypeID() = 0;
    virtual Icon GetToolBarIcon() = 0;
    virtual void Draw(Rect rect) = 0;
    virtual CDrawObj* Clone() = 0;
};
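One way the framework might instantiate subclasses is a prototype registry built on Clone(); the sketch below is an assumption (the factory class and usage names are hypothetical), not the only option:

#include <map>

// The framework keeps one prototype per shape type and clones it whenever
// the user draws that shape, so it never needs to know concrete classes.
class CDrawObjFactory
{
public:
    void Register(CDrawObj* prototype)
    {
        prototypes_[prototype->DrawObjTypeID()] = prototype;
    }
    CDrawObj* Create(int typeId)
    {
        std::map<int, CDrawObj*>::iterator it = prototypes_.find(typeId);
        return it == prototypes_.end() ? 0 : it->second->Clone();
    }
private:
    std::map<int, CDrawObj*> prototypes_;
};
// usage (hypothetical subclass): factory.Register(new CEllipse());
//                                CDrawObj* obj = factory.Create(ellipseTypeId);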
4.4.3. Message Subscription Pattern
The message subscription pattern is the most commonly used way to separate data from interfaces. Interface developers only need to register the data they need, and when that data changes, the framework will “push” the data to the interface. A common issue with the message subscription pattern is how to handle reentrancy and timeouts in synchronous modes. As a framework designer, this issue must be carefully considered. Reentrancy refers to secondary developers performing subscription/cancellation operations in the callback function, which can disrupt the message subscription mechanism. Timeout refers to the situation where a secondary developer’s message callback function takes too long to process, preventing other messages from being responded to. The simplest solution is to use an asynchronous mode, allowing the subscriber and data publisher to run in independent processes/threads. If this is not feasible, it must be established as an important convention of the framework that secondary developers must avoid such issues.
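A minimal sketch of the synchronous push mechanism described above (the names are assumptions); note that, per the reentrancy caveat, Subscribe must not be called from inside OnDataChanged() in a design like this:

#include <map>
#include <vector>

// Whoever needs data implements this and registers for the IDs it wants.
class ISubscriber
{
public:
    virtual ~ISubscriber() {}
    virtual void OnDataChanged(int dataId, const void* value) = 0;
};

class DataHub
{
public:
    void Subscribe(int dataId, ISubscriber* s) { subs_[dataId].push_back(s); }
    void Publish(int dataId, const void* value)
    {
        std::vector<ISubscriber*>& v = subs_[dataId];
        for (size_t i = 0; i < v.size(); ++i)
            v[i]->OnDataChanged(dataId, value); // synchronous "push"
    }
private:
    std::map<int, std::vector<ISubscriber*> > subs_;
};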
4.4.4. Decorator Pattern
The decorator pattern gives the framework the ability to add functionality later. The framework defines an abstract base class for decorators, while specific implementers implement them and dynamically add them to the framework.
For example, in a game, the graphics rendering engine is an independent module that can draw characters in various states, such as still or running. If the game designers decide to add an item called “invisibility cloak,” which requires players wearing it to appear as semi-transparent on the screen, how should the rendering engine be designed to adapt to this game upgrade?
When the invisibility cloak is equipped, a filter is added to the rendering engine. This is a significantly simplified example; actual game engines are much more complex. The decorator pattern is also commonly used for pre-processing and post-processing of data.
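A sketch of the cloak-as-filter idea under assumed names: the decorator wraps an existing renderer rather than modifying the engine.

// Common interface for anything that can draw.
class IRenderer
{
public:
    virtual ~IRenderer() {}
    virtual void Draw() = 0;
};

class CharacterRenderer : public IRenderer
{
public:
    virtual void Draw() { /* draw the character normally */ }
};

// The "invisibility cloak": a decorator that post-processes another renderer.
class TranslucencyFilter : public IRenderer
{
public:
    explicit TranslucencyFilter(IRenderer* inner) : inner_(inner) {}
    virtual void Draw()
    {
        // set blending state to semi-transparent, then delegate
        inner_->Draw();
        // restore the previous blending state
    }
private:
    IRenderer* inner_;
};
// usage: IRenderer* r = new TranslucencyFilter(new CharacterRenderer());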
4.5. Drawbacks of Frameworks
A good framework can greatly improve product development efficiency and quality, but it also has its drawbacks.
- Frameworks are generally quite complex, and designing and implementing a good framework takes considerable time. Frameworks are therefore only worthwhile when they will be applied repeatedly, in which case the initial investment yields substantial returns.
- Frameworks stipulate a series of interfaces and rules. This simplifies secondary development, but it also requires secondary developers to remember many regulations, and if these are violated the framework will not function properly. On the other hand, because frameworks shield a large number of domain details, the overall learning cost is significantly reduced.
- Upgrading a framework can severely impact existing products, necessitating complete regression testing. There are two mitigations. First, test the framework itself rigorously, possibly maintaining a comprehensive unit-test library and example projects that exercise all framework functionality. Second, use static linking so that existing products do not silently pick up upgrades. If existing products have good regression testing in place, so much the better.
- Performance loss. The framework's abstraction of the system increases complexity, and techniques such as polymorphism generally reduce performance. Overall, however, a framework can keep system performance at a relatively high level.
5. Automatic Code Generation
5.1. Let Machines Do What They Can
Being lazy is a virtue for programmers and architects alike. Software development is essentially the business of telling machines how to do things, so any task a machine can perform should not be left to a human: machines are not only tireless but also make no mistakes. Our job is to automate our customers' work; with a little more thought, we can automate part of our own. A programmer with extreme patience for repetitive work can be an asset, but that patience can also do harm.
Well-designed systems often exhibit many highly similar and regular codes. Systems that are not well-designed may produce many different implementations for the same type of functionality. The previous discussion on framework design has already demonstrated this. Sometimes we can go a step further, analyze the patterns in these similar codes, describe these functionalities with formatted data, and let machines generate the code.
5.2. Examples
5.2.1. Message Encoding and Decoding
The previous example of the framework showed that the message encoding and decoding part has been isolated and is not coupled with other parts. Given its characteristics, it is very suitable for further “formalization” so that machines can generate code.
Encoding is simply the process of serializing data structures into a stream; decoding is the reverse. For encoding, the code is essentially:
stream << a.i;
stream << a.j;
stream << a.object;
(To simplify, assume a stream object has already been designed that can serialize the various data types and that handles issues such as byte-order conversion.)
Finally, we obtain a stream. Are you accustomed to writing such code? Yet this code reflects no creativity on the engineer's part: we already know that a contains i, j, and object, so why type it out by hand? If we analyze the definition of a, can this code be generated automatically?
struct dataA
{
    int i;
    int j;
    struct dataB object;
};
We only need a simple semantic analyzer to parse this definition and build a tree of the data types, from which the serialization code for all data structures can easily be produced. Such an analyzer can be implemented in about two hundred lines of Python, or of any other language with strong string processing. (Diagram of the resulting type tree omitted.) By traversing this tree, we can generate all the serialization code for the data structures.
In the previous framework example project, the code for automatically generating message encoding and decoding for a hardware module amounted to thirty thousand lines, almost equivalent to a small software project. Since it was generated automatically, it contained no errors, providing high reliability for the upper layer.
XML or other formats can also be used to define the data structures for automatic code generation, and depending on need the generated code can be C++, Java, Python, or any other language. If strong checking is desired, XSD can be used to define the data structures. There is a commercial product of this kind called XBinder, which is very expensive and complicated to use; it is better to develop your own (why is it hard to use? Because it is too generic). Besides binary encoding, the generator can also emit code for other, readable formats such as XML; communication can then use binary while debugging uses XML, the best of both worlds. The code for generating XML might look like this:
xmlbuilder.addelement("i", a.i);
xmlbuilder.addelement("j", a.j);
xmlbuilder.addelement("object", a.object);
This approach is also very suitable for machine generation. The same idea can be applied to enable embedded script support within software. This will not be elaborated further here. (The biggest issue with embedded script support is exchanging data between C/C++ and scripts, which involves a significant amount of similar code regarding data types.)
Recently, Google released Protocol Buffers, which exemplifies this approach. It currently supports C++, Java, and Python, and will likely support more languages soon, so it is worth watching. In the future, we should no longer write encoding and decoding logic by hand.
5.2.2. GUI Code
In the previous framework design section, we mentioned that the framework cannot handle data collection and interface updates, leaving these tasks to programmers to implement. However, let’s look at what these interface programmers typically do (the code has been simplified and can be considered pseudo-code):
void onDataArrive(CDataBinder& data)
{
    m_biterror.setText("%d", data.biterror);
    m_signallevel.setText("%d", data.signallevel);
    m_latency.setText("%d", data.latency);
}
void onCollectData(CDataBinder& data)
{
    data.biterror = atoi(m_biterror.getText());
    data.signallevel = atoi(m_signallevel.getText());
    data.latency = atoi(m_latency.getText());
}
Is this code interesting? It merely moves values between widgets and a data structure; let's think about what we can do with that. (Describing the interface in XML runs into problems, however, when complex logic is involved.)
5.2.3. Summary
As can be seen, during the software architecture process, it is essential to follow general principles, strive to isolate various functional parts, achieve high cohesion and low coupling, and then discover the highly repetitive and regular code in the system, further formalizing and structuring it, and finally letting machines generate this code. Currently, the most successful application in this area is message encoding and decoding. Automating the generation of interface code has certain limitations but can also be applied. Everyone should be adept at discovering such possibilities in their work to reduce workload and improve efficiency.
5.2.4. Google Protocol Buffers
The recently released Google Protocol Buffers is a model of code generation. In Google's own description: "Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the 'old' format."