1. Introduction
Embedded software is a branch of software design, and its particular characteristics shape the choices system architects make. At the same time, some of its issues are quite universal and extend to other fields.
When it comes to embedded software design, the traditional impression is of microcontrollers, assembly language, and heavy dependence on hardware. Traditional embedded developers often focus solely on implementing the functionality itself, neglecting code reuse, separation of data and interface, and testability. As a result, the quality of embedded software depends heavily on the developer's individual skill, with success or failure resting on a single person.

With the rapid development of embedded hardware and software, today's embedded systems have grown enormously in functionality, scale, and complexity. Marvell's PXA3xx series, for example, reaches a maximum clock frequency of 800 MHz, with built-in USB, Wi-Fi, 2D graphics acceleration, and 32-bit DDR memory. In hardware terms, today's embedded systems have reached or even surpassed the PC platforms of a few years ago. On the software side, mature operating systems such as Symbian, Linux, and WinCE have emerged, and on top of them applications such as word processing, image processing, video, audio, games, and web browsing are appearing, rivaling PC software in functionality and complexity.

Some commercial equipment companies that used to rely on dedicated hardware and systems have begun to change their thinking, replacing formerly proprietary hardware functions with software solutions built on capable, inexpensive hardware and mature operating systems, achieving lower cost along with greater flexibility and maintainability.
2. Factors Determining Architecture and Its Impact
Architecture is not an isolated technical product; it is influenced by many factors. At the same time, an architecture also impacts many aspects of software development.
Here is a specific example.
Before leaving the factory, a motorcycle engine must pass a series of tests. On the assembly line, the engine is sent to each workstation for workers to conduct tests on aspects such as speed, noise, and vibration. The requirement is to implement an embedded device with the following basic functions:
- Installed at the workstation; the worker turns it on and logs in before starting work.
- Automatically collects test data through sensors and displays it on the screen.
- Records all test results and provides statistical functions, such as defect rates.
If you are the architect of this device, what issues should you focus on when designing the architecture?
2.1. Common Misunderstandings
2.1.1. Small systems do not need architecture
Many embedded systems are relatively small and are generally designed for specific purposes. Influenced by engineers’ understanding, client scale, and project timelines, architecture design is often overlooked, with coding focused solely on achieving functionality. This behavior appears to meet the requirements for timelines, costs, and functionality, but in the long run, the costs incurred in expansion and maintenance far exceed the initial savings. If the original developer of the system continues to stay within the organization and is responsible for the project, everything may proceed smoothly. However, once they leave, successors may introduce more errors due to insufficient understanding of system details. It is important to note that the cost of changes in embedded systems is much higher than in general software systems. A good software architecture can describe the system from both macro and micro perspectives, isolating different parts, making it relatively simple to add new features and conduct subsequent maintenance.
For instance, consider a city rail card reader, which has been mentioned in previous courses. A simple city rail card reader implements only a handful of functions, and a single while loop is sufficient to implement the whole system; coding and debugging could begin directly. But from an architect's perspective, are there parts worth abstracting and separating?
- Billing system. The billing system must be abstracted so that, for example, it can evolve from flat-fare to distance-based billing.
- Sensor system. Sensors include magnetic card readers, coin acceptors, and so on; the equipment may be replaced.
- Error handling and recovery. Given the high reliability requirements and the short allowed recovery time, this part requires separate design.

Future potential changes in requirements:

- User interface. Should a dedicated model be abstracted so that a view can be implemented in the future?
- Data statistics. Should a relational database be introduced?
If coding were done directly as a single loop, how much of that code could be reused when these changes arise?
However, do not fall into the trap of excessive design. Architecture should be based on current needs, with appropriate consideration for reuse and changes.
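For the card reader above, the billing abstraction might be sketched as follows (all class and method names are invented for illustration): flat-fare and distance-based billing become interchangeable policies behind one interface, so the reader code never changes when the billing rules do.

```cpp
#include <cstdint>

// Hypothetical billing-policy interface: the reader asks the policy what to
// charge, so switching from flat-fare to distance-based billing touches no
// reader code.
class IBillingPolicy {
public:
    virtual ~IBillingPolicy() {}
    virtual uint32_t Fare(uint32_t entryStation, uint32_t exitStation) const = 0;
};

// Flat fare: every trip costs the same.
class FlatFarePolicy : public IBillingPolicy {
public:
    explicit FlatFarePolicy(uint32_t fare) : iFare(fare) {}
    uint32_t Fare(uint32_t, uint32_t) const { return iFare; }
private:
    uint32_t iFare;
};

// Distance-based fare: base price plus a per-station increment.
class DistanceFarePolicy : public IBillingPolicy {
public:
    DistanceFarePolicy(uint32_t base, uint32_t perStation)
        : iBase(base), iPerStation(perStation) {}
    uint32_t Fare(uint32_t entry, uint32_t exit) const {
        uint32_t hops = (exit > entry) ? exit - entry : entry - exit;
        return iBase + hops * iPerStation;
    }
private:
    uint32_t iBase;
    uint32_t iPerStation;
};
```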
2.1.2. Agile development does not require architecture
Extreme programming and agile development have led some people to mistakenly believe that software development no longer requires architecture. This is a significant misunderstanding. Agile development emerged as a remedy for the well-known downsides of traditional waterfall processes, so it has a higher starting point and stricter requirements for development, rather than regressing to the Stone Age. In fact, architecture is part of agile development; agile simply recommends designing it in more efficient and lightweight ways, for example by drawing UML diagrams on whiteboards and photographing them with digital cameras, or by using user stories instead of use cases. Test-driven agile development goes further, forcing engineers to design components' functionality and interfaces before writing the actual code, rather than starting to code directly. Some characteristics of agile development include:
- Targeting larger systems than traditional development processes
- Acknowledging change and iterating on the architecture
- Simplicity without chaos
- Emphasizing testing and refactoring
3. Embedded Software Design Characteristics
To discuss the architecture of embedded software, one must first understand the characteristics of embedded software design.
3.1. Closely Related to Hardware
Embedded software generally has a considerable dependency on hardware. This is reflected in several aspects:
- Some functions can only be achieved through hardware; the software operates and drives that hardware.
- Differences or changes in hardware can have a significant impact on the software.
- Without hardware, or with incomplete hardware, the software cannot run, or cannot run completely.
These characteristics lead to several consequences:
- A software engineer's understanding of and proficiency with the hardware largely determines the software's performance, stability, and other non-functional qualities; such work tends to be complex and requires experienced engineers to ensure quality.
- The software's heavy dependence on the hardware design limits its stability, maintainability, and reusability.
- The software cannot be tested and verified independently of the hardware, often requiring validation in step with it; this leads to a loose start and tight finish in the schedule and widens the scope of error localization.
To address these issues, several solutions can be considered:
- Implement hardware functions in software. Choosing a more powerful processor and implementing some hardware functions in software reduces dependence on the hardware, helps in responding to change, and avoids reliance on specific models and manufacturers. This has become a trend in some industries; the PC platform went through a similar process, for example with early Chinese character cards.
- Isolate hardware dependencies behind a hardware abstraction layer, making as much of the software hardware-independent as possible so that it can run without the hardware. This confines the risk of hardware changes or replacement to a limited area while improving the testability of the software.
3.2. High Stability Requirements
Most embedded software has high requirements for the long-term stable operation of programs. For example, mobile phones may be powered on for several months, and communication devices require 24/7 normal operation. Even communication testing equipment requires at least 8 hours of normal operation. To achieve stability, some commonly used design techniques include:
- Distribute different tasks across independent processes. Good modular design is key.
- Use watchdog timers and heartbeats, and restart failed processes.
- Provide a complete, unified logging system for quick problem localization. Embedded devices generally lack powerful debuggers, which makes the logging system especially important.
- Isolate errors to the smallest possible range to avoid their spread and chain reactions. Core code must be thoroughly verified, while non-core code can run under monitoring or in a sandbox so that it cannot damage the whole system.
For example, GPRS access on Symbian is affected by hardware and operating system versions and is not very stable; in one version, the system crashes when the GPRS connection is closed, a known issue. By isolating GPRS connections, HTTP protocol handling, and file downloads into a separate process, that process may crash after an operation, but the user is not affected.
- Double backup, though this method is rarely used.
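The heartbeat technique above can be sketched as a small monitor class (the names and tick-based timing are illustrative assumptions): tasks report heartbeats, and a supervisor asks which tasks have gone silent and should be restarted.

```cpp
#include <map>
#include <string>

// Minimal heartbeat monitor sketch: each task calls Beat() periodically; a
// supervisor calls Expired() to find tasks that have missed their deadline
// and should be restarted.
class HeartbeatMonitor {
public:
    explicit HeartbeatMonitor(long timeoutTicks) : iTimeout(timeoutTicks) {}

    void Beat(const std::string& task, long nowTicks) {
        iLastBeat[task] = nowTicks;
    }

    // True if the task has not beaten within the timeout window.
    bool Expired(const std::string& task, long nowTicks) const {
        std::map<std::string, long>::const_iterator it = iLastBeat.find(task);
        if (it == iLastBeat.end())
            return true;                    // never heard from this task
        return nowTicks - it->second > iTimeout;
    }

private:
    long iTimeout;
    std::map<std::string, long> iLastBeat;
};
```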
3.3. Insufficient Memory
Although today's embedded systems have far more memory than in the era when it was measured in kilobytes, insufficient memory continues to trouble system architects as software grows in scale. Here are some principles architects can draw on when making design decisions:
3.3.1. Virtual Memory Technology
Some embedded devices need to handle large amounts of data that cannot all fit into memory, and some embedded operating systems provide no virtual memory mechanism; WinCE 4.2, for example, allows each program to use at most 32 MB of memory. For such applications, architects should design their own virtual memory scheme. Its core idea is to move data that is unlikely to be needed soon out of memory. This involves several technical points:
- Reference counting: data that is in use cannot be moved out.
- Prediction: estimate how likely each piece of data is to be used in the next phase, and move it out or load it ahead of time accordingly.
- Placeholder data/objects.
- Caching: within complex data structures, cache frequently used data for direct access.
- Fast persistence and loading.
An example is the interface of a nationwide telecom machine-room management system: each node has a large amount of data to load, and the techniques above can be used to minimize memory usage.
3.3.2. Two-Stage Construction
In systems with limited memory, object construction failure must be addressed; the most common reason for failure is insufficient memory (this is also a requirement for PC platforms, but is often overlooked in practice due to the low cost of memory). Two-stage construction is a commonly used and effective design. For example:
```cpp
class CMySimpleClass
{
public:
    CMySimpleClass();
    ~CMySimpleClass();
    // ...
private:
    int SomeData;
};

class CMyCompoundClass
{
public:
    CMyCompoundClass();
    ~CMyCompoundClass();
    // ...
private:
    CMySimpleClass* iSimpleClass;
};
```

In CMyCompoundClass's constructor, the iSimpleClass member is initialized:

```cpp
CMyCompoundClass::CMyCompoundClass()
{
    iSimpleClass = new CMySimpleClass;
}
```
What happens when a CMyCompoundClass is created?

```cpp
CMyCompoundClass* myCompoundClass = new CMyCompoundClass;
```
1. Memory is allocated for the CMyCompoundClass object.
2. The constructor of CMyCompoundClass is called.
3. The constructor creates an instance of CMySimpleClass.
4. The constructor ends and returns.
Everything seems straightforward, but what if insufficient memory causes an error when creating the CMySimpleClass object in the third step? The constructor has no way to return an error to indicate that construction failed, so the caller receives a pointer to a CMyCompoundClass that is not fully constructed.

What if an exception is thrown in the constructor? This is a notorious nightmare: the destructor will not be called, so any resources allocated before the CMySimpleClass creation will leak. One could spend an hour discussing exceptions in constructors; the short advice is to avoid throwing exceptions from constructors whenever possible.
Therefore, using the two-stage construction method is a better choice. In simple terms, avoid any actions that might produce errors, such as allocating memory in the constructor, and place these actions in another function after construction is complete. For example:
```cpp
AddressBook* book = new AddressBook();
if (!book->Construct())
{
    delete book;
    book = NULL;
}
```
This ensures that when Construct fails, already allocated resources are released.
The two-stage construction method is commonly used in the most important mobile operating system, Symbian.
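A runnable sketch of the pattern, using a static factory that performs both phases (this variant returns NULL on failure rather than using Symbian's NewL/ConstructL leave mechanism; the class and member names are illustrative):

```cpp
#include <cstddef>
#include <new>

// Two-stage construction: the constructor cannot fail, the second phase can,
// and a static factory runs both so callers never see a half-built object.
class AddressBook {
public:
    static AddressBook* New(std::size_t capacity) {
        AddressBook* self = new (std::nothrow) AddressBook();
        if (self && !self->Construct(capacity)) {   // second phase failed
            delete self;                            // safe: destructor handles null
            self = 0;
        }
        return self;
    }

    ~AddressBook() { delete[] iEntries; }

    bool Ready() const { return iEntries != 0; }

private:
    AddressBook() : iEntries(0) {}                  // first phase: cannot fail

    bool Construct(std::size_t capacity) {          // second phase: may fail
        iEntries = new (std::nothrow) int[capacity];
        return iEntries != 0;
    }

    int* iEntries;
};
```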
3.3.3. Memory Allocator
Different systems have different memory allocation profiles: some allocate many small blocks, while others frequently grow already allocated memory. A good memory allocator can have a significant impact on the performance of embedded software. During system design, ensure that the entire system uses one unified memory allocator that can be replaced at any time.
3.3.4. Memory Leaks
Memory leaks are very serious for the limited memory of embedded systems. By using their own memory allocators, memory allocation and release can be easily tracked, allowing memory leak situations to be detected.
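A minimal sketch of such a tracking allocator (the interface is invented for illustration): every allocation and release goes through it, so a nonzero outstanding count at shutdown reveals a leak.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

// Counting allocator sketch: routing every allocation through one replaceable
// allocator makes leaks visible as a nonzero outstanding block count.
class TrackingAllocator {
public:
    TrackingAllocator() : iOutstanding(0), iTotalBytes(0) {}

    void* Alloc(std::size_t size) {
        void* p = std::malloc(size);
        if (p) {
            iSizes[p] = size;
            iOutstanding++;
            iTotalBytes += size;
        }
        return p;
    }

    void Free(void* p) {
        std::map<void*, std::size_t>::iterator it = iSizes.find(p);
        if (it != iSizes.end()) {
            iTotalBytes -= it->second;
            iSizes.erase(it);
            iOutstanding--;
            std::free(p);
        }
    }

    int OutstandingBlocks() const { return iOutstanding; }
    std::size_t OutstandingBytes() const { return iTotalBytes; }

private:
    int iOutstanding;
    std::size_t iTotalBytes;
    std::map<void*, std::size_t> iSizes;   // live allocations and their sizes
};
```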
3.4. Limited Processor Capability, High Performance Requirements
This section will not discuss real-time systems, as that is a large specialized topic. For general embedded systems, due to limited processor capabilities, performance issues must be given special attention. Some excellent architectural designs fail due to an inability to meet performance requirements, ultimately leading to project failure.
3.4.1. Resisting the Temptation of New Technologies
Architects must understand that new technologies often mean complexity and lower performance. Even when this is not absolute, the limited hardware performance of embedded systems leaves little flexibility: once a new technology turns out to differ from the original expectations, adapting through modification becomes very difficult. Take GWT, an Ajax development toolkit from Google that lets programmers build web Ajax applications as if they were desktop applications, making it easy to implement remote and local user interfaces from the same code on an embedded system. However, running browser-server (B/S) structured applications on embedded devices poses significant performance challenges, browser compatibility issues are severe, and the current version of GWT is not yet mature.

Experience has shown that embedded remote-control solutions still require ActiveX, VNC, or similar approaches.
3.4.2. Avoiding Excessive Layering
Layered structures make it easy to delineate system responsibilities clearly and to decouple the system, but each additional layer incurs a performance cost, especially when large amounts of data must be passed between layers. In embedded systems, the number of layers must therefore be controlled, and transferring large volumes of data across layers, especially across process boundaries, should be avoided. Where data transfer is necessary, avoid extensive data format conversions, such as XML to binary or C++ structures to Python structures.
Given the limited capabilities of embedded systems, it is essential to focus those capabilities on the core functions of the system.
3.5. Storage Devices are Prone to Damage and Slow
Due to size and cost constraints, most embedded devices use storage such as Compact Flash, SD, mini SD, and MMC cards. These devices have the advantage of no moving parts to damage mechanically, but their write endurance is generally low: CF cards can typically endure about 1,000,000 writes per block, and SD cards about 100,000. For applications like digital cameras this may be sufficient, but for applications that write frequently, such as a historical database, storage damage quickly becomes apparent. For instance, suppose an application writes a 16 MB file to a CF card daily, using a FAT16 file system with a 2 KB cluster size. Writing the file touches 8192 clusters, so the file allocation table area is written 8192 times per day; a CF card rated for 1,000,000 writes therefore lasts only about 122 days (1,000,000 / 8192), while the vast majority of the card's other blocks have barely been used.
In addition to the static partition table and similar blocks being frequently read and written to, leading to premature damage, some embedded devices also face the challenge of sudden power loss, which can result in incomplete data on storage devices.
3.5.1. Wear Leveling
The basic idea of wear leveling is to use the blocks of the storage device evenly. A table must be maintained to track each block's usage: its offset, whether it is currently available, and the number of times it has been erased. When a new erase request arrives, a block is selected according to the following principles:
- Prefer contiguous blocks.
- Prefer the smallest erase count.
Even when updating existing data, the above principles should be applied to allocate new blocks. Similarly, the location of this table should not be fixed; otherwise, the block occupied by this table would be the first to fail. When updating this table, the allocation of blocks should also follow the above algorithm.
If there is a large amount of static data on the storage device, the above algorithm can only take effect on the remaining space; in this case, an algorithm for moving these static data must also be implemented. However, this algorithm may reduce write operation performance and increase algorithm complexity. Generally, only dynamic leveling algorithms are used.
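The block-selection rule can be sketched as follows (the structures are illustrative, and a real implementation would also weigh contiguity): among free blocks, pick the one with the fewest erases.

```cpp
#include <cstddef>
#include <vector>

// Per-block bookkeeping, as kept in the wear-leveling table.
struct BlockInfo {
    bool free;
    int  eraseCount;
};

// Returns the index of the block to erase next, or -1 if none is free.
// Ties go to the lower index, which also favors contiguity with earlier picks.
int PickBlock(const std::vector<BlockInfo>& blocks) {
    int best = -1;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (!blocks[i].free)
            continue;
        if (best < 0 || blocks[i].eraseCount < blocks[best].eraseCount)
            best = static_cast<int>(i);
    }
    return best;
}
```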
Currently, mature wear leveling file systems include JFFS2 and YAFFS. Another approach is to implement wear leveling on traditional file systems like FAT16 by allocating a sufficiently large file in advance and implementing wear leveling algorithms within that file. However, this requires modifying the FAT16 code to disable updates to the last modified time.
Many modern CF and SD cards have already implemented wear leveling internally, eliminating the need for software implementation.
3.5.2. Error Recovery
If a power outage occurs while writing data to storage, the data in the written area may be left in an unknown state. In some applications, this can lead to incomplete files, while in others, it may cause system failure. Therefore, error recovery for such issues is a necessary consideration in embedded software design. Common approaches include two methods:
- Journaling (log-based) file systems. This type of file system does not modify data in place; instead, it first appends entries to a sequential log, allowing recovery to the previous consistent state after a power outage. A representative example is ext3.
- Double backup. This approach is simpler: all data is written twice, alternating between two copies, and the file partition table itself is also double-backed. Suppose data block A has a backup block A1, whose content initially matches A's, and in the partition table F points to A while F1 points to A1. When the file is modified, block A1 is written first; if power is lost at this point, A1's content is corrupt, but F still points to the intact A, so the data is safe. If A1 is written successfully, F1 is then updated; if power is lost at that point, F is still intact, so again nothing is lost.
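The alternating-write protocol just described can be simulated in a few lines (the class and the simulated power-loss flag are invented for illustration): the spare copy is written first, and the pointer is flipped only after that write succeeds, so a power cut at any single step leaves one valid copy.

```cpp
#include <string>

// Simulation of the alternating double-backup protocol: the inactive copy is
// written first; only after that write succeeds is the pointer flipped.
class DoubleBackupCell {
public:
    explicit DoubleBackupCell(const std::string& initial)
        : iActive(0) { iCopy[0] = initial; iCopy[1] = initial; }

    // writeSucceeds simulates whether power is lost mid-write.
    bool Write(const std::string& data, bool writeSucceeds) {
        int spare = 1 - iActive;
        if (!writeSucceeds) {
            iCopy[spare] = "<garbage>";   // interrupted write corrupts spare only
            return false;                 // pointer untouched, active copy intact
        }
        iCopy[spare] = data;
        iActive = spare;                  // commit: flip the pointer last
        return true;
    }

    const std::string& Read() const { return iCopy[iActive]; }

private:
    std::string iCopy[2];
    int iActive;
};
```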
Many modern Flash devices already incorporate error detection and correction technologies that can ensure data integrity during power outages. Additionally, they may include automatic dynamic/static wear leveling algorithms and bad block handling, allowing them to be used as hard drives without additional software considerations. Thus, as hardware advances, software becomes more reliable, and ongoing technological progress allows us to focus more on the software’s functionality itself, which is the trend of development.
3.6. High Cost of Failure
Embedded products are sold to users as a combination of hardware and software, creating a problem that purely software does not face: when a product fails, if it needs to be returned to the factory for repairs, the costs can be very high. Common types of failures in embedded devices include:
a) Data failures. Data cannot be read or is inconsistent due to certain reasons, such as database errors caused by power outages.
b) Software failures. Defects in the software itself that require patch releases or new software versions for correction.
c) System failures. For example, a user downloads the wrong system kernel, causing the system to fail to start.
d) Hardware failures. This type of failure requires factory returns and is outside the scope of our discussion.
For the first three types of failures, it is essential to ensure that customers or on-site technicians can resolve issues independently. From an architectural perspective, the following principles can be referenced:
a) Use data management designs with error recovery capabilities. When data errors occur, the acceptable processing order for users is:
i. Errors are corrected, and all data is valid.
ii. Data (which may be incomplete) that occurred during the error is lost, but previous data remains valid.
iii. All data is lost.
iv. The data engine crashes and cannot continue working.
In general, meeting the second condition is sufficient (logging, transactions, backups, error identification).
b) Separate application programs from the system. Application programs should be placed on pluggable Flash cards, allowing file copying and upgrades through card readers. Avoid using proprietary application software for upgrading application programs unless necessary.
c) Implement a "safe mode," so that when the main system is damaged, the device can still start and the system can be re-flashed. The common U-Boot bootloader can provide this: when the system is damaged, one can enter U-Boot and re-flash it via TFTP.
4. Software Framework
In desktop systems and network systems, frameworks are commonly applied, such as the well-known ACE, MFC, Ruby on Rails, etc. However, frameworks are rarely used in embedded systems. The reason is likely that embedded systems are considered simple, lacking repetition, and overly focused on functionality implementation and performance optimization. As mentioned in the introduction, the current trend in embedded development is towards increasing complexity, large-scale, and series development. Therefore, designing software frameworks in embedded systems is also necessary and valuable.
4.1. Challenges Faced by Embedded Software Architecture
As previously discussed, embedded system software architecture faces several challenges, one of the most important being the dependency on hardware and the complexity of hardware-related software. There are also stringent requirements for stability and memory usage in embedded software. If everyone in the team is an expert in these areas, it may be possible to develop high-quality software. However, in reality, a team may consist of only one or two senior personnel, while most members are junior engineers. If everyone engages with hardware and is responsible for stability, performance, and other metrics, it is difficult to ensure the final product’s quality. If the component team consists entirely of talents proficient in hardware and other low-level technologies, it becomes challenging to design software that excels in usability and scalability. Each field has its specialization, and the architect’s choices determine the team’s composition.
At the same time, although embedded software development is complex, there are also numerous opportunities for reuse. How to reuse and how to respond to future changes?
Thus, how to shield complexity from the majority, how to separate concerns, and how to ensure the system’s key non-functional indicators are challenges that embedded software architects must address. One possible solution is the software framework.
4.2. What is a Framework?
A framework is a semi-finished software product designed within a given problem domain for reuse and responding to future demand changes. Frameworks emphasize abstraction of specific domains, containing a wealth of domain knowledge, aimed at shortening the software development cycle and improving software quality. Secondary developers using frameworks implement specific functionalities by rewriting subclasses or assembling objects.
4.2.1. Levels of Software Reuse
Reuse is a frequently discussed topic, and the phrase “don’t reinvent the wheel” is a well-known principle. However, there are many levels of understanding of reuse.
The most basic form of reuse is copy-paste. A certain functionality was previously implemented, and when needed again, it can be copied and modified for use. Experienced programmers often have their own code libraries, allowing them to implement faster than new programmers. The drawback of copy-paste is that the code has not undergone abstraction and is often not entirely applicable, requiring modification. After multiple reuses, the code becomes chaotic and difficult to understand. Many companies’ products face this issue, where one product’s code is copied from another and modified for use, sometimes with class names and variable names left unchanged. According to the standard that “only code designed for reuse can truly be reused,” this does not qualify as reuse, or it is considered low-level reuse.
A higher level of reuse is libraries. This involves abstracting frequently used functionalities, extracting constant parts, and providing them to secondary developers in library form. Designing libraries requires high standards from designers, as they do not know how secondary developers will use them. This is the most widely used form of reuse, such as the standard C library and STL library. One of the significant advantages of the increasingly popular Python language is its extensive library support. In contrast, C++ lacks a strong, unified library support, becoming a shortcoming. Summarizing common functionalities and developing them into libraries in internal company development is very valuable, but upgrading libraries can impact many products and must be approached cautiously.
A framework is another form of reuse. Like libraries, frameworks abstract and implement invariant parts of the system, allowing secondary developers to implement the variable parts. The most significant difference between typical frameworks and libraries is that libraries are static and called by secondary developers, while frameworks are dynamic; they are the main controllers, and the secondary developers’ code must conform to the framework’s design, which decides when to call.
For example, a network application always involves establishing connections, sending and receiving data, and closing connections. Provided as a library, it looks like this:
```cpp
conn = connect(host, port);
if (conn.isvalid())
{
    data = conn.recv();
    printf(data);
    conn.close();
}
```
As a framework, it looks like this:
```cpp
class mycomm : public connect
{
public:
    host();
    port();
    onconnected();
    ondataarrived(unsigned char* data, int len);
    onclose();
};
```
The framework will create a mycomm object at the “appropriate” moment, query the host and port, and then establish the connection. After the connection is established, it calls the onconnected() interface, giving the secondary developer the opportunity to process it. When data arrives, it calls the ondataarrived interface to let the secondary developer handle it. This follows the Hollywood Principle: “Don’t call us, we’ll call you.”
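A runnable miniature of this inversion of control (all names are invented, and the "network" is simulated with an in-memory packet list): the framework drives the connection life cycle and calls back into the secondary developer's handler at the moments it chooses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Callback interface the framework defines for secondary developers.
class ConnectionHandler {
public:
    virtual ~ConnectionHandler() {}
    virtual void OnConnected() = 0;
    virtual void OnDataArrived(const std::string& data) = 0;
    virtual void OnClose() = 0;
};

// "Framework" side: owns the life cycle and decides when to call the handler.
void RunConnection(ConnectionHandler& h, const std::vector<std::string>& packets) {
    h.OnConnected();                       // framework establishes the link
    for (std::size_t i = 0; i < packets.size(); ++i)
        h.OnDataArrived(packets[i]);       // framework delivers each packet
    h.OnClose();                           // framework tears the link down
}

// "Secondary developer" side: only fills in the reactions.
class EchoHandler : public ConnectionHandler {
public:
    std::string log;
    void OnConnected() { log += "[open]"; }
    void OnDataArrived(const std::string& data) { log += data; }
    void OnClose() { log += "[close]"; }
};
```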
Of course, a complete framework usually also provides various libraries for secondary developers to use. MFC, for instance, provides many library classes such as CString, but it is fundamentally a framework: a dialog box implements the OnInitDialog interface, and when that interface is called is defined by the framework.
4.2.2. Abstraction Targeting Highly Specific Domains
Compared to libraries, frameworks provide more targeted abstractions of specific domains. Libraries, such as the C library, are aimed at all applications. In contrast, frameworks are relatively narrower. For example, the framework provided by MFC is suitable for desktop application development on the Windows platform, while ACE is a framework for network application development, and Ruby on Rails is designed for rapid web site development.
The more specific the domain, the stronger the abstraction can be, making secondary development easier because there are more commonalities. The common characteristics of embedded system software development that we discussed earlier are part of this specific domain that can be abstracted. When applied to actual embedded applications, even more commonalities can be abstracted.
The purpose of designing a framework is to summarize the commonalities of a specific domain, implement them in a framework manner, and specify the implementation methods for secondary developers, thereby simplifying development. Correspondingly, a framework developed for one domain cannot serve another domain. For enterprises, a framework is an excellent technical means for accumulating knowledge and reducing costs.
4.2.3. Decoupling and Responding to Changes
A critical goal of framework design is to respond to changes. The essence of responding to changes is decoupling. From the architect’s perspective, decoupling can be divided into three types:
- Logical decoupling: the abstraction and separation of logically distinct modules, such as decoupling data from the interface. This is the most common form of decoupling.
- Knowledge decoupling: designing so that people with different knowledge can cooperate purely through interfaces. A typical example is the domain knowledge held by test engineers versus the programming knowledge held by development engineers. Traditional test scripts merge the two, forcing test engineers to program; with appropriate means, test engineers can express their test cases in the simplest possible form, while developers write the code that executes those cases.
- Decoupling the variable from the invariant: an essential feature of frameworks. A framework analyzes domain knowledge, fixes the commonalities (the invariant content), and leaves the variable parts to be implemented by secondary developers.
4.2.4. Frameworks Can Implement and Specify Non-Functional Requirements
Non-functional requirements refer to characteristics such as performance, reliability, testability, and portability. These characteristics can be implemented through frameworks. Below, we will provide examples for each.
Performance. The most taboo aspect of performance optimization is universal optimization. The performance of a system often depends on specific points. For example, in embedded systems, accessing storage devices is relatively slow. If developers are not attentive to this issue and frequently read and write storage devices, performance will degrade. If the framework designs the read and write operations for storage devices, secondary developers can focus on providing and processing data, allowing for frequency adjustments in the framework to optimize performance. Since frameworks are developed separately and widely used, they can fully optimize key performance points.
Reliability. Using the previously mentioned network communication program as an example: since the framework creates and manages connections and handles the various possible network errors, the implementers of specific features need neither this knowledge nor any error-handling code, which ensures the reliability of the entire system's network communication. The significant advantage of designing for reliability in a framework is that code written by secondary developers runs under the framework's control. The framework can thus implement the error-prone parts itself, while also capturing and handling errors raised by the secondary developers' code. A library, by contrast, cannot take over error handling on behalf of its users.
Testability. Testability is an essential aspect that software architecture must consider. The next chapter will discuss how good software design ensures testability. On one hand, frameworks define the interfaces for secondary development, compelling developers to create code that is easy to unit test. On the other hand, frameworks can also provide designs that facilitate automated testing and regression testing at the system testing level, such as a unified TL1 interface.
Portability. If the portability of software is a design goal, framework designers can ensure this during the design phase. One way is to shield system differences through cross-platform libraries; another, more extreme approach is to make secondary development based on frameworks scriptable. Configuration software is an example of this; projects configured on PC can also run on embedded devices.
3.3. An Example of Framework Design
3.3.1. Basic Architecture
3.3.2. Functional Characteristics
The above is an architecture diagram for a product series characterized by modular hardware that can be plugged and unplugged at any time. Different hardware modules target different communication testing scenarios, such as optical communication testing, xDSL testing, and Cable Modem testing, and each requires its own firmware and software. The firmware layer mainly receives commands from the software over USB, reads and writes the corresponding hardware interfaces, performs calculations, and returns results to the software. The software runs on the WinCE platform, providing a touch-based graphical interface as well as XML (SOAP) and TL1 interfaces. To support automated testing, it also provides a Lua-based scripting interface. The product series comprises dozens of different hardware modules, requiring dozens of software packages. Although these packages serve different hardware, they are highly similar to one another. The optimal choice was therefore to develop a framework first and then build each module's software on top of it.
3.3.3. Analysis
The software structure can be analyzed as follows:
The system consists of three main parts: software, firmware, and hardware. The software and firmware run on two independent boards, each with its processor and operating system. The hardware is plugged into the board where the firmware resides and is replaceable.
The software and firmware are essentially software, and we analyze them separately.
Software
The main task of the software is to provide various user interfaces, including local graphical interfaces, SOAP access interfaces, and TL1 access interfaces.
The entire software part is divided into five major parts:
- Communication layer
- Protocol layer
- Graphical interface
- SOAP server
- TL1 server
The communication layer shields its users from the specifics of the communication medium and protocol: whether USB or sockets are used underneath does not affect the upper layers. The communication layer is responsible for providing reliable communication services and appropriate error handling. Through configuration files, users can switch the communication channel in use.
The protocol layer’s purpose is to encode and decode data. The output of encoding is a stream that can be sent through the communication layer; based on the characteristics of embedded software, we choose binary as the format for the stream. The output of decoding can be varied, including C structs for interface use, XML data, and Lua data structures (tables). If needed, JSON, TL1, Python data, TCL data, and so on can also be generated. This layer is automatically generated through the framework, which we will discuss later.
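As a minimal sketch of the protocol layer's encoding role, the following shows an invented `Measurement` message (the real message set is generated by the framework, as discussed later): encoding fixes the byte order and flattens the struct into a binary stream, and decoding reverses it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical message carried by the protocol layer; field names are
// illustrative, not from the original product.
struct Measurement {
    uint16_t channel;
    uint32_t bitErrors;
};

// Serialize big-endian so both ends agree regardless of CPU byte order.
static void putU16(std::vector<uint8_t>& s, uint16_t v) {
    s.push_back(uint8_t(v >> 8));
    s.push_back(uint8_t(v & 0xFF));
}
static void putU32(std::vector<uint8_t>& s, uint32_t v) {
    putU16(s, uint16_t(v >> 16));
    putU16(s, uint16_t(v & 0xFFFF));
}

// Encode: struct -> stream suitable for the communication layer.
std::vector<uint8_t> encode(const Measurement& m) {
    std::vector<uint8_t> stream;
    putU16(stream, m.channel);
    putU32(stream, m.bitErrors);
    return stream;
}

// Decode: stream -> struct (the reverse walk over the same field order).
Measurement decode(const std::vector<uint8_t>& s) {
    Measurement m;
    m.channel   = uint16_t((s[0] << 8) | s[1]);
    m.bitErrors = (uint32_t(s[2]) << 24) | (uint32_t(s[3]) << 16) |
                  (uint32_t(s[4]) << 8)  |  uint32_t(s[5]);
    return m;
}
```

The same field-by-field walk is what makes this layer such a good candidate for machine generation, as section 4 discusses.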
The in-memory database, SOAP server, and TL1 server are all users of the protocol layer. The graphical interface communicates with the in-memory database and the underlying communication layer.
The graphical interface is one of the key focuses of framework design because it involves the most workload and repetitive tasks.
Let’s analyze what the most important tasks are in graphical interface development.
- Collecting user input data and commands
- Sending the data and commands to the lower layers
- Receiving feedback from the lower layers
- Displaying the data on the interface
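These four tasks can be reduced to a pair of callbacks that the framework invokes at the right moments. The sketch below uses assumed names (`CPage`, `CDataBinder`, the widget stand-in), not the product's real API: the framework owns sending and receiving, while a page only maps data to and from its widgets.

```cpp
// Hypothetical data holder exchanged between framework and interface code.
struct CDataBinder {
    int bitError = 0;
    int signalLevel = 0;
};

// Interface the framework defines for every page of the GUI.
class CPage {
public:
    virtual ~CPage() = default;
    // framework -> page: fresh data arrived from the lower layers
    // (covers "receiving feedback" and "displaying data").
    virtual void onDataArrive(const CDataBinder& data) = 0;
    // page -> framework: gather user input before the framework sends it
    // (covers "collecting input" and "sending data").
    virtual void onCollectData(CDataBinder& data) = 0;
};

// A concrete page only fills in the two mappings.
class CStatusPage : public CPage {
public:
    int shownBitError = -1;  // stands in for a text widget
    void onDataArrive(const CDataBinder& d) override { shownBitError = d.bitError; }
    void onCollectData(CDataBinder& d) override { d.bitError = shownBitError; }
};
```

This is the same shape as the `onDataArrive`/`onCollectData` code examined in section 4.2.2.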
At the same time, some libraries are used to further simplify development. Though simplified, this example illustrates the characteristics of frameworks well:
- Client code must implement the specified interfaces
- The framework calls the client-implemented interfaces at the appropriate times
- Each interface is designed to accomplish a single, specific function
- Connecting the individual steps together is the framework's responsibility; secondary developers need not know or care about it
- There are usually accompanying libraries
Firmware
The firmware’s primary role is to accept commands from the software, drive the hardware, retrieve the hardware status, perform calculations, and return the results to the software. Early firmware was a thin layer, as most tasks were performed by hardware, with firmware merely serving as a communication intermediary. However, with the evolution of technology, firmware has begun to undertake more tasks originally performed by hardware.
The entire firmware component is divided into five major parts:
- Hardware abstraction layer, providing access interfaces to the hardware
- Mutually independent task groups
- Task/message dispatcher
- Protocol layer
- Communication layer
For different devices, the workload is concentrated in the hardware abstraction layer and the task groups. The hardware abstraction layer is provided as a library, implemented by the engineers most familiar with the hardware. The task groups consist of a series of tasks, each representing a different business application, such as measuring the bit error rate. This part is implemented by less experienced engineers, whose main job is to implement the specified interfaces according to standardized documentation.
The task defines the following interfaces to be implemented by specific developers:
```cpp
OnInit();
OnRegisterMessage();
OnMessageArrive();
Run();
OnResultReport();
```
The framework’s code flow is as follows (pseudo-code):
```cpp
CTask* task = new CBertTask();
task->OnInit();
task->OnRegisterMessage();
while (TRUE)
{
    task->OnMessageArrive();
    task->Run();
    task->OnResultReport();
}
delete task;
task = NULL;
```
This way, the implementer of specific tasks only needs to focus on implementing these interfaces. Other tasks such as hardware initialization, message sending and receiving, encoding and decoding, and result reporting are handled by the framework. This avoids the need for each engineer to deal with all aspects from top to bottom. Moreover, such task code has high reusability; for instance, implementing the PING algorithm on both Ethernet and Cable Modem is the same.
3.3.4. Practical Effects
In practical projects, frameworks significantly reduce development difficulty. This is especially evident in the software part, where even interns can complete high-quality interface development, shortening the development cycle by over 50%. Product quality has greatly improved. The contribution of the framework to the firmware part is to reduce the need for engineers who are proficient in low-level hardware, allowing general engineers familiar with measurement algorithms to implement it. At the same time, the existence of the framework ensures performance, stability, and testability.
3.4. Common Patterns in Framework Design
3.4.1. Template Method Pattern
The template method pattern is the most commonly used design pattern in frameworks. The fundamental idea is to fix the algorithm in the framework while allowing secondary developers to implement specific operations within that algorithm. For example, the logic for initializing a device in framework code may look like this:
```cpp
TBool CBaseDevice::Init()
{
    if ( DownloadFPGA() != KErrNone )
    {
        LOG(LOG_ERROR, _L("Download FPGA fail"));
        return EFalse;
    }
    if ( InitKeyPad() != KErrNone )
    {
        LOG(LOG_ERROR, _L("Initialize keypad fail"));
        return EFalse;
    }
    return ETrue;
}
```
DownloadFPGA and InitKeyPad are virtual functions defined by CBaseDevice; secondary developers create subclasses inheriting from CBaseDevice and specifically implement these two interfaces. The framework defines the order of calls and error handling, so secondary developers do not need to worry about these aspects.
3.4.2. Creational Patterns
Since frameworks often involve the creation of various subclass objects, creational patterns are frequently used. For instance, in a drawing software framework, a base class defines the interface for graphic objects, from which subclasses like ellipse, rectangle, and line can be derived. When a user draws a graphic, the framework needs to instantiate that subclass. Factory methods, prototype methods, and so on can be used here.
```cpp
class CDrawObj
{
public:
    virtual int DrawObjTypeID() = 0;
    virtual Icon GetToolBarIcon() = 0;
    virtual void Draw(Rect rect) = 0;
    virtual CDrawObj* Clone() = 0;
};
```
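A prototype-based sketch of how the framework might instantiate such objects: each concrete class registers one prototype, and the framework clones it when the user draws. To keep the sketch self-contained, the interface below is a trimmed, hypothetical version of `CDrawObj` (the drawing methods are omitted because `Icon` and `Rect` are not defined here), and `CPrototypeRegistry` is an assumed helper, not part of the original design.

```cpp
#include <map>
#include <memory>

class CDrawObj {
public:
    virtual ~CDrawObj() = default;
    virtual int DrawObjTypeID() const = 0;
    virtual CDrawObj* Clone() const = 0;
};

class CRect : public CDrawObj {
public:
    int DrawObjTypeID() const override { return 1; }
    CDrawObj* Clone() const override { return new CRect(*this); }
};

// Framework-side registry: maps a type ID to a prototype object and
// creates new instances by cloning the registered prototype.
class CPrototypeRegistry {
    std::map<int, std::unique_ptr<CDrawObj>> protos_;
public:
    void Register(CDrawObj* p) { protos_[p->DrawObjTypeID()].reset(p); }
    CDrawObj* Create(int typeId) const {
        auto it = protos_.find(typeId);
        return it == protos_.end() ? nullptr : it->second->Clone();
    }
};
```

A factory method (one virtual `Create` per subclass factory) would serve equally well; the prototype variant avoids a parallel factory hierarchy.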
3.4.3. Observer Pattern
The observer pattern is the most common means of decoupling data from the interface. Interface developers simply register for the data they need, and when that data changes, the framework "pushes" it to the interface. A common issue with the observer pattern, however, is how to handle reentrancy and timeouts in synchronous mode; as framework designers, we must consider this carefully. Reentrancy means a secondary developer performing subscribe/unsubscribe operations inside a message callback, which can corrupt the subscription mechanism. Timeout means a secondary developer's callback taking too long, preventing other messages from being handled. The simplest solution is an asynchronous mode in which subscribers and data publishers run in independent processes/threads. If that is not possible, it must be established as an important framework convention that secondary developers must not cause such issues.
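A minimal observer sketch, with illustrative names only: copying the subscriber list before invoking the callbacks is one simple guard against the reentrancy problem described above, since a callback that subscribes or unsubscribes mutates the live list, not the one being iterated.

```cpp
#include <functional>
#include <vector>

// Hypothetical data hub: subscribers register callbacks, and the framework
// "pushes" each published value to all of them.
class CDataHub {
    std::vector<std::function<void(int)>> subscribers_;
public:
    void Subscribe(std::function<void(int)> cb) {
        subscribers_.push_back(std::move(cb));
    }
    void Publish(int value) {
        // Iterate a snapshot so reentrant Subscribe/Unsubscribe calls made
        // inside a callback cannot invalidate the iteration.
        auto snapshot = subscribers_;
        for (auto& cb : snapshot) cb(value);
    }
};
```

The snapshot handles reentrancy but not timeouts; slow callbacks still block the publisher, which is why the text recommends the asynchronous mode where possible.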
3.4.4. Decorator Pattern
The decorator pattern gives frameworks the capability to add functionalities later. The framework defines an abstract base class for decorators, and concrete implementers implement it, dynamically adding it to the framework.
For example, in a game, the graphics rendering engine is an independent module capable of rendering static and dynamic images of characters. If the planner decides to introduce an item called “invisibility cloak,” requiring players wearing this item to display a semi-transparent image on the screen, how should the graphics engine be designed to accommodate this upgrade?
When the invisibility cloak is equipped, a filter should be added to the graphics engine. This is a vastly simplified example; actual game engines are much more complex. The decorator pattern is also commonly used for pre-processing and post-processing of data.
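The cloak example can be sketched as follows, with invented names throughout: the engine renders through an abstract renderer interface, and a translucency filter wraps any existing renderer without the engine itself changing. Rendering returns a description string here purely so the effect is observable in a few lines.

```cpp
#include <memory>
#include <string>

class IRenderer {
public:
    virtual ~IRenderer() = default;
    virtual std::string Render() const = 0;  // a description stands in for pixels
};

// The engine's normal character renderer.
class CharacterRenderer : public IRenderer {
public:
    std::string Render() const override { return "character"; }
};

// Decorator: wraps any renderer and post-processes its output.
class TranslucentFilter : public IRenderer {
    std::unique_ptr<IRenderer> inner_;
public:
    explicit TranslucentFilter(std::unique_ptr<IRenderer> inner)
        : inner_(std::move(inner)) {}
    std::string Render() const override {
        return "translucent(" + inner_->Render() + ")";
    }
};
```

Equipping the cloak amounts to wrapping the character's renderer in a `TranslucentFilter`; removing it unwraps again, and filters can be stacked, which is exactly the pre/post-processing use the text mentions.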
3.5. Disadvantages of Frameworks
A good framework can significantly enhance product development efficiency and quality, but it also has its drawbacks:
- Frameworks are generally complex, and designing and implementing a good framework takes considerable time. They are therefore only worthwhile when they can be applied repeatedly, so that the upfront investment yields substantial returns.
- Frameworks stipulate a series of interfaces and rules, which simplifies secondary development but also requires secondary developers to remember many regulations; violating them can leave the software non-functional. However, since frameworks shield a large number of domain details, the overall learning cost is still greatly reduced.
- Upgrading a framework can severely impact existing products, necessitating complete regression testing. There are two ways to address this: first, rigorously test the framework itself, establishing a comprehensive unit-test library and sample projects that exercise all framework functionality; second, use static linking so that existing products do not silently pick up upgrades. Having robust regression testing for existing products is, of course, even better.
- Performance loss. Because frameworks abstract the system, they add complexity, and techniques such as polymorphism generally cost some performance. Overall, however, frameworks can keep system performance at a relatively high level.
4. Automated Code Generation
4.1. Let Machines Do What They Can
Laziness is a virtue for programmers, and even more so for architects. Software development is the process of telling machines how to do things; if a task can be done by a machine, there is no need for a human to do it. Machines are not only tireless but also never make mistakes. Our job is to automate our clients' work, and with a little more thought we can partially automate our own as well. A programmer's boundless patience for repetitive work can be a virtue, but it can also be a liability.
Well-designed systems often exhibit many highly similar and strongly patterned codes. Undesigned systems may produce many different implementations for the same functionality. The previous discussion on framework design has already demonstrated this. Sometimes we go further, analyzing the patterns within these similar codes, describing these functionalities with formatted data, and letting machines generate the code.
4.2. Examples
4.2.1. Message Encoding and Decoding
In the earlier framework example, we can see that the message encoding and decoding parts have been isolated, decoupled from other parts. Given their characteristics, they are very suitable for further “regulation” and machine code generation.
Encoding is merely the process of streaming data structures, while decoding is the reverse. For encoding, the code is essentially as follows (binary protocol):
```cpp
stream << a.i;
stream << a.j;
stream << a.object;
```
(To simplify, it is assumed that a stream object has been designed that can stream various data types and has handled issues such as byte order conversion.)
Finally, we obtain a stream. Are you accustomed to writing this kind of code? However, this code does not reflect any creativity from engineers; we already know that there are i, j, and an object. Why should we type this code ourselves? If we analyze the definition of a, can we automatically generate such code?
```cpp
struct dataA
{
    int i;
    int j;
    struct dataB object;
};
```
With a simple semantic analyzer parsing this code, we can easily generate the streaming code for all data structures. Such an analyzer can be implemented in about 200 lines of Python or another language with strong string processing capabilities. The parsed data types form a tree similar to the following:
By traversing this tree, we can generate streaming code for all data structures.
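As a toy illustration of such a generator (here in C++ rather than the Python suggested above, and using a naive regular expression instead of a real semantic analyzer that builds the full type tree), the following scans a struct body for member declarations and emits one streaming line per field:

```cpp
#include <regex>
#include <string>

// Given the body of a struct definition, emit "stream << var.member;" for
// each member. A real generator would parse all definitions into a type
// tree and recurse into nested structs; this sketch only handles one level.
std::string generateStreaming(const std::string& structBody,
                              const std::string& var) {
    std::regex member(R"(\w+\s+(\w+)\s*;)");  // "<type> <name>;"
    std::string out;
    for (auto it = std::sregex_iterator(structBody.begin(),
                                        structBody.end(), member);
         it != std::sregex_iterator(); ++it) {
        out += "stream << " + var + "." + (*it)[1].str() + ";\n";
    }
    return out;
}
```

Fed the `dataA` body shown above, it reproduces exactly the hand-written streaming lines from the start of this section.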
In the previously mentioned framework project, the automatically generated message encoding and decoding code for a hardware module amounted to over 30,000 lines, equivalent to a small software application. Since it was automatically generated, there were no errors, providing high reliability to the upper layer.
We can also use XML or other formats to define the data structures and generate code from them. Depending on the requirements, any target language, such as C++/Java/Python, can be used. If strong validation is desired, XSD can be used to define the data structures. A commercial product, xBinder, exists for this, but it is expensive and hard to use; developing our own is the better option. (Why is it hard to use? Because it is too general.) In addition to encoding in binary format, we can also generate code for other, readable formats such as XML. This way, communication uses binary while debugging uses XML, the best of both worlds. The generated XML-building code may look like this:
```cpp
xmlBuilder.addElement("i", a.i);
xmlBuilder.addElement("j", a.j);
xmlBuilder.addElement("object", a.object);
```
This too is well-suited for machine generation. A similar approach can be used to enable embedded script support. We won’t elaborate on that here. (The biggest issue with embedded script support is exchanging data between C/C++ and scripts, which involves a lot of similar code concerning data types.)
Recently, Google released its Protocol Buffers, which exemplify this concept.
4.2.2. GUI Code
In the framework design section, we noted that the framework cannot take over interface data collection and updates itself; it can only define abstract interfaces for programmers to implement. Let us look at what these interface programmers actually do (the code has been simplified and can be considered pseudo-code):
```cpp
void onDataArrive(CDataBinder& data)
{
    m_biterror.setText("%d", data.biterror);
    m_signallevel.setText("%d", data.signallevel);
    m_latency.setText("%d", data.latency);
}

void onCollectData(CDataBinder& data)
{
    data.biterror    = atoi(m_biterror.getText());
    data.signallevel = atoi(m_signallevel.getText());
    data.latency     = atoi(m_latency.getText());
}
```
Does this code look familiar? It is pure pattern, so let us consider what can be automated here. (Describing the interface in XML is one option, but it becomes very challenging for complex logic.)
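One possibility, sketched below with the assumed names from the example above (`CDataBinder`, `setText`/`getText` widget calls): generate both handlers from a simple list of bound field names, since every line follows the same template.

```cpp
#include <string>
#include <vector>

// Emit the onDataArrive/onCollectData boilerplate for a list of bound
// fields. In a real tool the field list would come from an interface
// description file; here it is passed in directly.
std::string generateHandlers(const std::vector<std::string>& fields) {
    std::string out = "void onDataArrive(CDataBinder& data) {\n";
    for (const auto& f : fields)
        out += "    m_" + f + ".setText(\"%d\", data." + f + ");\n";
    out += "}\nvoid onCollectData(CDataBinder& data) {\n";
    for (const auto& f : fields)
        out += "    data." + f + " = atoi(m_" + f + ".getText());\n";
    out += "}\n";
    return out;
}
```

This only covers the mechanical fields; pages with complex display logic would still need hand-written code, which is the limitation noted above.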
4.2.3. Summary
In the software architecture process, we must follow the general principles, implementing the various functional parts independently to achieve high cohesion and low coupling. We must then identify the highly repetitive, strongly patterned code in the system, further regulate and formalize it, and ultimately let machines generate it. Currently, the most successful application of this approach is message encoding and decoding. Automating GUI code generation has certain limitations but can still be applied. Everyone should be adept at discovering such opportunities in their own work to reduce workload and improve efficiency.
4.2.4. Google Protocol Buffer
Google’s recently released Protocol Buffer is a prime example of automated code generation.
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
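For illustration, a message roughly corresponding to the earlier `dataA` struct might be defined in the protocol buffer language (proto2-era syntax) as follows; `protoc` then generates the encode/decode code in C++, Java, or Python. The `DataB` field `k` is invented here, since the original text never shows `dataB`'s contents.

```proto
// Hypothetical nested message standing in for struct dataB.
message DataB {
  required int32 k = 1;
}

// Counterpart of struct dataA: the generated code streams each numbered
// field, exactly the pattern hand-written in section 4.2.1.
message DataA {
  required int32 i = 1;
  required int32 j = 2;
  optional DataB object = 3;
}
```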