
Organizer: Embedded Cloud IOT Technology Circle, Author: veryarm
1. Introduction
Embedded systems are a branch of software design, and their many characteristics determine the choices of system architects. At the same time, some of their issues have considerable generalizability and can be extended to other fields.
When it comes to embedded software design, the traditional impression is of microcontrollers, assembly language, and heavy dependence on hardware. Traditional embedded software developers often focus only on implementing the functionality itself, neglecting factors such as code reuse, separation of data and interface, and testability. As a result, the quality of embedded software depends heavily on the individual developer, with success or failure resting on one person. With the rapid development of embedded hardware and software, today's embedded systems have grown greatly in functionality, scale, and complexity. For example, Marvell's PXA3xx series reaches a maximum clock frequency of 800 MHz and offers built-in USB, WiFi, 2D graphics acceleration, and a 32-bit DDR memory interface. In hardware terms, today's embedded systems have reached or even exceeded the PC platforms of a few years ago. On the software side, mature operating systems such as Symbian, Linux, and WinCE are available, and on top of them all kinds of applications have emerged, including word processing, graphics, video, audio, games, and web browsing, with functionality and complexity comparable to PC software. Some commercial equipment companies that previously relied on dedicated hardware and systems have begun to change their thinking, using software to replace functions previously achieved through proprietary hardware, thereby achieving lower costs together with greater adaptability and maintainability.
2. Factors Determining Architecture and Its Impact
Architecture is not an isolated technical product; it is influenced by multiple factors. At the same time, an architecture affects many aspects of software development.
Here is a specific example.
The motorcycle engine must pass a series of tests before leaving the factory. On the assembly line, the engine is sent to each workstation, where workers conduct tests on aspects such as speed, noise, and vibration. The requirement is to implement an embedded device with the following basic functions:
- Installed at the workstation; workers turn it on and log in before starting work.
- Automatically collects test data through sensors and displays it on the screen.
- Records all test results and provides statistical functions, such as defect rates.
If you are the architect of this device, what issues should you focus on when designing the architecture?
2.1. Common Misunderstandings
2.1.1. Small Systems Do Not Need Architecture
Many embedded systems are relatively small and designed for specific purposes. Influenced by the engineers' understanding, customer scale, and project schedule, architecture design is often neglected and coding starts directly, with implementing the functionality as the only goal. This superficially meets the demands of schedule, cost, and functionality, but in the long run the costs incurred in extension and maintenance far exceed the initial savings. If the system's original developer stays with the organization and remains responsible for the project, everything may go smoothly; once they leave, subsequent developers are likely to introduce new errors through insufficient understanding of system details. Note that the cost of change in embedded systems is far higher than in general software systems. A good software architecture can describe the system at both the macro and micro level and isolate its parts from one another, so that adding new features and performing subsequent maintenance stays relatively simple.
Take the example of a subway card machine that has appeared in previous courses. A simple subway card machine only needs to implement the following functions:
A while loop is enough to implement this system, and coding and debugging can start directly. But from an architect’s perspective, is there anything worth abstracting and isolating here?
- Billing system. The billing system must be abstracted, for example, to move from single-fare billing to mileage-based billing.
- Sensor system. Sensors include magnetic card readers, coin acceptors, etc. The devices may be replaced.
- Error handling and recovery. Given the requirements for high reliability and short fault recovery time, this part needs to be designed separately.
Possible future changes in requirements:
- User interface. Should a dedicated model be abstracted out, to prepare for future implementations of the view?
- Data statistics. Should a relational database be introduced?
If coding is done directly according to the process diagram above, how much code can be reused when changes occur?
However, do not over-design as a result. The architecture should be based on meeting current needs while appropriately considering reuse and changes.
2.1.2. Agile Development Does Not Require Architecture
Extreme programming and agile development have led some people to mistakenly believe that software development no longer requires architecture. This is a significant misunderstanding. Agile development was proposed as a response to the obvious shortcomings of the traditional waterfall process, so it necessarily starts from a higher baseline and places stricter demands on development, rather than regressing to the Stone Age. In fact, architecture is part of agile development; agile simply recommends more efficient and lighter-weight forms of design, for example drawing UML diagrams on a whiteboard and photographing them with a digital camera, or using user stories instead of use cases. Test-driven agile development forces engineers to design the functionality and interfaces of components before writing the actual code, rather than starting to code directly. Some characteristics of agile development include:
- Targeting systems larger than those of traditional development processes
- Acknowledging change and iterating the architecture
- Emphasizing testing and refactoring
2. Embedded Environment Software Design Characteristics
To discuss the architecture of embedded software, it is first necessary to understand the characteristics of embedded software design.
2.1. Closely Related to Hardware
Embedded software generally has considerable dependence on hardware. This is reflected in several aspects:
- Some functions can only be implemented through hardware; the software operates and drives the hardware.
- Hardware differences and changes can have a significant impact on the software.
- Without hardware, or with incomplete hardware, the software cannot run, or cannot run completely.
These characteristics lead to several consequences:
- A software engineer's understanding of and proficiency with the hardware largely determine the software's non-functional qualities such as performance and stability. This work is generally complex and requires experienced engineers to guarantee quality.
- The software is highly dependent on the hardware design, cannot remain relatively stable, and has poor maintainability and reusability.
- The software cannot be tested and verified independently of the hardware; it must often be verified together with the hardware, leading to a slack schedule early on and a crunch at the end, and widening the scope of fault localization.
To address these issues, several solutions can be considered:
- Use software to implement hardware functions. Choosing a more powerful processor and implementing some hardware functions in software reduces dependence on hardware, helps in responding to change, and avoids lock-in to specific models and manufacturers. This has become a trend in some industries; the PC platform went through the same process, for example with the early Chinese character cards.
- Abstract the hardware dependencies into a separate hardware abstraction layer, making the rest of the software as hardware-independent as possible so that it can run without the hardware. On one hand, this confines the risk of hardware changes or replacement within a limited scope; on the other hand, it improves the testability of the software.
2.2. High Stability Requirements
Most embedded software must run stably for long periods. For example, mobile phones often stay powered on for months, communication equipment requires 24×7 operation, and even communication test instruments are required to run normally for at least 8 hours at a time. Several commonly used design techniques for achieving stability are:
- Distribute different tasks across independent processes. Good modular design is key.
- Watchdogs, heartbeats, and restarting of failed processes.
- A complete, unified logging system for quick problem localization. Embedded devices generally lack powerful debuggers, so the logging system is especially important.
- Isolate errors within the smallest possible scope to avoid their spread and chain reactions. Core code must be thoroughly verified; non-core code can run in a monitored or sandboxed mode so that it cannot damage the whole system.

For example, GPRS access on Symbian is affected by hardware and operating system versions and is not very stable. In one version, the system crashes whenever the GPRS connection is closed, and it is a known issue. By isolating GPRS connections, HTTP protocol processing, file downloads, and other operations in a separate process, the crash after each operation no longer affects the user, even though the process itself still dies.

- Double backup techniques, though these are used less often.
2.3. Insufficient Memory
Although today's embedded systems have far more memory than in the days when it was counted in kilobytes, as software grows in scale, insufficient memory still plagues system architects. The following principles can inform an architect's design decisions:
2.3.1. Virtual Memory Technology
Some embedded devices need to handle huge amounts of data, and this data cannot all be loaded into memory. Some embedded operating systems do not provide virtual memory technology; for example, WinCE4.2 allows each program to use a maximum of 32M of memory. For such applications, architects should specifically design their own virtual memory technology. The core of virtual memory technology is to move data that is unlikely to be used out of memory temporarily. This involves several technical points:
- Reference counting: data that is in use cannot be moved out.
- Prediction: predict the likelihood that particular data will be used in the next stage. Based on the prediction, data can be moved out, or loaded in ahead of time.
- Placeholder data/objects.
- Caching: within complex data structures, cache frequently used data for direct access.
- Fast persistence and loading.
The following diagram is a schematic diagram of a national telecommunications machine room management system interface:
Each node has a large amount of data that needs to be loaded, and the above technologies can be used to minimize memory usage.
2.3.2. Two-Stage Construction
In systems with limited memory, handling object construction failures is a necessary issue. The most common reason for failure is insufficient memory (this is also a requirement for PC platforms, but is often overlooked in practice because memory is simply too cheap). Two-stage construction is a commonly used and effective design. For example:
CMySimpleClass:

    class CMySimpleClass
    {
    public:
        CMySimpleClass();
        ~CMySimpleClass();
        ...
    private:
        int SomeData;
    };

CMyCompoundClass:

    class CMyCompoundClass
    {
    public:
        CMyCompoundClass();
        ~CMyCompoundClass();
        ...
    private:
        CMySimpleClass* iSimpleClass;
    };

In the constructor of CMyCompoundClass, the iSimpleClass object is initialized:

    CMyCompoundClass::CMyCompoundClass()
    {
        iSimpleClass = new CMySimpleClass;
    }
What happens when creating CMyCompoundClass?
CMyCompoundClass* myCompoundClass = new CMyCompoundClass;
1. Memory is allocated for the CMyCompoundClass object.
2. The constructor of the CMyCompoundClass object is called.
3. An instance of CMySimpleClass is created in the constructor.
4. The constructor ends and returns.
Everything seems to be simple, but what if an error occurs due to insufficient memory when creating the CMySimpleClass object in the third step? The constructor cannot return any error information to prompt the caller that the construction was unsuccessful. The caller then receives a pointer to CMyCompoundClass, but this object is not fully constructed.
What if an exception is thrown in the constructor? This is a famous nightmare because the destructor will not be called, and if resources were allocated before creating the CMySimpleClass object, there will be a leak. There could be a one-hour discussion on throwing exceptions in constructors, but one suggestion is to avoid throwing exceptions in constructors as much as possible.
Thus, using two-stage construction is a better choice. In simple terms, it means avoiding any actions that might produce errors in the constructor, such as memory allocation, and placing these actions in another function called after construction is complete. For example:
    AddressBook* book = new AddressBook();
    if (!book->Construct())
    {
        delete book;
        book = NULL;
    }
This ensures that when Construct fails, already allocated resources are released.
Two-stage construction is commonly used in the most important mobile operating system, Symbian.
2.3.3. Memory Allocators
Different systems have different characteristics in memory allocation. Some require a lot of small memory allocations, while others frequently need to grow already allocated memory. A good memory allocator can be crucial for the performance of embedded software. It should be ensured that the entire system uses a unified memory allocator that can be replaced at any time.
2.3.4. Memory Leaks
Memory leaks are very serious for embedded systems with limited memory. By using their own memory allocator, it is easy to track memory allocation and release situations, thus detecting memory leaks.
2.4. Limited Processor Capability, High Performance Requirements
This discussion does not cover real-time systems, which is a large professional topic. For general embedded systems, due to limited processor capabilities, special attention must be paid to performance issues. Some very good architectural designs can lead to the failure of an entire project due to not meeting performance requirements.
2.4.1. Resist the Temptation of New Technologies
Architects must understand that new technology often means complexity and lower performance. Even where this is not absolute, the limited hardware performance of embedded systems leaves little flexibility: once a new technology turns out to differ from initial expectations, adapting through modification is difficult. Take GWT as an example. This is Google's Ajax development toolkit, which lets programmers develop web Ajax front-ends as if they were desktop applications, making it easy to implement remote and local operation interfaces from a single code base on an embedded system. However, running a B/S (browser/server) application on an embedded device poses a significant performance challenge; browser compatibility problems are also severe, and the then-current version of GWT was not yet mature.
It has been proven that remote control solutions for embedded systems still need to adopt ActiveX, VNC, or other solutions.
2.4.2. Avoid Too Many Layers
Layered structures help define system responsibilities clearly and achieve decoupling, but each additional layer costs performance, especially when large amounts of data must be passed between layers. For embedded systems, when a layered structure is adopted, the number of layers should be controlled and bulk data should not be passed across layers, least of all between layers in different processes. If data must be passed, avoid large-scale data format conversions, such as XML to binary or C++ structures to Python structures.
Embedded systems have limited capabilities, and it is essential to focus those limited capabilities on the core functionality of the system.
2.5. Storage Devices are Prone to Damage and Slow
Due to volume and cost constraints, most embedded devices use storage such as Compact Flash, SD, mini SD, and MMC cards. Although these devices do not suffer mechanical wear, their write endurance is relatively low: CF cards can typically endure only about 1 million write cycles per block, and SD cards even fewer, around 100,000. For applications such as digital cameras this may be sufficient, but for applications that erase and rewrite storage frequently, such as a historical database, wear shows up quickly. For example, suppose an application writes a 16 MB file to a CF card every day, and the file system is FAT16 with a cluster size of 2 KB. Writing the 16 MB file updates the file allocation table 8192 times, so a card rated at 1 million cycles actually survives only 1000000/8192 ≈ 122 days. When it fails, most other areas of the card may have seen only a tiny fraction of that wear.
In addition to static file partition tables and other blocks being frequently read and written and damaged prematurely, some embedded devices also face the challenge of sudden power loss, which can create incomplete data on storage devices.
2.5.1. Wear Leveling
The basic idea of wear leveling is to use all blocks of the storage device evenly. It is necessary to maintain a table of the usage of storage blocks, which includes the offset position of the block, whether it is currently available, and how many times it has been erased. When there is a new erase request, the following principles should be followed in selecting blocks:
- As contiguous as possible.
- The fewest erase cycles.
Even when updating existing data, the above principles will allocate new blocks. Similarly, the location of this table must also not be fixed; otherwise, the block occupied by this table will be the first to be damaged. When updating this table, the same allocation algorithm should be used.
If there is a large amount of static data on the storage device, then the above algorithm can only be effective for the remaining space. In this case, algorithms for moving these static data must also be implemented. However, this can reduce the performance of write operations and increase the complexity of the algorithm. Generally, only dynamic balancing algorithms are used.
Currently, relatively mature wear leveling file systems include JFFS2 and YAFFS. Another approach is to implement wear leveling on traditional file systems such as FAT16, as long as a sufficiently large file is pre-allocated, and the wear leveling algorithm is implemented within that file. However, this requires modifying the FAT16 code to disable the update of the last modified time.
Some modern CF and SD cards have already implemented wear leveling internally, in which case no software implementation is needed.
2.5.2. Error Recovery
If data is being written to storage when a power loss occurs or the device is unplugged, the data in the written area may be in an unknown state. In some applications, this can lead to incomplete files, while in others, it can cause system failures. Therefore, error recovery for such errors is also a necessary consideration in embedded software design. Common approaches include two:
- Journaling file systems. This type of file system does not write data in place but logs changes sequentially, so after a power loss it can always recover to a previous consistent state. A representative example of this type is ext3.
- The double-backup approach is simpler: all data is written twice, and one copy is used at a time. The file allocation table must also be double-backed. Suppose there is a data block A with backup block A1, whose content initially matches A's. In the allocation table, F points to data block A, and F1 is F's backup. When modifying the file, the content of backup block A1 is modified first. If power is lost at this point, A1's content is wrong, but because F still points to the intact A, the data is not damaged. If A1 is written successfully, F1 is then updated. If power is lost at that point, F is intact, so there is still no problem.
Modern flash devices have built-in error detection and correction technologies that can ensure data integrity during power loss. Some also include automatic dynamic/static wear leveling algorithms and bad block handling, which require no additional software handling and can be used like hard drives. Therefore, as hardware becomes more advanced, software will become more reliable, and continuous technological advancements will allow us to focus more on software functionality itself, which is the trend of development.
2.6. High Costs of Failures
Embedded products are sold to users together with hardware and software, which brings a problem not encountered with pure software: when a product fails, if it needs to be returned to the factory for repair, the cost is high. Common types of failures in embedded devices include:
a) Data failures. Errors that prevent data from being read or lead to inconsistencies due to certain reasons, such as database errors caused by power loss.
b) Software failures. Defects in the software itself that need to be corrected through patches or new software versions.
c) System failures. For example, if a user downloads the wrong system kernel, the system fails to boot.
d) Hardware failures. This type of failure can only be repaired by returning to the factory and is not within our discussion scope.
For the first three types of failures, it is necessary to ensure that customers or on-site technicians can resolve them. From an architectural perspective, the following principles can be referenced:
a) Use data management designs with error recovery capabilities. When data errors occur, the acceptable processing by the user is as follows:
i. Errors are corrected, and all data remains valid.
ii. Data at the point of error (which may be incomplete) is lost, but previous data remains valid.
iii. The data engine crashes and cannot continue to work.
Generally, meeting the second condition is sufficient. (Logging, transactions, backups, error recognition)
b) Separate applications from systems. Applications should be placed on pluggable flash cards that can be upgraded through file copying via card readers. Avoid using dedicated application software to upgrade applications unless necessary.
c) There should be a “safe mode”: even if the main system is damaged, the device can still boot and have its system re-flashed. A common U-Boot setup can guarantee this; when the system is damaged, the device can drop into U-Boot and upgrade over TFTP.
3. Software Framework
Frameworks are widely used in desktop and network systems, with famous examples such as ACE, MFC, and Ruby on Rails. In embedded systems, however, frameworks are rarely used, because embedded systems are assumed to be simple and one-off, and development tends to focus narrowly on implementing functionality and optimizing performance. As mentioned in the introduction, the current trend in embedded development is toward complexity, large scale, and product series. Designing software frameworks for embedded systems is therefore both necessary and valuable.
3.1. Problems Faced by Embedded Software Architecture
Previously, we discussed some issues faced by embedded system software architecture, one of which is the dependence on hardware and the complexity of hardware-related software. Additionally, embedded software has strict requirements in terms of stability and memory usage. If everyone in the team is an expert in these areas, it may be possible to develop high-quality software. However, in reality, there may only be one or two senior personnel in a team, while most others are junior engineers. If everyone is dealing with hardware and responsible for stability, performance, and other metrics, it is challenging to guarantee the final product quality. If the component team consists of talents proficient in hardware and other underlying technologies, it becomes difficult to design software that excels in usability and scalability. Specialization is essential, and the architect’s choices determine the team’s composition.
At the same time, although embedded software development is complex, there are also many possibilities for reuse. How to reuse and how to respond to future changes?
Therefore, how to shield complexity from most people, how to separate concerns, and how to ensure key non-functional indicators of the system are issues that embedded software architecture designers should solve. One possible solution is software frameworks.
3.2. What is a Framework?
A framework is a semi-finished software product designed for reuse and to respond to future demand changes within a given problem domain. Frameworks emphasize abstraction of specific domains and contain a wealth of specialized domain knowledge to shorten the software development cycle and improve software quality. Secondary developers using frameworks implement specific functionalities by rewriting subclasses or assembling objects.
3.2.1. Levels of Software Reuse
Reuse is a topic we often discuss, and the adage “Don’t reinvent the wheel” is also a familiar admonition. However, there are many levels of understanding reuse.
The most basic form of reuse is copy-paste. If a certain function has been implemented before, it can be copied and modified for reuse. Experienced programmers generally have their libraries, allowing them to implement faster than new programmers. The downside of copy-paste is that the code has not been abstracted and often does not fully apply, requiring modification. After multiple reuses, the code can become chaotic and difficult to understand. Many companies have this issue where the code of one product is copied from another product, modified slightly, and sometimes even class names and variable names are not changed. According to the standard that “only code designed for reuse can truly be reused,” this does not count as reuse, or it is considered low-level reuse.
A higher level of reuse is libraries. This involves abstracting and extracting the constant parts of frequently used functionalities to provide them as libraries for secondary developers. Designing libraries requires high demands on designers because they do not know how secondary developers will use them. This is the most widely used form of reuse, such as the standard C library and STL library. One of the significant advantages of the popular Python language is its extensive library support, while C++ has always lacked strong unified library support, becoming a shortcoming. Summarizing commonly used functionalities and developing them into libraries is very valuable in internal development, but the downside is that upgrading libraries can affect many products, so it must be approached cautiously.
Frameworks represent another form of reuse. Like libraries, frameworks also abstract and implement the unchanging parts of a system, allowing secondary developers to implement other changing parts. The most significant difference between typical frameworks and libraries is that libraries are static and called by secondary developers, while frameworks are dynamic; they are the main controllers, and the code of secondary developers must conform to the framework’s design, determining when to call.
For example, a web application always involves establishing connections, sending and receiving data, and closing connections. Provided as a library, it looks like this:
    conn = connect(host, port);
    if (conn.isvalid())
    {
        data = conn.recv();
        printf(data);
        conn.close();
    }
In a framework, it looks like this:
    class mycomm : public connect
    {
    public:
        host();
        port();
        onconnected();
        ondataarrived(unsigned char* data, int len);
        onclose();
    };
The framework will create the mycomm object at the “appropriate” time, query the host and port, and then establish the connection. After the connection is established, the onconnected() interface will be called to provide secondary developers with the opportunity to handle it. When data arrives, the ondataarrived interface will be called to let secondary developers handle it. This follows the Hollywood principle: “Don’t call us, we will call you.”
Of course, a complete framework usually also provides various libraries for secondary developers to use. For example, MFC provides many libraries, such as CString, but fundamentally, it is a framework. For example, implementing the OnInitDialog interface for a dialog box is prescribed by the framework.
3.2.2. Abstraction for Highly Specific Domains
Compared to libraries, frameworks are more targeted abstractions for specific domains. Libraries, such as the C library, are applicable to all applications. In contrast, frameworks are relatively narrower. For example, the framework provided by MFC is only suitable for desktop application development on the Windows platform, ACE is a framework for network application development, and Ruby on Rails is designed for rapid web site development.
The more specific the domain, the stronger the abstraction can be, and the easier secondary development can be since there is more commonality. For example, the many characteristics of embedded system software development we discussed are commonalities of specific domains that can be abstracted. When it comes to actual embedded applications, even more commonalities can be abstracted.
The purpose of framework design is to summarize the commonalities of specific domains, implement them in a framework manner, and specify the implementation methods for secondary developers, thus simplifying development. Accordingly, frameworks developed for one domain cannot serve another.
For enterprises, frameworks are an excellent technical means to accumulate knowledge and reduce costs.
3.2.3. Decoupling and Responding to Changes
One important goal of framework design is to respond to changes. The essence of responding to changes is decoupling. From an architect’s perspective, decoupling can be divided into three types:
- Logical decoupling. Logical decoupling abstracts and separates logically distinct modules, such as decoupling data from the interface. This is also the most common form of decoupling we perform.
- Knowledge decoupling. Knowledge decoupling designs the system so that people with different knowledge can cooperate purely through interfaces. A typical example is the domain knowledge of test engineers versus the design and implementation knowledge of development engineers. Traditional test scripts usually mix the two, requiring test engineers to know how to program. With appropriate means, test engineers can express their test cases in the simplest possible form, while developers write conventional program code to execute those cases.
- Decoupling change from stability. This is a crucial feature of frameworks. A framework analyzes domain knowledge to fix the commonalities, i.e., the unchanging content, while leaving the potentially changing parts for secondary developers to implement.
3.2.4. Frameworks Can Implement and Specify Non-Functional Requirements
Non-functional requirements refer to characteristics such as performance, reliability, testability, and portability. These characteristics can be achieved through frameworks. Below, we will provide examples for each:
Performance. The cardinal sin of performance optimization is optimizing everywhere indiscriminately; a system's performance usually hinges on a few specific points. For example, in embedded systems, accessing storage devices is relatively slow. If developers ignore this and read and write storage frequently, performance suffers. If instead the reads and writes to storage are designed into the framework, with secondary developers acting merely as data providers and processors, then the framework can adjust the frequency of reads and writes to optimize performance. Since a framework is developed once and used widely, its key performance points can be thoroughly optimized.
Reliability. Taking the above network communication program as an example, since the framework is responsible for connection creation and management and handles various possible network errors, the specific implementer does not need to understand this knowledge or implement error handling code in this area, ensuring the overall system’s reliability in network communication. The most significant advantage of designing reliability in a framework is that the code of secondary developers runs under the control of the framework. On the one hand, the framework can implement error-prone parts, and on the other hand, it can capture and handle errors generated by secondary developers’ code. Libraries, on the other hand, cannot replace users in handling errors.
Testability. Testability is an important aspect that software architecture needs to consider. The following sections will discuss that the testability of software is guaranteed by excellent design. On the one hand, since the framework specifies the interfaces for secondary development, it forces secondary developers to produce code that is convenient for unit testing. On the other hand, the framework can also provide designs that facilitate automated testing and regression testing at the system testing level, such as a unified TL1 interface.
Portability. If the portability of software is a design goal, framework designers can ensure this during the design phase. One way is to shield system differences through cross-platform libraries, while another more extreme possibility is that secondary development based on the framework can be scripted. Configuration software is an example in this regard; engineering configured on a PC can also run on embedded devices.
3.3. An Example of Framework Design
3.3.1. Basic Architecture
3.3.2. Functional Characteristics
The above is a schematic diagram of the architecture of a product series, characterized by modular hardware that can be hot-plugged at any time. Different hardware modules serve different communication testing scenarios, such as optical communication testing, xDSL testing, and Cable Modem testing. Different firmware and software must be developed for each kind of hardware. The firmware layer mainly receives instructions from the software over USB, reads and writes the corresponding hardware interfaces, performs some calculations, and returns results to the software. The software runs on the WinCE platform, provides a touch-based graphical interface, and offers external interfaces based on XML (SOAP) and TL1. For automated testing, it also provides an interface based on the Lua scripting language. The entire product series has dozens of different hardware modules, requiring dozens of corresponding software sets. Although these software sets serve different hardware, they are highly similar. Therefore, developing a framework first and then building each module's software on top of it became the optimal choice.
The structure analysis of the software part is as follows:
The system is divided into three major parts: software, firmware, and hardware. Software and firmware run on independent boards, each with its processor and operating system. The hardware is plugged into the board where the firmware is located and can be replaced.
Both the software and the firmware are, in essence, software; let us analyze each in turn.
Software
The main job of the software is to provide various user interfaces, including local graphical interfaces, SOAP access interfaces, and TL1 access interfaces.
The entire software part is divided into five major parts:
The communication layer must shield users from understanding specific communication media and protocols, whether USB or socket, without affecting the upper layers. The communication layer is responsible for providing reliable communication services and appropriate error handling. Through configuration files, users can change the communication layer used.
The purpose of the protocol layer is to encode and decode data. The output of encoding is a stream that can be sent through the communication layer; given the characteristics of embedded software, we chose a binary format for the stream. The output of decoding can take many forms, including C structs for interface use, XML data, and Lua data structures (tables); if needed, it could also produce JSON, TL1, Python, or TCL data. The code for this layer is generated automatically by machines, which we will discuss later.
The memory database, SOAP Server, and TL1 Server are all users of the protocol layer. The graphical interface reads and writes to the memory database and the underlying communication.
The graphical interface is one of the key focuses of framework design, as this is where the most work is done, and the most repetitive and tedious tasks exist.
Let’s analyze what the most important tasks in graphical interface development are.
- Collecting user input data and commands
- Sending data and commands to the underlying layer
- Receiving feedback from the underlying layer
- Displaying data on the interface
At the same time, there are some libraries to further simplify development:
This is a simplified example, but it illustrates the characteristics of the framework well:
- Customer code must implement according to specified interfaces
- The framework calls the interfaces implemented by customers at appropriate times
- Each interface is designed to accomplish only a specific, single function
- The framework is responsible for organically linking each step; secondary developers need not know or care about this
- There are usually accompanying libraries
Firmware
The main job of the firmware is to receive commands from the software, drive hardware operations, and obtain hardware status, returning calculated results to the software. Early firmware was a thin layer because the vast majority of work was done by hardware, and firmware served only as a communication intermediary. With the evolution of time, today’s firmware is beginning to take on more and more tasks originally handled by hardware.
The entire firmware part is divided into five major parts:
Hardware abstraction layer, providing access interfaces to hardware
Mutually independent task groups
For different devices, the workload is concentrated in the hardware abstraction layer and task groups. The hardware abstraction layer is provided as a library and implemented by engineers most familiar with the hardware. The task groups consist of a series of tasks representing different business applications, such as measuring the error rate. This part is implemented by engineers with relatively less experience, whose main job is to implement specified interfaces according to standardized documentation.
Tasks define the following interface, which is implemented by specific developers:
OnInit();
OnRegisterMessage();
OnMessageArrive();
Run();
OnResultReport();
The code flow of the framework is as follows (pseudo code):
CTask* task = new CBertTask();
task->OnInit();
task->OnRegisterMessage();
while(TRUE)
{
    task->OnMessageArrive();
    task->Run();
    task->OnResultReport();
}
delete task;
task = NULL;
Thus, the implementers of specific tasks only need to focus on implementing these few interfaces. Other matters such as hardware initialization, message sending and receiving, encoding and decoding, result reporting, etc., are handled by the framework. This avoids every engineer having to deal with all aspects from top to bottom. Moreover, such task codes have high reusability; for example, implementing the PING algorithm on Ethernet or Cable Modem is the same.
3.3.4. Actual Effects
In actual projects, frameworks significantly reduce development difficulty. This is especially evident in the software part, where even interns can complete high-quality interface development, shortening the development cycle by over 50%. Product quality has greatly improved. The contribution to the firmware part lies in reducing the need for engineers proficient in underlying hardware, as general engineers familiar with measurement algorithms can suffice. At the same time, the existence of the framework ensures performance, stability, testability, and other factors.
3.4. Common Patterns in Framework Design
3.4.1. Template Method Pattern
The template method pattern is the most commonly used design pattern in frameworks. Its fundamental idea is to fix the algorithm by the framework while leaving specific operations in the algorithm to be implemented by secondary developers. For example, the logic of device initialization in framework code is as follows:
TBool CBaseDevice::Init()
{
    if ( DownloadFPGA() != KErrNone )
    {
        LOG(LOG_ERROR, _L("Download FPGA fail"));
        return EFalse;
    }
    if ( InitKeyPad() != KErrNone )
    {
        LOG(LOG_ERROR, _L("Initialize keypad fail"));
        return EFalse;
    }
    return ETrue;
}
DownloadFPGA and InitKeyPad are both virtual functions defined by CBaseDevice, and secondary developers create subclasses that inherit from CBaseDevice to implement these two interfaces. The framework defines the order of calls and error handling, and secondary developers have no need to concern themselves with or decide this.
3.4.2. Creational Patterns
Since frameworks often involve creating various subclass objects, creational patterns are frequently used. For example, in a drawing software framework, a base class defines the interface for graphic objects, which can be derived into various subclasses such as ellipses, rectangles, and straight lines. When a user draws a graphic, the framework must instantiate the subclass. Factory methods, prototype methods, etc., can be used here.
class CDrawObj
{
public:
    virtual int DrawObjTypeID() = 0;
    virtual Icon GetToolBarIcon() = 0;
    virtual void Draw(Rect rect) = 0;
    virtual CDrawObj* Clone() = 0;
};
3.4.3. Message Subscription Pattern
The message subscription pattern is the most commonly used method for separating data and interfaces. Interface developers only need to register for the data they need, and when the data changes, the framework will “push” the data to the interface. The most common problem with the message subscription pattern is how to handle reentrancy and timeouts in synchronous modes. As framework designers, this issue must be carefully considered. Reentrancy refers to secondary developers performing subscription/unsubscription operations in the callback functions of messages, which can disrupt the message subscription mechanism. Timeouts refer to the situation where the callback function of secondary developers takes too long to process, causing other messages to be unresponsive. The simplest solution is to use an asynchronous mode, allowing subscribers and data publishers to run in separate processes/threads. If this condition is not met, it must be established as an essential agreement of the framework to prohibit secondary developers from creating such issues.
3.4.4. Decorator Pattern
The decorator pattern gives the framework the ability to add functionality later. The framework defines an abstract base class for decorators, while specific implementers implement it, dynamically adding it to the framework.
For instance, in a game, the graphics rendering engine is an independent module that can draw still images, running figures, etc. If the designers decide to introduce an item called “invisibility cloak” in the game, requiring players wearing this item to display a semi-transparent image, how should the graphics engine be designed to adapt to this game upgrade?
When the invisibility cloak is equipped, a filter is added to the graphics engine. This is a highly simplified example; the actual game engine is much more complex. The decorator pattern is also commonly used for pre-processing and post-processing of data.
3.5. Disadvantages of Frameworks
A good framework can significantly improve the efficiency and quality of product development, but it also has its drawbacks.
- Frameworks are generally quite complex, and designing and implementing a good one takes considerable time. A framework is therefore worthwhile only when it will be applied repeatedly, so that the upfront investment yields substantial returns.
- A framework prescribes a series of interfaces and rules. This simplifies secondary development, but it also requires secondary developers to remember many regulations; if these are violated, normal operation cannot be guaranteed. On the other hand, since the framework shields a large number of domain details, the overall learning cost is significantly reduced.
- Upgrading a framework can severely impact existing products, requiring full regression testing. There are two ways to address this. The first is to test the framework itself strictly, possibly maintaining a comprehensive unit test library along with sample projects that exercise all of its functionality. The second is to use static linking, so that existing products do not silently pick up upgrades. Of course, if existing products already have good regression testing in place, so much the better.
- Performance loss. Because a framework abstracts the system, it increases the system's complexity, and techniques such as polymorphism generally reduce performance somewhat. Overall, however, a well-designed framework keeps system performance at a relatively high level.
4. Automatic Code Generation
4.1. Let Machines Do What They Can
Laziness is a virtue of programmers, and of architects too. Software development is essentially telling machines how to do things; if a task can be done by a machine, it should not be done by a human. Machines do not tire, and they do not make mistakes. Our job is to automate our customers' work, and with a little more thought we can partially automate our own. Great patience is admirable in a programmer, but it becomes a flaw when it is spent hand-writing code a machine could generate.
Well-designed systems often exhibit many highly similar and strongly patterned codes. Poorly designed systems may produce many different implementations for the same type of functionality. The previous section on framework design has already demonstrated this. Sometimes, we go a step further, analyzing the patterns in these similar codes and using formatted data to describe these functionalities, allowing machines to generate the code.
4.2. Example
4.2.1. Encoding and Decoding of Messages
In the framework example above, we can see that the message encoding and decoding part has been isolated, making it uncoupled from other parts. Given its characteristics, it is very suitable for further “regularization” and having machines generate code.
Encoding is simply streaming out data structures; decoding is the reverse. For encoding, the code essentially looks like this (for a binary protocol):
stream << a.i;
stream << a.j;
stream << a.object;
(To simplify, we assume that a stream object has been designed that can stream various data types and has handled issues such as byte order conversion.)
Finally, we obtain a stream. Have you all become accustomed to writing this kind of code? However, this type of code does not reflect any creativity from the engineer; we already know that there is an i, a j, and an object. Why do we need to type this code ourselves? If we analyze the definition of a, can we automatically generate such code?
struct dataA
{
    int i;
    int j;
    struct dataB object;
};
All that is needed is a simple semantic analyzer to parse this definition and obtain a tree of the data types, from which the streaming code is easily generated. Such an analyzer can be implemented in about two hundred lines of Python or another language with strong string processing capabilities. For dataA above, the tree has i and j as leaf nodes and object as a subtree whose children are the fields of dataB.
By traversing this tree, we can generate all the streaming code for data structures.
In the previous framework example, the message encoding and decoding code generated for a hardware module reached as high as 30,000 lines, which is almost equivalent to a small software project. Since it was generated automatically, there were no errors, providing high reliability to the upper layer.
We can also define the data structures in XML or other formats to drive the code generation. Depending on need, code in any language can be generated, such as C++, Java, or Python. If strong checking is desired, XSD can be used to define the data structures. There is a commercial product called xBinder, which is very expensive and hard to use; it is no better than developing your own. (Why is it hard to use? Because it is too general.) Besides encoding to a binary format, we can also generate code for other, readable formats such as XML. Communication can then use binary while debugging uses XML, giving the best of both worlds. The code generated for the XML format might look like this:
xmlBuilder.addElement("i", a.i);
xmlBuilder.addElement("j", a.j);
xmlBuilder.addElement("object", a.object);
This also lends itself well to machine generation. The same idea can be applied to enabling embedded script support in software. We won’t go into more detail here. (The biggest issue with embedded script support is exchanging data between C/C++ and scripts, which involves a lot of similar code regarding data types.)
Recently, Google released its protocol buffer, which is a paradigm of this thinking. Currently, it supports C++/Python and is expected to support more languages soon, so everyone can pay attention to it. From now on, you should no longer manually write encoding and decoding mechanisms.
4.2.2. GUI Code
In the framework design section above, we mentioned that the framework could not handle data collection and update from the interface, so it only abstracts out interfaces for programmers to implement. However, let’s take a look at what these interface programmers do. (The code has been simplified and can be considered pseudo code).
void onDataArrive(CDataBinder& data)
{
    m_biterror.setText("%d", data.biterror);
    m_signallevel.setText("%d", data.signallevel);
    m_latency.setText("%d", data.latency);
}

void onCollectData(CDataBinder& data)
{
    data.biterror = atoi(m_biterror.getText());
    data.signallevel = atoi(m_signallevel.getText());
    data.latency = atoi(m_latency.getText());
}
Is this code interesting to write? Let's think about what we can do instead. (XML can describe the interface, but it struggles with complex logic.)
4.2.3. Summary
From this we can see that, when architecting software, we should first follow the general principles: separate the various functional parts to achieve high cohesion and low coupling. Then we should look for the highly repetitive, strongly patterned code in the system, regularize and formalize it, and finally let machines generate it. Currently the most successful application of this idea is message encoding and decoding; automatic generation of interface code has certain limitations but can still be applied. Everyone should stay alert for such opportunities in their own work to reduce workload and improve efficiency.
4.2.4. Google Protocol Buffer
The Protocol Buffer just released by Google is a paradigm of automatic code generation.
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data—think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the