Designing Software Architecture for Embedded Systems

Organized by: Embedded Cloud IOT Technology Circle, Author: veryarm

1. Introduction

Embedded software is a branch of software design, and its many special characteristics shape the choices a system architect must make. At the same time, some of its problems are quite general and carry over to other fields.

When it comes to embedded software design, the traditional impression is microcontrollers, assembly, and a heavy dependence on hardware. Traditional embedded developers often focus only on implementing the functionality itself, neglecting factors such as code reuse, separation of data and interface, and testability. As a result, the quality of embedded software depends heavily on the individual developer, with success or failure resting on one person.

With the rapid development of embedded hardware and software, today's embedded systems have grown greatly in functionality, scale, and complexity. For example, Marvell's PXA3xx series reaches a maximum clock speed of 800 MHz, with built-in USB, Wi-Fi, 2D graphics acceleration, and 32-bit DDR memory. In hardware terms, today's embedded systems have reached or even exceeded the PC platforms of a few years ago. On the software side, mature operating systems have emerged, such as Symbian, Linux, and WinCE. On top of these operating systems, applications for word processing, images, video, audio, games, and web browsing have appeared, rivaling PC software in functionality and complexity. Some commercial equipment companies that used to rely on dedicated hardware and systems have begun to change their thinking, replacing functions previously implemented in proprietary hardware with software running on excellent, inexpensive hardware and mature operating systems, achieving lower costs and greater flexibility and maintainability.

2. Factors Determining Architecture and Its Impact

Architecture is not an isolated technical product; it is influenced by multiple factors. At the same time, an architecture also impacts many aspects of software development.

Here is a specific example.

A motorcycle engine must pass a series of tests before it leaves the factory. On the assembly line, the engine is sent to each workstation, where workers test aspects such as speed, noise, and vibration. The requirement is to implement an embedded device with the following basic functions:

  1. Installed at the workstation, the worker turns it on and logs in before starting work.

  2. Automatically collect test data through sensors and display it on the screen.

  3. Record all test results and provide statistical functions, such as defect rates.

If you are the architect of this device, what issues should you focus on when designing the architecture?

2.1. Common Misunderstandings

2.1.1. Small systems do not need architecture

Many embedded systems are relatively small, generally designed for specific purposes. Due to the influence of engineers’ understanding, customer scale, and project schedule, architecture design is often neglected, and coding is directly aimed at implementing functionality. This behavior superficially meets the needs of schedule, cost, and functionality, but in the long run, the costs incurred in scalability and maintenance far exceed the initial savings. If the original developer of the system continues to stay in the organization and is responsible for the project, everything may go well. Once they leave, subsequent developers may introduce more errors due to insufficient understanding of the system details. It should be noted that the cost of changes in embedded systems is far higher than that in general software systems. A good software architecture can describe the system from both macro and micro levels and isolate various parts, making the addition of new features and subsequent maintenance relatively simple.

Take the example of a city rail card reader, which has been mentioned in previous courses. A simple city rail card reader only needs to implement a few basic functions.

A while loop is sufficient to implement this system, and coding and debugging can begin immediately. But from an architect’s perspective, is there anything here worth abstracting and separating?

  1. Billing system. The billing system must be abstracted, for example to move from flat-fare billing to distance-based (mileage) billing.

  2. Sensor system. Sensors include magnetic card readers, coin acceptors, etc. The equipment may change.

  3. Error handling and recovery. Given the high reliability and short recovery time required, this part needs to be designed separately.

Future possible changes in requirements:

  1. User interface. Should a dedicated model be abstracted for future implementation of the view?

  2. Data statistics. Should a relational database be introduced?

If the code were written directly as a single while loop, how much of it could be reused when such changes occur?
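
For instance, the billing logic could sit behind a small interface so that a fare-policy change never touches the card-reading or gate-control code. A minimal sketch, with invented names:

// Hypothetical billing abstraction for the card reader.
class IBillingPolicy
{
public:
    virtual ~IBillingPolicy() {}
    // Fare in cents for a trip between two stations.
    virtual int CalculateFare(int entryStation, int exitStation) = 0;
};

// Today: one flat fare for every trip.
class CFlatFarePolicy : public IBillingPolicy
{
public:
    explicit CFlatFarePolicy(int fare) : iFare(fare) {}
    virtual int CalculateFare(int, int) { return iFare; }
private:
    int iFare;
};

// Tomorrow: a CMileageFarePolicy can be added without touching any caller.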

However, do not let this lead to over-design. The architecture should be based on meeting current needs while considering reuse and changes appropriately.

2.1.2. Agile development does not require architecture

Extreme programming and agile development have led some people to mistakenly believe that software development no longer requires architecture. This is a significant misunderstanding. Agile development was proposed as a remedy for the obvious disadvantages of the traditional waterfall process, so it necessarily starts from a higher baseline and imposes stricter demands on development; it is not a regression to the Stone Age. In fact, architecture is part of agile development; agile simply recommends more efficient and lightweight methods of design, for example, drawing UML diagrams on a whiteboard and photographing them with a digital camera, or using user stories instead of use cases. Test-driven agile development forces engineers to design the functionality and interfaces of components before writing the actual code, rather than starting to code directly. Some characteristics of agile development include:

  1. Targeting larger systems than traditional development processes

  2. Acknowledging changes and iterating architecture

  3. Simplicity without chaos

  4. Emphasizing testing and refactoring

3. Embedded Environment Software Design Characteristics

To discuss embedded software architecture, one must first understand the characteristics of embedded software design.

3.1. Closely Related to Hardware

Embedded software generally has a considerable dependence on hardware. This is reflected in several aspects:

  1. Some functions can only be implemented through hardware; software operates and drives hardware.

  2. Differences/changes in hardware can have a significant impact on software.

  3. Without hardware or with incomplete hardware, software cannot run or cannot run completely.

These characteristics lead to several consequences:

  1. Software engineers' understanding of and proficiency with the hardware largely determine performance, stability, and other non-functional indicators of the software. This work is often complex and requires senior engineers to guarantee quality.

  2. Software is highly dependent on hardware design, cannot maintain relative stability, and has poor maintainability and reusability.

  3. Software cannot be tested and validated independently of hardware, often requiring synchronization with hardware validation, causing delays in progress and expanding the range of error localization.

To address these issues, several solutions can be considered:

  1. Implement hardware functions in software. Choosing a more powerful processor and implementing some hardware functions in software reduces the dependence on hardware, which helps in responding to change and avoids being tied to specific models and manufacturers. This has become a trend in some industries. PC platforms went through a similar process, for example with early Chinese character cards (add-in boards for Chinese text display whose job was eventually taken over by software).

  2. Independently create a hardware abstraction layer to minimize hardware dependence in other parts of the software, allowing it to run independently of hardware. This can control the risks of hardware changes or replacements within a limited scope and improve the testability of software parts.
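
As an illustration of the second point, the hardware abstraction layer can be little more than a set of pure virtual interfaces that the rest of the software programs against; the names below are invented for the sketch:

// Hypothetical sensor abstraction: upper layers see only this interface,
// so replacing the physical device touches exactly one subclass.
class ISensor
{
public:
    virtual ~ISensor() {}
    virtual bool Open() = 0;
    virtual int  Read(unsigned char* buf, int maxLen) = 0;
    virtual void Close() = 0;
};

// A fake implementation lets the software run and be tested with no hardware.
class CFakeSensor : public ISensor
{
public:
    virtual bool Open() { return true; }
    virtual int  Read(unsigned char* buf, int maxLen)
    {
        if (maxLen < 1) return 0;
        buf[0] = 0x42;          // canned test data
        return 1;
    }
    virtual void Close() {}
};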

3.2. High Stability Requirements

Most embedded software must run stably for long periods. Mobile phones often stay powered on for months, communication equipment is required to run 24×7, and even test instruments for communications must run continuously for at least 8 hours. To achieve stability, some commonly used design techniques include:

  1. Distributing different tasks across independent processes. Good modular design is key.

  2. Watchdog timers, heartbeats, and restarting failed processes (a sketch of a process supervisor follows below).

  3. A comprehensive and unified logging system for quick problem localization. Embedded devices generally lack powerful debuggers, so logging systems are particularly important.

  4. Isolating errors to the smallest extent to avoid the spread and chain reactions of errors. Core code should undergo thorough verification, while non-core code can run in monitored or sandbox environments to prevent it from damaging the entire system.

For example, GPRS access on Symbian is affected by differences in hardware and operating system versions, so the functionality was never very stable. In one version, closing the GPRS connection would always crash, which was a known issue. By isolating GPRS connections, HTTP protocol handling, file downloads, and so on in a single separate process, the crash after each operation no longer affected the user.

Double (redundant) backup is another available technique, although it is rarely used.
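
As a sketch of technique 2, a supervisor can keep a non-core worker process running and restart it whenever it dies (a POSIX-flavored sketch; a real system would add rate limiting and logging):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Run the unreliable work in a child process and restart it on every exit,
// so a crash in the worker never takes down the core system.
void SuperviseWorker(void (*workerMain)())
{
    for (;;)
    {
        pid_t pid = fork();
        if (pid == 0)
        {
            workerMain();           // child: e.g. the GPRS/download process
            _exit(0);
        }
        int status = 0;
        waitpid(pid, &status, 0);   // parent: block until the worker dies
        sleep(1);                   // brief backoff before restarting it
    }
}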

3.3. Insufficient Memory

Although today’s embedded systems have significantly improved memory compared to the era of kilobytes, the issue of insufficient memory still troubles system architects as software scales grow. Here are some principles that architects can refer to when making design decisions:

3.3.1. Virtual Memory Technology

Some embedded devices need to handle massive amounts of data that cannot all be loaded into memory at once. Some embedded operating systems also provide limited or no virtual memory support; under WinCE 4.2, for example, each program can use at most 32 MB of memory. For such applications, the architect should design a virtual-memory mechanism of their own. The core of the technique is to move data that is unlikely to be needed soon out of memory. This involves several technical points (a sketch follows the list):

  1. Reference counting; data being used cannot be moved out.

  2. Using predictions to anticipate the likelihood of data being used in the next phase. Based on predictions, data can be moved out or loaded in advance.

  3. Placeholder data/objects.

  4. High-speed caching. Keep frequently used data in a cache layered directly on top of complex data structures.

  5. Fast persistence and loading.
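
A minimal sketch of points 1, 3, and 5 together, with invented names; prediction and caching would layer on top of this:

#include <map>
#include <string>

// Hypothetical swappable record: a placeholder entry stays in the map
// even when its payload has been moved out of memory.
struct Record
{
    Record() : refCount(0), inMemory(false) {}
    int         refCount;   // point 1: pinned while refCount > 0
    bool        inMemory;
    std::string payload;
};

class RecordStore
{
public:
    Record* Acquire(int id)
    {
        Record& r = iRecords[id];
        if (!r.inMemory)
        {
            r.payload  = LoadFromFlash(id);  // point 5: fast loading
            r.inMemory = true;
        }
        ++r.refCount;
        return &r;
    }
    void Release(int id) { --iRecords[id].refCount; }

    // Called under memory pressure: persist and drop every unpinned record.
    void EvictUnreferenced()
    {
        std::map<int, Record>::iterator it;
        for (it = iRecords.begin(); it != iRecords.end(); ++it)
        {
            Record& r = it->second;
            if (r.inMemory && r.refCount == 0)
            {
                SaveToFlash(it->first, r.payload);  // point 5: persistence
                r.payload.clear();
                r.inMemory = false;                 // point 3: placeholder
            }
        }
    }

private:
    // Stubs standing in for the real storage I/O.
    std::string LoadFromFlash(int) { return std::string(); }
    void SaveToFlash(int, const std::string&) {}

    std::map<int, Record> iRecords;
};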

Consider, for example, the interface of a nationwide telecom machine-room management system: each node has a large amount of data to load, and the techniques above can be used to minimize memory usage.

3.3.2. Two-Stage Construction

In systems with limited memory, handling object construction failure is an unavoidable issue. The most common cause of failure is insufficient memory (the same concern applies to PC platforms but is often overlooked there, since memory is cheap). Two-stage construction is a commonly used and effective design. For example:

class CMySimpleClass
{
public:
    CMySimpleClass();
    ~CMySimpleClass();
    ...
private:
    int SomeData;
};

class CMyCompoundClass
{
public:
    CMyCompoundClass();
    ~CMyCompoundClass();
    ...
private:
    CMySimpleClass* iSimpleClass;
};

CMyCompoundClass's constructor initializes the iSimpleClass object:

CMyCompoundClass::CMyCompoundClass()
{
    iSimpleClass = new CMySimpleClass;
}

What happens when creating CMyCompoundClass?

CMyCompoundClass* myCompoundClass = new CMyCompoundClass;

  1. Allocate memory for the CMyCompoundClass object

  2. Call the constructor of the CMyCompoundClass object

  3. Create an instance of CMySimpleClass in the constructor

  4. The constructor ends and returns

Everything seems straightforward, but what if an out-of-memory error occurs during the third step when creating the CMySimpleClass object? The constructor cannot return any error information to indicate that the construction was unsuccessful. The caller then gets a pointer to CMyCompoundClass, but this object has not been fully constructed.

What if an exception is thrown in the constructor instead? This is a well-known nightmare: the destructor will not be called, so any resources allocated before the CMySimpleClass object was created will leak. The usual advice is therefore to avoid throwing exceptions from constructors whenever possible.

Therefore, using a two-stage construction method is a better choice. Simply put, avoid any actions that may produce errors, such as memory allocation, in the constructor, and place these actions in a separate function after the construction is complete. For example:

AddressBook* book = new AddressBook();
if (!book->Construct())
{
    delete book;
    book = NULL;
}

This ensures that when Construct fails, already allocated resources are released.
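
What Construct() does is class-specific; a minimal sketch, assuming the object needs a heap buffer (Entry, kMaxEntries, and the members are invented):

#include <new>      // for std::nothrow

// Second phase: everything that can fail lives here, never in the constructor.
bool AddressBook::Construct()
{
    iEntries = new (std::nothrow) Entry[kMaxEntries];  // may fail when memory is low
    if (iEntries == NULL)
        return false;   // the object remains safely deletable
    iCount = 0;
    return true;
}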

In Symbian, the most widely deployed mobile operating system of its day, the two-stage construction method is used extensively.

3.3.3. Memory Allocators

Different systems have different memory-allocation patterns: some allocate many small blocks, others frequently grow blocks that are already allocated. A good memory allocator can have a significant impact on the performance of embedded software. The entire system should use a single, unified memory allocator that can be replaced at any time.

3.3.4. Memory Leaks

Memory leaks are very serious for embedded systems with limited memory. With your own memory allocator in place, it becomes easy to track every allocation and release, and thus to detect leaks.
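
A sketch of both ideas at once: one allocator interface for the whole system, plus a tracking implementation that counts live blocks so leaks are visible at shutdown (illustrative, not a production allocator):

#include <cstdlib>
#include <cstdio>

// Unified allocator interface; the whole system allocates through this,
// so the implementation can be swapped at any time.
class IAllocator
{
public:
    virtual ~IAllocator() {}
    virtual void* Alloc(size_t size) = 0;
    virtual void  Free(void* p) = 0;
};

// Tracking variant: counts outstanding blocks so leaks show up in a report.
class CTrackingAllocator : public IAllocator
{
public:
    CTrackingAllocator() : iLive(0) {}
    virtual void* Alloc(size_t size)
    {
        void* p = std::malloc(size);
        if (p) ++iLive;
        return p;
    }
    virtual void Free(void* p)
    {
        if (p) { --iLive; std::free(p); }
    }
    void Report() const { std::printf("live blocks: %ld\n", iLive); }
private:
    long iLive;
};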

3.4. Limited Processor Capability, High Performance Requirements

This section does not discuss real-time systems, which is a large specialized topic. For general embedded systems, due to limited processor capabilities, performance issues must be particularly noted. Some excellent architectural designs fail due to not meeting performance requirements, ultimately leading to project failure.

3.4.1. Resisting the Temptation of New Technologies

Architects must understand that new technology often means complexity and lower performance. Even where this is not absolute, the limited hardware of embedded systems leaves little room to maneuver: once a new technology turns out to differ from initial expectations, adapting it through modification is very difficult. Take GWT as an example. GWT is a Google-developed Ajax toolkit that lets programmers build Web Ajax applications as easily as desktop ones, which makes it tempting to implement remote and local operation interfaces on an embedded system from a single codebase. However, running a browser-server (B/S) application on an embedded device poses a significant performance challenge, browser-compatibility issues are severe, and the then-current version of GWT was not yet mature.

Experience showed that embedded remote-control solutions were still better served by ActiveX, VNC, or similar approaches.

3.4.2. Avoiding Too Many Layers

Layered structures help define system responsibilities clearly and decouple the system, but every additional layer implies a performance loss, especially when large amounts of data must be passed between layers. When adopting a layered structure in an embedded system, control the number of layers and avoid passing large amounts of data across layers, above all between layers in different processes. If data must be passed, avoid expensive format conversions, such as XML to binary or C++ structures to Python structures.

Embedded systems have limited capabilities, and it is crucial to focus those limited capabilities on the core functionalities of the system.

3.5. Storage Devices Are Prone to Damage and Slow

Due to size and cost constraints, most embedded devices use storage such as Compact Flash, SD, mini SD, or MMC. These devices have the advantage of no fragile mechanical parts, but their write endurance is limited: a CF card block can typically endure about 1,000,000 writes, and an SD card even fewer, around 100,000. For an application like a digital camera this may be plenty. But for applications that erase and rewrite storage frequently, such as a historical database, wear-out appears quickly. Consider an application that writes a 16 MB file to a CF card daily. With a FAT16 file system and a 2 KB cluster size, writing that 16 MB file updates the file allocation table 8192 times, so a card whose blocks endure 1,000,000 writes actually works for only 1000000/8192 = 122 days, while the vast majority of other blocks on the card see only a tiny fraction of that usage.

In addition to the static file partition tables and other blocks being frequently read and written, leading to premature damage, some embedded devices also face the challenge of sudden power outages, which can result in incomplete data on storage devices.

3.5.1. Wear Leveling

The basic idea of wear leveling is to use all blocks on the storage device evenly. A block-usage table must be maintained, recording each block's offset, current availability, and the number of times it has been erased. When a new erase request arrives, the following principles apply:

  1. As continuous as possible

  2. Erase the least number of times

Even when updating existing data, the above principles are used to allocate new blocks. Likewise, the location of the usage table itself cannot be fixed; otherwise the blocks it occupies would be the first to wear out. When the table is updated, the same algorithm is used to allocate its blocks.
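
The allocation decision can be sketched as a scan of the usage table: among free blocks, take the least-erased one, preferring the block contiguous with the previous allocation on ties (data structures invented for illustration):

#include <vector>

// One entry of the block-usage table described above.
struct BlockInfo
{
    unsigned offset;      // position of the block on the device
    bool     free;        // current availability
    unsigned eraseCount;  // how many times it has been erased
};

// Pick the free block with the lowest erase count; prefer the block
// right after 'prev' when erase counts tie (keeps writes contiguous).
int AllocateBlock(std::vector<BlockInfo>& table, int prev)
{
    int best = -1;
    for (int i = 0; i < (int)table.size(); ++i)
    {
        if (!table[i].free)
            continue;
        if (best < 0 ||
            table[i].eraseCount < table[best].eraseCount ||
            (table[i].eraseCount == table[best].eraseCount && i == prev + 1))
            best = i;
    }
    if (best >= 0)
        table[best].free = false;
    return best;   // -1 means the device is full
}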

If the storage device holds a lot of static data, the above algorithm can only operate on the remaining space; in that case an algorithm for relocating the static data must also be implemented. Such relocation reduces write performance and increases complexity, however, so generally only dynamic wear leveling is used.

Currently, mature wear leveling file systems include JFFS2 and YAFFS. Another approach is to implement wear leveling on traditional file systems like FAT16 by pre-allocating a sufficiently large file and implementing wear leveling algorithms within that file. However, this requires modifying FAT16 code to disable last modification time updates.

Some modern CF and SD cards have already implemented wear leveling internally, in which case no software implementation is required.

3.5.2. Error Recovery

If a power outage or disconnection occurs while writing data to the storage device, the data in the written area will be in an unknown state. In some applications, this can lead to incomplete files, while in others, it can cause system failure. Therefore, recovery from such errors is also a necessary consideration in embedded software design. Common approaches include two methods:

  1. Log-based file systems

Such a file system does not overwrite data in place but appends log records one by one, so after a power outage it can always recover to the previous consistent state. A representative example is ext3.

  2. Double backup

The double backup approach is simpler: all data is written twice, alternating between the two copies. The file allocation table must also be double-backed. For example, suppose there is a data block A with backup block A1, whose content initially matches A. In the allocation table, entry F points to data block A, and F1 is F's backup. When modifying the file, the content of A1 is modified first. If power fails at this point, A1's content may be wrong, but since F still points to the intact A, the data is undamaged. Once A1 has been written successfully, F1 is updated. If power fails at that point, F is still intact, so there is again no problem.
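
In code, everything hinges on the write order and on flushing between steps, so that at most one copy is ever inconsistent. A sketch, where Device, Data, Write, Flush, and PointerTo are invented stand-ins for the real storage primitives:

// One update cycle of the double-backup scheme described above.
void UpdateBlock(Device& dev, const Data& newData)
{
    dev.Write(A1, newData);        // modify the backup data block first;
    dev.Flush();                   // power loss here: F still points to intact A
    dev.Write(F1, PointerTo(A1));  // then update the backup table entry;
    dev.Flush();                   // power loss here: the F/A pair is intact
    // On the next update the roles of A/A1 and F/F1 swap, alternating copies.
}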

Modern Flash devices often have built-in error detection and correction that preserves data integrity across power loss, along with automatic dynamic/static wear leveling and bad-block management, so no extra software is needed and they can be used like hard drives. As hardware becomes more capable, software becomes simpler and more reliable, and continued technological advances let us focus more on the software's own functionality; that is the direction of development.

3.6. High Costs of Failure

Embedded products are sold to users as combined hardware and software, which brings about an issue not faced by purely software products: when a product fails, if it needs to be returned for repair, the cost is high. Common types of failures in embedded devices include:

a) Data failures. Data cannot be read or is inconsistent due to certain reasons, such as database errors caused by power outages.

b) Software failures. Defects in the software itself that need to be corrected through patch releases or new software versions.

c) System failures. For example, a user downloads an incorrect system kernel, causing the system to fail to start.

d) Hardware failures. This type of failure requires a return for repair and is not within our discussion scope.

For the first three types of failures, it is essential to ensure that customers or on-site technicians can resolve them. From an architectural perspective, the following principles can be referenced:

a) Use data management designs that have error recovery capabilities. When data errors occur, the acceptable handling for users is as follows:

i. Errors are corrected, and all data is valid.

ii. Data (which may be incomplete) lost when an error occurs, with previous data remaining valid.

iii. All data is lost.

iv. The data engine crashes and cannot continue to work.

In general, meeting the second level is sufficient, achieved through logging, transactions, backups, and error identification.

b) Separate applications from the system. Applications should be placed on pluggable Flash cards and can be upgraded through file copying using card readers. Avoid using proprietary application software to upgrade applications unless necessary.

c) There should be a “safe mode.” That is, even if the main system is damaged, the device can still start and re-upgrade the system. Commonly used U-Boot can ensure this; if the system is damaged, it can enter U-Boot to re-upgrade via TFTP.

4. Software Frameworks

In desktop and network systems, frameworks are widely used, such as the well-known ACE, MFC, Ruby On Rails, etc. However, frameworks are rarely used in embedded systems. The reason is that embedded systems are considered simple, lacking repetition, and overly focused on functional implementation and performance optimization. As mentioned in the introduction, the current trend of embedded development is towards complexity, large-scale, and series development. Therefore, designing software frameworks in embedded systems is also very necessary and valuable.

4.1. Issues Faced by Embedded Software Architecture

Earlier we discussed several problems that embedded software architecture faces: the dependence on hardware, the complexity of hardware-related software, and the stringent requirements on stability and memory usage. If everyone on the team were expert in these areas, high-quality software might result. In reality, a team usually has only one or two senior people, while most members are junior engineers. If everyone deals with the hardware and everyone is responsible for stability, performance, and the other metrics, the quality of the final product is hard to guarantee. Conversely, if the team is staffed entirely with people who excel at hardware and other low-level technology, it becomes difficult to design software that excels in usability and extensibility. Specialization is essential, and the architect's choices determine how the team is composed.

At the same time, although embedded software development is complex, there are also many possibilities for reuse. How to reuse and how to cope with future changes?

Therefore, how to shield complexity from most people, how to separate concerns, and how to ensure the key non-functional indicators of the system are problems that embedded software architecture designers need to solve. One possible solution is software frameworks.

4.2. What Is a Framework

A framework is a semi-finished software product designed for reuse and to respond to future changes in a given problem domain. Frameworks emphasize abstraction in specific domains and contain a wealth of domain knowledge, aiming to shorten the software development cycle and improve software quality. Secondary developers using the framework implement special functionality by rewriting subclasses or assembling objects.

4.2.1. Levels of Software Reuse

Reuse is a topic we often discuss, and the phrase “don’t reinvent the wheel” is also well-known. However, there are many levels of understanding regarding reuse.

The most basic form of reuse is copy-pasting. A function that has been implemented before is copied over when needed and modified slightly for use. Experienced programmers often have their own libraries, allowing them to implement features faster than new programmers. The downside of copy-pasting is that the code has not been abstracted and often does not apply completely, requiring modification, leading to confusion and making the code difficult to understand after multiple reuses. Many companies’ products face this issue, where the code from one product is copied to another product, modified slightly, and sometimes even the class names and variable names are not changed. According to the standard that “only code designed for reuse can truly be reused,” this does not count as reuse, or it is low-level reuse.

A higher level of reuse is libraries. This form requires abstracting frequently used functionalities and extracting the constant parts to provide them as a library for secondary developers. Because the library designer does not know how secondary developers will use it, this places high demands on the designer. This is the most widely used form of reuse, such as the standard C library and STL library. One of the significant advantages of the increasingly popular Python language is its extensive library support. In contrast, C++ has always lacked a powerful and unified library support, which has become a shortcoming. Summarizing commonly used functions within company development and developing them into libraries is very valuable; however, the downside is that upgrading libraries can affect many products and must be approached with caution.

Frameworks represent another form of reuse. Like libraries, frameworks also abstract and implement the invariant parts of the system, allowing secondary developers to implement the varying parts. The most significant difference between typical frameworks and libraries is that libraries are static and called by secondary developers, while frameworks are dynamic and control the flow, requiring secondary developers’ code to align with the framework’s design.

For example, a network application always involves establishing connections, sending and receiving data, and closing connections. The library form would look like this:

conn = connect(host, port);
if (conn.isvalid())
{
    data = conn.recv();
    printf(data);
    conn.close();
}

In contrast, the framework would look like this:

class mycomm : public connect
{
public:
    host();
    port();
    onconnected();
    ondataarrived(unsigned char* data, int len);
    onclose();
};

The framework will create the mycomm object at the “appropriate” time, query the host and port, and then establish the connection. After the connection is established, it calls the onconnected() interface, allowing the secondary developer to handle it. When data arrives, the ondataarrived interface is called for the secondary developer to process. This follows the Hollywood principle: “Don’t call us, we’ll call you.”
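
Behind the scenes, the framework owns the control flow; conceptually it does something like the following with the object it created (pseudo code; establish, isopen, and wait_for_data are invented framework internals):

// Framework side: it instantiates the user's class and drives every step.
unsigned char buffer[1024];
connect* c = new mycomm();
c->establish(c->host(), c->port());   // connect using the user's answers
c->onconnected();                     // hand control to user code
while (c->isopen())
{
    int len = c->wait_for_data(buffer, sizeof(buffer));  // framework blocks here
    if (len > 0)
        c->ondataarrived(buffer, len);                   // user code processes data
}
c->onclose();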

Of course, a complete framework usually also provides various libraries for secondary developers to use. MFC, for example, provides many library classes such as CString, but it is fundamentally a framework: implementing the OnInitDialog interface for a dialog box is something the framework dictates.

4.2.2. Abstraction for Highly Specific Domains

Compared to libraries, frameworks are specific to a target domain. A library such as the C library serves all applications. Frameworks are narrower: MFC is only suitable for desktop application development on Windows, ACE targets network application development, and Ruby on Rails is designed for rapid web-site development.

The more specific the domain, the stronger the abstraction can be, and the simpler the secondary development can be, as the commonalities increase. For example, the various characteristics of embedded system software development we discussed earlier represent commonalities that can be abstracted. When it comes to specific embedded applications, there will be even more commonalities to abstract.

The purpose of framework design is to summarize the commonalities of specific domains and implement them in a framework manner, stipulating the implementation approach for secondary developers, thereby simplifying development. Correspondingly, a framework developed for one domain cannot serve another domain. For enterprises, frameworks are an excellent means of accumulating knowledge and reducing costs.

4.2.3. Decoupling and Coping with Change

A critical goal of framework design is to cope with change. The essence of coping with change is decoupling. From an architect’s perspective, decoupling can be divided into three types:

  1. Logical decoupling. Logical decoupling involves abstracting and separating logically different modules, such as separating data from the interface. This is also the most common form of decoupling.

  2. Knowledge decoupling. Knowledge decoupling means designing interfaces so that people with different domain knowledge can work together. A typical example is the domain knowledge of test engineers versus the programming knowledge of development engineers. Traditional test scripts force these two roles together, requiring test engineers to also be programmers. With an appropriate interface, test engineers can express their test cases in the simplest possible form, while developers write the program code that executes those cases.

  3. Decoupling change from stability. This is a significant feature of frameworks. Frameworks analyze domain knowledge to fix the common, invariant content while allowing the parts that may change to be implemented by secondary developers.

4.2.4. Frameworks Can Implement and Specify Non-Functional Requirements

Non-functional requirements refer to characteristics such as performance, reliability, testability, and portability. These traits can be achieved through frameworks. Below, we will provide examples for each.

Performance. The cardinal sin of performance optimization is trying to optimize everywhere at once; a system's performance usually hinges on a few specific points. For example, in embedded systems access to storage devices is relatively slow. If developers are careless about this and read and write storage frequently, performance degrades. If the framework is responsible for reading and writing storage, secondary developers only need to supply and process data, and the read/write frequency can be tuned inside the framework. Because a framework is developed once and used widely, its critical performance points can be optimized thoroughly.

Reliability. Taking the example of the network communication program above, since the framework manages connection creation and management, as well as handling various possible network errors, specific implementers do not need to understand this area of knowledge or implement error handling code, ensuring the system’s reliability in network communication. The most significant advantage of designing reliability in a framework is that the code of secondary developers runs under the control of the framework. On one hand, the framework can implement error-prone parts, and on the other, it can capture and handle errors generated by secondary developers’ code. Libraries cannot replace users in handling errors.

Testability. Testability is an essential aspect that software architecture needs to consider. The following chapters will discuss how good design ensures software testability. On one hand, frameworks impose interfaces on secondary developers, compelling them to develop code that is conducive to unit testing. On the other hand, frameworks can also provide designs that facilitate automated testing and regression testing at the system testing level, such as a unified TL1 interface.

Portability. If the portability of software is a design goal, framework designers can ensure this during the design phase. One way is to shield system differences through cross-platform libraries, while another extreme approach is to have secondary development based on scripting. Configuration software is an example in this regard, where a project configured on a PC can also run on an embedded device.

4.3. An Example of Framework Design

4.3.1. Basic Architecture

4.3.2. Functional Characteristics

This product series is characterized by modular hardware that can be plugged and unplugged at any time. Different hardware modules serve different communication-testing scenarios, such as optical communication testing, xDSL testing, and Cable Modem testing, and each requires its own firmware and software. The firmware's job is mainly to receive commands from the software via USB, read and write the corresponding hardware interfaces, and perform some computation before returning results to the software. The software runs on the WinCE platform, provides a touch-based graphical interface, and exposes XML (SOAP) and TL1 interfaces externally; for automated testing it also provides an interface based on the Lua scripting language. The product series spans dozens of hardware modules, each needing corresponding software. Although these programs serve different hardware, they are highly similar, so developing a framework first and then building each module's software on that framework was the optimal choice.

4.3.3. Analysis

The structure of the software part is as follows:

The system is divided into three main parts: software, firmware, and hardware. The software and firmware run on two independent boards, each with its own processor and operating system. The hardware modules plug into the board where the firmware runs and are replaceable.

Both the software and the firmware are, strictly speaking, software; we analyze them separately below.

Software

The main job of the software is to provide various user interfaces, including local graphical interfaces, SOAP access interfaces, and TL1 access interfaces.

The entire software part is divided into five major components:

  • Communication Layer
  • Protocol Layer
  • Graphical Interface
  • SOAP Server
  • TL1 Server

The communication layer shields its users from the specific communication medium and protocol: whether the link is USB or a socket does not affect the upper layers. It is responsible for providing reliable communication and appropriate error handling, and the communication layer in use can be switched through configuration files.

The protocol layer's purpose is to encode and decode data. Encoding outputs a stream that can be sent through the communication layer; given the characteristics of embedded software, we chose binary as the stream format. Decoding can produce various outputs: C structs for the interface, XML data, or Lua data structures (tables). If needed, JSON, TL1, Python, or TCL data can also be generated. This layer is generated automatically by the framework, as discussed later.

The in-memory database, the SOAP server, and the TL1 server are all clients of the protocol layer. The graphical interface communicates with the in-memory database and with the underlying communication layer.

The graphical interface is one of the key focuses of framework design because it involves the most work and repetitive tasks.

Let’s analyze what the most critical tasks are in graphical interface development.

  1. Collect user input data and commands

  2. Send data and commands to the lower layer

  3. Receive feedback from the lower layer

  4. Display data on the interface

At the same time, the framework comes with a set of libraries that further simplify development.
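
Matching the four tasks above, the interface the framework imposes on each page might look like the following hypothetical sketch (the framework itself performs tasks 2 and 3):

// Hypothetical page interface dictated by the framework.
class CPageBase
{
public:
    virtual ~CPageBase() {}
    virtual void OnCollectData(CDataBinder& data) = 0;  // task 1: gather user input
    virtual void OnDataArrive(CDataBinder& data) = 0;   // task 4: display results
    // Sending commands down and receiving feedback (tasks 2 and 3) are
    // implemented once, inside the framework itself.
};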

This is a simplified example but illustrates the characteristics of frameworks:

  1. Client code must implement according to specified interfaces

  2. The framework calls the client-implemented interfaces at appropriate times

  3. Each interface is designed to perform a specific single function

  4. Connecting each step organically is the framework’s job; secondary developers do not need to know or care.

  5. There are usually accompanying libraries.

Firmware

The primary job of the firmware is to accept commands from the software, drive hardware, acquire hardware status, perform certain calculations, and return results to the software. Early firmware was a thin layer since most of the work was performed by hardware, and firmware only served as a communication intermediary. With the evolution of technology, modern firmware is beginning to take on more of the work originally handled by hardware.

The entire firmware section is divided into five major parts:

  • Hardware abstraction layer, providing access interfaces to the hardware
  • Independent task groups
  • Task/message dispatcher
  • Protocol layer
  • Communication layer

For different devices, the workload is concentrated in the hardware abstraction layer and task groups. The hardware abstraction layer is provided as a library, implemented by engineers most familiar with the hardware. The task groups consist of a series of tasks representing different business applications, such as measuring the error rate. This part is implemented by relatively less experienced engineers, whose primary work is to implement specified interfaces and algorithms defined in standardized documentation.

Tasks define the following interfaces for specific developers to implement:

OnInit();
OnRegisterMessage();
OnMessageArrive();
Run();
OnResultReport();

The code flow of the framework is as follows (pseudo code):

CTask* task = new CBertTask();
task->OnInit();
task->OnRegisterMessage();
while (TRUE)
{
    task->OnMessageArrive();
    task->Run();
    task->OnResultReport();
}
delete task;
task = NULL;

Thus the implementers of specific tasks only need to focus on these few interfaces. Everything else, such as hardware initialization, message sending and receiving, encoding and decoding, and result reporting, is handled by the framework, so not every engineer has to master the whole stack from top to bottom. Moreover, such task code is highly reusable: the PING task, for instance, is implemented the same way whether it runs over Ethernet or a Cable Modem.

4.3.4. Actual Effects

In actual projects, the framework significantly reduces development difficulty. This is especially evident in the software part, where even interns can complete high-quality interface development, reducing development cycles by over 50%. Product quality has greatly improved. The contribution to the firmware part lies in reducing the need for engineers proficient in low-level hardware; general engineers familiar with measurement algorithms can suffice. At the same time, the existence of the framework ensures elements such as performance, stability, and testability.

4.4. Common Patterns in Framework Design

4.4.1. Template Method Pattern

The template method pattern is the most commonly used design pattern in frameworks. The fundamental idea is to fix the algorithm in the framework while allowing secondary developers to implement specific operations within that algorithm. For example, the logic for initializing a device in framework code is as follows:

TBool CBaseDevice::Init()
{
    if (DownloadFPGA() != KErrNone)
    {
        LOG(LOG_ERROR, _L("Download FPGA fail"));
        return EFalse;
    }
    if (InitKeyPad() != KErrNone)
    {
        LOG(LOG_ERROR, _L("Initialize keypad fail"));
        return EFalse;
    }
    return ETrue;
}

DownloadFPGA and InitKeyPad are both virtual functions defined by CBaseDevice, and secondary developers create subclasses inheriting from CBaseDevice to implement these two interfaces. The framework defines the order of calls and error handling, so secondary developers do not need to concern themselves with these aspects.
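
A secondary developer then supplies only the steps, never the sequence; a sketch (the Symbian-style signatures are assumed from the snippet above):

// Hypothetical concrete device: fills in the steps, inherits the algorithm.
class CMyDevice : public CBaseDevice
{
public:
    virtual TInt DownloadFPGA()
    {
        // device-specific bitstream download goes here
        return KErrNone;
    }
    virtual TInt InitKeyPad()
    {
        // device-specific keypad setup goes here
        return KErrNone;
    }
};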

4.4.2. Creational Patterns

Since frameworks typically involve the creation of various subclass objects, creational patterns are often used. For instance, in a drawing software framework, a base class defines the interface for graphic objects, and subclasses such as ellipses, rectangles, and lines can be derived from it. When a user draws a graphic, the framework must instantiate that subclass. Factory methods, prototype methods, etc., can be used here.

class CDrawObj
{
public:
    virtual int DrawObjTypeID() = 0;
    virtual Icon GetToolBarIcon() = 0;
    virtual void Draw(Rect rect) = 0;
    virtual CDrawObj* Clone() = 0;
};
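
Building on that base class, a prototype registry is one way the framework can instantiate whichever shape the user picks; Clone() does the actual creation (a sketch):

#include <map>

// Hypothetical prototype registry: each shape registers one instance,
// and the framework clones it when the user draws that shape.
class CDrawObjFactory
{
public:
    void Register(CDrawObj* prototype)
    {
        iPrototypes[prototype->DrawObjTypeID()] = prototype;
    }
    CDrawObj* Create(int typeID)
    {
        std::map<int, CDrawObj*>::iterator it = iPrototypes.find(typeID);
        return it == iPrototypes.end() ? 0 : it->second->Clone();
    }
private:
    std::map<int, CDrawObj*> iPrototypes;
};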

4.4.3. Message Subscription Pattern

The message subscription pattern is the most commonly used way to separate data from the interface. Interface developers only need to register the data they need, and when the data changes, the framework will push that data to the interface. A common issue with the message subscription pattern is how to handle reentrancy and timeouts in synchronous modes. As framework designers, it is essential to consider this issue. Reentrancy refers to secondary developers performing subscription/unsubscription operations within the message callback function, which can disrupt the message subscription mechanism. Timeouts refer to secondary developers’ message callback functions taking too long to process, preventing other messages from being responded to. The simplest way to handle this is to use asynchronous modes, allowing subscribers and data publishers to run in separate processes/threads. If this condition cannot be met, it must be a critical agreement of the framework to forbid secondary developers from causing such issues.
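
A sketch of a synchronous hub that at least traps the reentrancy problem just described (illustrative; a production version would queue or defer the mutation instead of asserting):

#include <map>
#include <vector>
#include <cassert>

class ISubscriber
{
public:
    virtual ~ISubscriber() {}
    virtual void OnData(int topic, const void* data) = 0;
};

class CMessageHub
{
public:
    CMessageHub() : iDispatching(false) {}
    void Subscribe(int topic, ISubscriber* s)
    {
        // Reentrancy guard: no (un)subscribe from inside a callback.
        assert(!iDispatching && "subscribe called from a message callback");
        iSubs[topic].push_back(s);
    }
    void Publish(int topic, const void* data)
    {
        iDispatching = true;   // mark the table as busy during dispatch
        std::vector<ISubscriber*>& v = iSubs[topic];
        for (size_t i = 0; i < v.size(); ++i)
            v[i]->OnData(topic, data);
        iDispatching = false;
    }
private:
    bool iDispatching;
    std::map<int, std::vector<ISubscriber*> > iSubs;
};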

4.4.4. Decorator Pattern

The decorator pattern allows the framework to add functionality later. The framework defines an abstract base class for decorators, and specific implementers implement it, dynamically adding it to the framework.

For example, in a game, the graphics rendering engine is an independent module that can draw various images such as characters standing still or running. If the designers decide to add an item called “invisibility cloak” to the game, requiring players wearing this item to display a semi-transparent image, how should the graphics engine be designed to accommodate this upgrade?

When the invisibility cloak is equipped, a filter is added to the graphics engine. This is a highly simplified example; actual game engines would be more complex. The decorator pattern is also commonly used for pre- and post-processing of data.
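
In decorator terms, the engine draws through an abstract interface, and the cloak wraps the real renderer with a translucency filter; a sketch with invented names:

// Hypothetical renderer interface and a decorator that post-processes it.
class IRenderer
{
public:
    virtual ~IRenderer() {}
    virtual void DrawCharacter(int x, int y) = 0;
};

class CTranslucencyFilter : public IRenderer   // the "invisibility cloak"
{
public:
    explicit CTranslucencyFilter(IRenderer* inner) : iInner(inner) {}
    virtual void DrawCharacter(int x, int y)
    {
        SetAlpha(0.5f);              // assumed call: semi-transparent blending
        iInner->DrawCharacter(x, y); // delegate to the wrapped renderer
        SetAlpha(1.0f);
    }
private:
    void SetAlpha(float) { /* engine-specific blending setup */ }
    IRenderer* iInner;
};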

4.5. Disadvantages of Frameworks

A good framework can significantly enhance product development efficiency and quality, but it also has its drawbacks.

  1. Frameworks are generally complex, and designing and implementing a good framework requires considerable time. Therefore, frameworks are only suitable when they can be applied repeatedly, as this ensures that the upfront investment will yield substantial returns.

  2. Frameworks stipulate a series of interfaces and rules which, while simplifying secondary development, also require secondary developers to remember many regulations; if the rules are violated, the framework will not work correctly. On the other hand, because frameworks hide so many domain details, the overall learning cost is still much lower.

  3. Upgrading a framework can have severe impacts on existing products, requiring complete regression testing. There are two solutions to this issue. The first is to conduct strict testing on the framework itself, establishing a comprehensive unit testing library and developing sample projects to test all framework functionalities. The second is to use static linking, preventing existing products from easily following upgrades. Of course, if existing products have good regression testing methods, that is even better.

  4. Performance loss. Since frameworks abstract the system, they increase the complexity of the system. Techniques such as polymorphism generally reduce system performance. However, overall, frameworks can ensure that system performance remains at a relatively high level.

5. Automated Code Generation

5.1. Let Machines Do What They Can

Laziness is a virtue in a programmer, and even more so in an architect. Software development is the business of telling machines how to do things; if a task can be done by a machine, a human should not do it. Machines are not only tireless but also make no mistakes. Our job is to automate our clients' work, and with a little thought we can partially automate our own as well. An extremely patient programmer can be a blessing, but also a liability: patience will keep someone hand-writing code that a machine should be generating.

Well-designed systems often contain many highly similar and strongly patterned codes. Poorly designed systems may generate various implementations for the same type of functionality. The previous sections on framework design have already demonstrated this. Sometimes, we can take it a step further by analyzing the patterns in these similar codes and describing these functionalities using formatted data, allowing machines to generate code.

5.2. Examples

5.2.1. Encoding and Decoding Messages

In the framework example above, we can see that the message encoding and decoding part has been separated and decoupled from other parts. Given its characteristics, this part is well-suited for further “rule-based” processing, allowing machines to generate code.

Encoding is simply the process of streaming data structures; decoding is the reverse. For encoding, the code is essentially like this (binary protocol):

stream << a.i;
stream << a.j;
stream << a.object;

(To simplify, we assume that a stream object has already been designed to stream various data types and that issues such as byte order conversion have been handled.)

In the end, we obtain a stream. Have you grown used to writing this kind of code? Yet such code reflects no creativity on the engineer's part. Since we already know there are i, j, and an object, why must we type this code by hand? If we analyze the definition of a, can the code be generated automatically?

struct dataA
{
    int i;
    int j;
    struct dataB object;
};

With a simple semantic analyzer parsing this definition, we can easily generate the streaming code for these data structures. Such an analyzer can be written in about 200 lines of a language with strong string processing, such as Python. The analyzer's output is a tree representing the data types; by traversing this tree, we can generate streaming code for all the data structures.
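
The generator itself then reduces to a tree traversal that prints one streaming statement per leaf field. The article suggests Python for this; below is a toy C++ equivalent, with an invented Field node type standing in for the parsed tree:

#include <cstdio>
#include <string>
#include <vector>

// One node of the type tree produced by the analyzer.
struct Field
{
    std::string name;
    std::vector<Field> children;   // empty for leaf types such as int
};

// Emit "stream << a.<path>;" for every leaf, recursing into nested structs.
void EmitStreaming(const Field& f, const std::string& prefix)
{
    std::string path = prefix.empty() ? f.name : prefix + "." + f.name;
    if (f.children.empty())
        std::printf("stream << a.%s;\n", path.c_str());
    else
        for (size_t i = 0; i < f.children.size(); ++i)
            EmitStreaming(f.children[i], path);
}

// Usage: call EmitStreaming on each top-level field of dataA with prefix "".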

In the previous framework example, the generated message encoding and decoding code for one hardware module reached 30,000 lines, almost the size of a small application. Being machine-generated, it contained no errors, giving the upper layers high reliability.

We can also define the data structures in XML or another format and generate code from that; depending on need, code in any language can be generated, such as C++, Java, or Python. If strong checking is desired, XSD can be used to define the data structures. A commercial product, xBinder, does this, but it is expensive and hard to use, perhaps not as good as developing the tool yourself. (Why hard to use? Because it is too general.) Besides encoding to binary, we can also generate code for readable formats such as XML; communication then uses binary while debugging uses XML, the best of both worlds. The generated XML-producing code might look like this:

xmlbuilder.addelement("i", a.i);
xmlbuilder.addelement("j", a.j);
xmlbuilder.addelement("object", a.object);

This approach, too, is well suited to machine generation. The same idea can be applied to supporting embedded scripting in software, which we will not elaborate on here. (The biggest issue in embedded scripting support is exchanging data between C/C++ and the script language, which produces a great deal of similar code for each data type.)

Recently, Google released Protocol Buffers, which exemplify exactly this approach.

5.2.2. GUI Code

In the framework design section above, we noted that the framework cannot by itself collect data from the interface or update it, so it can only define interfaces for programmers to implement. But look at what these interface programmers actually write (the code is simplified and can be read as pseudo code):

void onDataArrive(CDataBinder& data)
{
    m_biterror.setText("%d", data.biterror);
    m_signallevel.setText("%d", data.signallevel);
    m_latency.setText("%d", data.latency);
}

void onCollectData(CDataBinder& data)
{
    data.biterror = atoi(m_biterror.getText());
    data.signallevel = atoi(m_signallevel.getText());
    data.latency = atoi(m_latency.getText());
}

Is this code interesting? What can we do about it? (One answer: describe the interface in XML and generate these bindings, though complex logic remains a challenge.)

5.2.3. Summary

From the above, it is evident that in software architecture we should first follow the general principles: separate the functional parts and achieve high cohesion and low coupling. Then we can identify the highly repetitive, strongly patterned code in the system, formalize and standardize it further, and let machines generate it. The most successful application of this idea so far is message encoding and decoding; automatic generation of interface code has limitations but is still applicable. Everyone should learn to spot such opportunities in their own work to reduce workload and improve efficiency.

5.2.4. Google Protocol Buffers

Google's recently released Protocol Buffers are a model of automated code generation. In Google's own words:

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
