Embedded systems have their own data structures and algorithms, and there are quite a few of them. This article gives a brief introduction to the data structures and algorithms used in embedded systems; it is a summary and survey of the field, not a detailed explanation of algorithm internals. I do not draw a sharp line between data structures and algorithms here, describing them as a whole: data structures are usually accompanied by related algorithms, and algorithms usually rest on underlying data structures for support.
My work is in an SoC chip product line serving consumer electronics, and I lean toward coding on terminal products (non-mobile). Such devices usually do not store large amounts of complex data; they act as interfaces that collect and transmit information from the physical world, without the large-scale storage and retrieval that backend servers perform. So what follows is not comprehensive: it covers only the data structures and algorithms involved in terminal product development, and is inevitably biased and one-sided.
As noted, this article does not strictly distinguish data structures from algorithms. Throughout, "the kernel" refers specifically to the Linux kernel, and "SOC" refers to the ARM platform.
String Search Algorithms
We often run into string searching without needing to dig into the implementations. The most common functions are strcmp/strncmp and strstr in C and find in C++; the most common commands are grep, find, sed, and so on. Behind them sit string matching algorithms such as KMP, BM (Boyer-Moore), and Sunday, or combinations of them. The basic implementations of these algorithms do not require exotic data structures; "ordinary" arrays suffice.
We often use these algorithms unconsciously, and in the embedded field we rarely look at their internal implementations. That is not negligence; there are two reasons: first, in embedded work it is usually unnecessary to study how such algorithms are implemented; second, we may not even know they exist.
Moreover, ordinary code often relies on self-implemented string matching during compilation and execution, for example parsing a program's text configuration file at run time, which may involve fscanf, strcmp, or hand-rolled string comparison functions.
At compile time the most common tools are script commands such as grep, plus the functions built into Makefile syntax. At run time, simple programs generally use self-implemented string matching (most commonly brute-force matching), while larger programs may embed Lua to parse strings.
The string-related data structures and algorithms most commonly encountered are brute-force matching, the gentler Rabin-Karp (RK) algorithm, and the KMP family with its improvements and variants. More advanced structures such as tries appear in search engines, but in the embedded field they are almost never seen.
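Since brute-force matching is the self-implemented variant most often seen in embedded code, here is a minimal sketch of it; the function name `bf_find` is illustrative, not from any library, and the routine mirrors what a hand-rolled strstr() typically looks like.

```c
#include <stddef.h>
#include <string.h>

/* Brute-force substring search: slide the pattern over the text one
 * position at a time and compare character by character.
 * Returns the index of the first match, or -1 if there is none. */
static int bf_find(const char *text, const char *pat)
{
    size_t n = strlen(text), m = strlen(pat);

    if (m == 0)
        return 0;               /* empty pattern matches at offset 0 */

    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m && text[i + j] == pat[j])
            j++;
        if (j == m)
            return (int)i;      /* full pattern matched at offset i */
    }
    return -1;
}
```

The worst case is O(n*m), which is exactly why KMP/BM exist; but for the short configuration strings typical of terminal products, this simplicity usually wins.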
Queues
Queues are currently the most common data structure in user-mode embedded programming. Codec products must move large volumes of video data, and video data transmission is naturally first-in, first-out; in fact most data hand-off points in a whole code project fit this pattern, commonly known as the "producer-consumer" model.
Thus video data transport and management, message processing queues, circular buffers, command processing queues, event dispatch modules, and so on all use this data structure. The most common implementations are a self-implemented singly linked list or the kernel's list doubly linked list; std::queue in the C++ standard library is another queue structure.
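The kernel's list trick mentioned above is worth seeing concretely: the link node is embedded inside the structure being managed, and pointer arithmetic recovers the outer object. This is a small sketch of that idea, not the kernel's actual list.h; the `struct msg` type and helper names are illustrative.

```c
#include <stddef.h>

/* Intrusive doubly linked list in the style of the kernel's list_head:
 * the list node lives inside the managed structure. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;      /* empty circular list */
}

static void list_add_tail(struct list_node *head, struct list_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Recover the containing structure from a pointer to its embedded node. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Illustrative managed object: a message with the node embedded. */
struct msg {
    int id;
    struct list_node link;
};
```

Because the node is embedded rather than allocated separately, one generic list implementation serves every structure in the project, which is exactly why the kernel version is so widely reused.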
There are different ways to implement a queue, and queue usage comes with certain limitations. For example, the queues I use most often may have one or more producers but only one consumer; to support multiple consumers, we generally create multiple queue instances to keep the coding simple.
Now imagine supporting multiple consumers in one instance. Consumers can join at any time, so each element needs a reference count. Normally every consumer should see exactly the same stream, meaning an element can be discarded only after all consumers have used it. We must also consider a consumer crashing, which requires a timeout mechanism: if the reference count has not dropped to zero after a certain time, the data is forcibly discarded. All things considered, unless memory is extremely tight, it is better to trade extra space for simpler code, fewer errors, and easier understanding and maintenance.
Common queue models such as the circular buffer [Reference 1] are usually implemented with arrays, which is not difficult; if you want to see it done to the extreme, study the kernel's kfifo implementation, a masterful piece of work. Other queues are more commonly built on the kernel's list doubly linked list [Reference 2], whose strength is generality: to use it, you only embed a list_head in the structure you need to manage, and the doubly linked list also brings flexibility in data access. Self-implemented singly linked lists for specific data types exist as well, but they are not commonly used.
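For the array-based circular buffer, a minimal single-producer/single-consumer sketch looks like this. The power-of-two size plus index masking is the same trick kfifo uses; this is not the kernel's code, and the names are my own.

```c
#include <stdint.h>

/* Single-producer/single-consumer ring buffer sketch.
 * RB_SIZE must be a power of two so indices wrap with a cheap mask. */
#define RB_SIZE 16

struct ring {
    uint8_t  buf[RB_SIZE];
    unsigned in;    /* write index, advanced only by the producer */
    unsigned out;   /* read index, advanced only by the consumer  */
};

static int rb_put(struct ring *r, uint8_t b)
{
    if (r->in - r->out == RB_SIZE)      /* unsigned wrap keeps this valid */
        return -1;                      /* full */
    r->buf[r->in++ & (RB_SIZE - 1)] = b;
    return 0;
}

static int rb_get(struct ring *r, uint8_t *b)
{
    if (r->in == r->out)
        return -1;                      /* empty */
    *b = r->buf[r->out++ & (RB_SIZE - 1)];
    return 0;
}
```

Letting the indices run free and masking only on access avoids the classic "one slot wasted" problem of naive ring buffers, because full (in - out == size) and empty (in == out) are distinguishable.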
Stacks
The stack's defining characteristic is last-in, first-out, so in embedded systems it is commonly used for parameter passing, including not only function arguments but also some custom parameter passing. For example, when decoupling modules under the Law of Demeter, the parameter hand-off can be designed as push and pop operations on a stack; of course, a queue would also work.
Stacks are also used where recursion would otherwise be needed. For example, the pipeline start function in the kernel's V4L2 framework performs a depth-first traversal of a graph; to avoid nested function calls, it keeps the pipeline nodes on an explicit stack, fully conforming to the last-in, first-out characteristic.
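That recursion-to-stack conversion can be sketched in a few lines. This is not the V4L2 code itself; the adjacency matrix, node count, and function names are assumptions made for the illustration.

```c
#include <stdbool.h>

/* Non-recursive graph traversal using an explicit stack, in the spirit
 * of the V4L2 pipeline walk described in the text. */
#define NNODES 6

static void walk(bool adj[NNODES][NNODES], int start,
                 int visit_order[], int *count)
{
    int  stack[NNODES], top = 0;
    bool visited[NNODES] = { false };

    stack[top++] = start;               /* push the starting node */
    visited[start] = true;
    *count = 0;

    while (top > 0) {
        int u = stack[--top];           /* pop: last-in, first-out */
        visit_order[(*count)++] = u;

        for (int v = 0; v < NNODES; v++) {
            if (adj[u][v] && !visited[v]) {
                visited[v] = true;
                stack[top++] = v;       /* defer neighbors on the stack */
            }
        }
    }
}
```

Replacing the call stack with a fixed-size explicit stack also caps memory usage, which matters in a kernel context where stack space is scarce.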
Other places include function stack backtracking, such as the call-chain unwinding gdb performs during debugging, the stack frames used during function execution, and logging systems that record call stacks to trace function calls, like Google's open-source glog. The kernel's oops output also shows traces of stack storage structures.
Stacks are most often implemented with arrays; linked lists can also be used, but since a stack is a restricted last-in, first-out structure, we need little flexibility in insertion and deletion, so arrays are usually chosen for access efficiency and modest storage. Few modules need to implement their own stack structure.
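The array-backed stack the paragraph describes is about as small as a data structure gets; here is a sketch with a fixed capacity and explicit overflow/underflow checks. The capacity and names are illustrative.

```c
/* Fixed-capacity, array-backed stack sketch. */
#define STACK_CAP 32

struct stack {
    int items[STACK_CAP];
    int top;                    /* index of the next free slot */
};

static int stack_push(struct stack *s, int v)
{
    if (s->top == STACK_CAP)
        return -1;              /* overflow */
    s->items[s->top++] = v;
    return 0;
}

static int stack_pop(struct stack *s, int *v)
{
    if (s->top == 0)
        return -1;              /* underflow */
    *v = s->items[--s->top];
    return 0;
}
```

Because push and pop only ever touch one end, the array needs no shifting and both operations are O(1), which is the efficiency argument made above.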
Graphs and Bitmaps
Graph structures are commonly used to represent complex, heavily linked relationships, such as social networks, which are relatively complex graphs. Embedded programming rarely involves such interrelated structures, so graphs do not generally appear in ordinary application code.
However, graphs are used in the kernel. The V4L2 pipeline start function I know of, media_entity_pipeline_start, uses graph traversal, bitmaps, and a stack all at once; it is quite an impressive function. Its graph is a directed acyclic graph representing the video input hardware modules managed under the V4L2 video input device framework, such as the CSI, ISP, and sensor. The data flow is directed and acyclic because video data always moves from hardware to driver to user space, with no cycles along the way.
Bitmaps are generally used for marking: whether an object is valid, whether it exists, and so on. In short, the marked object has exactly two states (a bit can only be 0 or 1). In the function above, the bitmap marks whether each graph node has been visited: set to 1 if visited, 0 otherwise. The concrete bitmap operations are described in the kernel's bitmap.h and bitmap.c files, and the kernel's documentation on bitmaps can be consulted directly.
Bitmaps are also used in buffer management. For example, in a video processing submodule I developed recently, I used a single unsigned long long (64 bits, even on 32-bit ARM) as a bitmap variable to mark whether each buffer is currently in use, which drives the buffer push and pop logic. Of course, a general-purpose bitmap structure should allow a configurable length.
Bitmap structures are compact and efficient: a single bit does the job of a whole byte. Clearly a bitmap must be backed by an array; a linked list is out of the question. A bitmap can mark ordered data, but most of the time it marks unordered data, typically using a hash to index from the data to the bitmap; the Bloom filter is an upgraded application of this idea.
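The single-word buffer-in-use bitmap described above boils down to a few bit operations. This is a sketch over one 64-bit word, not the kernel's bitmap API (which generalizes to arbitrary lengths); the helper names are mine.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bitmap helpers over a single 64-bit word, as in the buffer-in-use
 * marker described in the text. */
static inline void bmp_set(uint64_t *map, unsigned bit)
{
    *map |= (1ULL << bit);
}

static inline void bmp_clear(uint64_t *map, unsigned bit)
{
    *map &= ~(1ULL << bit);
}

static inline bool bmp_test(uint64_t map, unsigned bit)
{
    return (map >> bit) & 1ULL;
}

/* Find the first free (zero) bit, e.g. to pick an unused buffer slot.
 * Returns -1 when all nbits bits are set. */
static int bmp_first_free(uint64_t map, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++)
        if (!((map >> i) & 1ULL))
            return (int)i;
    return -1;
}
```

With 64 buffers tracked in 8 bytes, "compact and efficient" is literal: testing, setting, and clearing are all single instructions on the word.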
Sorting Algorithms
First, the most common purpose of sorting is indexing (filtering) and searching, which covers almost all sorting needs. One use is to find extreme values easily after sorting, such as sorting by time or price on Taobao to satisfy our subjective filtering needs. The other is to retrieve desired values faster, facilitating searches, as in search engines, databases, and the kernel's process tree.
Note that sorting here is not limited to ascending or descending order; any arrangement that follows a rule of our own design counts as sorted. For example, the tree the kernel maintains for scheduling is a red-black tree whose elements are process descriptors, with the leftmost node being the most urgently schedulable process. Each time a process is created or an existing process is scheduled, the kernel re-inserts it into the red-black tree keyed by its weighted execution time, maintaining ascending order; retrieval takes the leftmost node, and deletion removes it from the tree.
These operations involve searching (for insertion and deletion) and retrieval (taking the leftmost node); the red-black tree keeps the search complexity at O(log N). Other data structures and algorithms that favor searching, insertion, and deletion include hash tables, skip lists, bucket sort, and quicksort. The kernel's V4L2 control module [Reference 3] employs an idea similar to bucket sort.
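When a full ordered structure like a red-black tree is overkill, libc's qsort() covers the "arrange by a designed rule" idea directly. This sketch sorts frames by presentation timestamp; the `struct frame` type and field names are illustrative, not from any real codebase.

```c
#include <stdlib.h>

/* Illustrative frame record ordered by presentation timestamp. */
struct frame {
    unsigned long pts;  /* presentation timestamp */
    int id;
};

/* qsort() comparator: negative/zero/positive for less/equal/greater.
 * Comparing rather than subtracting avoids overflow on large values. */
static int cmp_pts(const void *a, const void *b)
{
    const struct frame *fa = a, *fb = b;

    if (fa->pts < fb->pts) return -1;
    if (fa->pts > fb->pts) return  1;
    return 0;
}

static void sort_frames(struct frame *frames, size_t n)
{
    qsort(frames, n, sizeof(frames[0]), cmp_pts);
}
```

The "rule" lives entirely in the comparator, which is the same separation the kernel's red-black tree achieves with its key: change the comparison and the same machinery produces any ordering you design.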
The core of codec products is video encoding/decoding and related processing such as ISP effects. Memory limits make it impossible to cache many frames, so that part needs no sorting. In fact, in most of our scenarios a hash table can be replaced outright by a doubly linked list, which is entirely sufficient in efficiency terms.
In summary, sorting algorithms appear mostly in the kernel and in the commonly used SQLite database; inside the database we do not need to implement sorting ourselves, and users may rarely need sorting operations at all.
Non-Generic Algorithms
Some algorithms in embedded systems are not very "generic"; by generic I mean popular and widely applicable. The CFS scheduling algorithm, for example, is used wherever the Linux kernel runs, yet few people need to study or implement such an algorithm, so its application scope is not that broad in practice; hence I call it non-generic.
In embedded codec products, "synchronization" is a very large category, covering synchronization during audio/video playback as well as during audio/video capture. Synchronization is troublesome for several reasons:

- The time interval between the physically captured audio/video and its timestamp is not very deterministic.
- The correspondence between video and audio is not one-to-one but one-to-many.
- If a frame fails to synchronize during capture, it must be discarded or actively compensated for, and the next capture must not be blocked.
In fact, capture adds a physical layer of complexity that decoding lacks, because physical capture is hard real-time; it waits for no one. Miss a frame and it is gone, with no room for regret; decoding can tolerate some delay, but capture is uncompromising. Even so, despite the complex hardware and end-to-end pipeline, audio/video sync still has a tolerance of about 200 ms. Beyond A/V sync there are other kinds, such as the common online real-time gyroscope-based anti-shake algorithms, which require synchronizing gyroscope data with video frame data to within tens of milliseconds or even single milliseconds.
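The tolerance idea above can be reduced to a trivial check: accept an audio/video pairing only when the timestamp gap is within budget, otherwise drop or compensate. The 200 ms figure echoes the text; the function name, units, and threshold constant are all assumptions for this sketch.

```c
#include <stdlib.h>

/* Illustrative sync check between one video frame and one audio frame.
 * Timestamps are in microseconds; the tolerance is the ~200 ms figure
 * mentioned in the text (a real system would tune this per product). */
#define SYNC_TOL_US 200000L     /* 200 ms in microseconds */

static int av_in_sync(long video_ts_us, long audio_ts_us)
{
    return labs(video_ts_us - audio_ts_us) <= SYNC_TOL_US;
}
```

The hard part in practice is not this comparison but what the bullet list describes: deciding, without blocking the next capture, whether a frame that fails the check is dropped or compensated for.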
There are also memory allocation algorithms such as the slab allocator and virtual memory page management, plus distortion correction, ISP effects, and so on; the former belong to operating system memory management, the latter to computer vision. Non-generic algorithms will not be a focus of my future study and summaries, as I may not have the energy to dig into them deeply; I will concentrate on the more generic data structures and algorithms summarized above.
End
This article has given a simple overview of the data structures and algorithms involved in embedded systems. It inevitably has limitations, but I do not think that is a big deal. Going forward I will focus on algorithms not limited to the embedded field, including more generic algorithms and ideas from computer science, to broaden my thinking and my boundaries. The next article, if all goes well, will start with queues; there will likely be more than one article on them, since queues are so widely used in embedded work, and I may implement some practical tool modules to grasp the essence of queue applications.
I plan to anchor my learning in practical code, and I have already started: I have set up a repository on GitHub called Newbie-C, written in C. It currently has only a general skeleton, with content limited to a circular buffer module, and is still being gradually improved, so I will not post the link here.

If anyone is interested in checking it out, you are welcome to submit your own code, though I will mostly keep improving it myself. I also need to improve the build system; at present it is awkward because every change requires a full rebuild. Below are the reference articles for this piece, which I actually wrote myself, so do take a look (some have not yet been published on WeChat; they can be found by clicking the usual original-text link position). It is time to start a new chapter.
[Reference 1] Circular buffer: see the article "Circular Buffer".
[Reference 2] list doubly linked list: see the article "C Language List Head Doubly Linked List".
[Reference 3] The bucket-sort-like idea: see the article "Data Structures of V4L2 Framework-Control".
