Source: EDN Electronic Technology Design
1. Entering Linux Memory


2. Linux Memory Address Space

- User Mode: code running in user mode is subject to many restrictions imposed by the processor.
- Kernel Mode: the privileged (core) mode in the processor's storage-protection scheme.
- There are three ways to switch from user mode to kernel mode: system calls, exceptions, and device interrupts.
- Differences: each process has its own independent, non-interfering memory space; user-mode programs cannot arbitrarily access the kernel address space, which provides a degree of protection; kernel threads share the kernel address space.

- The MMU is a hardware circuit that contains two parts: a segmentation unit and a paging unit.
- The segmentation mechanism converts a logical address into a linear address.
- The paging mechanism converts a linear address into a physical address.

- To allow fast retrieval of segment selectors, the processor provides six segment registers that cache them: cs, ss, ds, es, fs, and gs.
- Segment base address: the starting address of the segment in the linear address space.
- Segment limit: the maximum offset that can be used within the segment in the virtual address space.
- The segment selector held in the segment register locates the segment descriptor; the segment base address and limit are read from that descriptor, and adding the logical address's offset to the base yields the linear address.
- The paging mechanism runs after the segmentation mechanism and further converts the linear address into a physical address.
- A 32-bit linear address is split into a 10-bit page directory index, a 10-bit page table index, and a 12-bit page offset, as illustrated in the sketch below.
- The size of a single page is 4 KB.
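A minimal user-space sketch of this 10/10/12 split (the example address and the names pgd_index/pt_index are chosen here purely for illustration):

```c
#include <stdio.h>

int main(void)
{
    unsigned linear = 0x08049f2cu;                   /* an arbitrary example address */

    /* Two-level x86 paging: 10-bit directory index, 10-bit table index,
       12-bit offset inside the 4 KB page. */
    unsigned pgd_index = (linear >> 22) & 0x3ff;     /* top 10 bits  */
    unsigned pt_index  = (linear >> 12) & 0x3ff;     /* next 10 bits */
    unsigned offset    =  linear        & 0xfff;     /* low 12 bits  */

    printf("linear 0x%08x -> directory %u, table %u, offset 0x%03x\n",
           linear, pgd_index, pt_index, offset);
    return 0;
}
```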


- TEXT: executable code, string literals, and read-only variables.
- DATA: the data segment, which maps the program's initialized global variables.
- BSS: stores the program's uninitialized global variables.
- HEAP: the runtime heap, the region allocated with malloc during program execution.
- MMAP: the memory-mapping region used for shared libraries and anonymous file mappings.
- STACK: the user process stack.

- Direct mapping area: the linear region starting at 3 GB, at most 896 MB in size, used for direct memory mapping.
- Dynamic memory mapping area: allocated by the kernel function vmalloc.
- Permanent memory mapping area: used to access high memory.
- Fixed mapping area: separated from the top of the 4 GB space by only a 4 KB isolation band; each address entry serves a specific purpose, such as ACPI_BASE.
- A user process can normally access only user-space virtual addresses and cannot access kernel-space virtual addresses.
- Kernel space is mapped by the kernel and does not change from process to process; kernel-space addresses have their own page tables, while each user process has its own separate page tables.

3. Linux Memory Allocation Algorithms
- Causes: allocations are small and these small allocations live for a long time, so repeated requests gradually fragment memory.
- Advantages: faster allocation, easier memory management, and protection against memory leaks.
- Disadvantages: a large amount of fragmentation slows the system down, lowers memory utilization, and wastes memory.
- Avoid dynamic memory allocation functions where possible (prefer stack space).
- Allocate and free memory in the same function whenever possible.
- Allocate a larger block at once rather than repeatedly allocating small blocks.
- Whenever possible, request memory blocks whose sizes are powers of two.
- External fragmentation avoidance: the buddy system algorithm.
- Internal fragmentation avoidance: the slab algorithm.
- Manage memory yourself by designing a memory pool.
- The buddy system provides the kernel with an efficient strategy for allocating groups of contiguous pages and effectively solves the external fragmentation problem.
- The allocated memory areas are made up of page frames.
- External fragmentation refers to memory areas that have not yet been allocated (belong to no process) but are too small to satisfy a new process's request for memory.

3) Organizational Structure
- All free pages are grouped into 11 linked lists of blocks, holding page blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames. At most 1024 contiguous pages can be requested at once, corresponding to 4 MB of contiguous memory.

- To request storage of 2^i pages: if the free list for 2^i-page blocks has a free block, allocate it to the requester.
- If it has no free block, check whether the free list for 2^(i+1)-page blocks has one; if so, split that block, hand one half (a 2^i-page block) to the requester, and insert the other half back into the free list for 2^i-page blocks.
- If the 2^(i+1) list has no free block either, repeat the previous step at the next higher order until a list with a free block is found.
- If none is found, return a memory allocation failure.
- To release a block of 2^i pages: check the corresponding free list for a free block that is physically contiguous with it (its buddy); if there is none, no merging is needed.
- If there is, merge the two into a 2^(i+1)-page block and continue checking at the next higher order until no further merging is possible. (A minimal sketch of the allocation walk follows this list.)
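A minimal user-space sketch of the allocation walk above, using per-order free-block counters as a stand-in for the kernel's per-order free lists (the names free_count and buddy_alloc are invented here for illustration; the real kernel keeps struct free_area lists per zone):

```c
#include <stdio.h>

#define MAX_ORDER 11                  /* orders 0..10 -> blocks of 1..1024 pages */

static int free_count[MAX_ORDER];     /* stand-in for the length of each free list */

/* Try to allocate a block of 2^order pages; return 0 on success, -1 on failure. */
static int buddy_alloc(int order)
{
    int cur;

    /* Walk up until a non-empty free list is found. */
    for (cur = order; cur < MAX_ORDER; cur++)
        if (free_count[cur] > 0)
            break;
    if (cur == MAX_ORDER)
        return -1;                    /* no block large enough: allocation fails */

    free_count[cur]--;                /* take one block of order 'cur' */

    /* Split it down; every split leaves one unused buddy on a lower-order list. */
    while (cur > order) {
        cur--;
        free_count[cur]++;
    }
    return 0;
}

int main(void)
{
    int i;

    free_count[10] = 1;               /* start with one free 1024-page (4 MB) block */
    printf("alloc 8 pages (order 3): %s\n",
           buddy_alloc(3) == 0 ? "ok" : "failed");

    /* After the splits, orders 3..9 each hold one free buddy block. */
    for (i = 0; i < MAX_ORDER; i++)
        printf("order %2d: %d free block(s)\n", i, free_count[i]);
    return 0;
}
```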

- The two blocks must be the same size.
- Their physical addresses must be contiguous.
- The physical address of the first page frame of the first block must be a multiple of the merged block's size, so that the two blocks are true buddies.
- The larger the allocated memory, the higher the likelihood of failure.
- Scenarios that use very large memory are rare.
- Modify MAX_ORDER and recompile the kernel.
- Pass the "mem=" parameter at kernel startup (for example "mem=80M") to reserve part of the memory, then map the reserved memory into a module with request_mem_region and ioremap_nocache. This does not require recompiling the kernel, but it is not supported on x86; it works only on ARM, PowerPC, and other non-x86 architectures.
- Call alloc_bootmem to pre-allocate a large block of memory before mem_init() runs in start_kernel(); this requires recompiling the kernel.
- The vmalloc function: used by kernel code to allocate memory that is contiguous in virtual address space but not necessarily contiguous in physical memory (see the sketch below).
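A minimal kernel-module-style sketch of vmalloc usage (the module name, function names, and buffer size are illustrative, not from the original article):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

static void *big_buf;

static int __init vmalloc_demo_init(void)
{
    /* vmalloc() returns memory that is contiguous in the kernel's virtual
     * address space but may be assembled from scattered physical pages. */
    big_buf = vmalloc(4 * 1024 * 1024);        /* 4 MB */
    if (!big_buf)
        return -ENOMEM;
    return 0;
}

static void __exit vmalloc_demo_exit(void)
{
    vfree(big_buf);                            /* release the virtual mapping */
}

module_init(vmalloc_demo_init);
module_exit(vmalloc_demo_exit);
MODULE_LICENSE("GPL");
```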
- Non-movable pages: these pages have fixed positions in memory and can be neither moved nor reclaimed. Examples: kernel code and data segments, memory allocated with kmalloc(), and memory occupied by kernel threads.
- Reclaimable pages: these pages cannot be moved, but they can be deleted; the kernel reclaims them when they occupy too much memory or when memory is scarce.
- Movable pages: these pages can be moved freely; pages used by user-space applications fall into this category. They are reached through page-table mappings, so when a page is moved to a new location the corresponding page table entries are updated accordingly.
- The slab allocator used in Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system.
- Its basic idea is to keep frequently used kernel objects in a cache, maintained in an initialized, ready-to-use state; process descriptors, for example, are allocated and released very frequently in the kernel.
- Internal fragmentation: the memory actually allocated is larger than the memory that was requested.
- The slab allocator reduces the internal fragmentation that the buddy algorithm causes when small blocks of contiguous memory are allocated.
- It caches frequently used objects, reducing the time spent allocating, initializing, and releasing them.
- Through slab coloring, objects are offset so that they make better use of the hardware caches.
- Because objects are allocated from and released back to slabs, individual slabs move between the slab lists.
- Slabs on the slabs_empty list are the main candidates for reaping (recycling).
- The slab allocator also supports the initialization of generic objects, avoiding repeated initialization of the same object.

- The slab allocator's allocation of small blocks of contiguous memory is implemented through general-purpose caches.
- The objects provided by the general-purpose caches have geometrically distributed sizes, ranging from 32 bytes to 131072 bytes.
- The kernel provides two interfaces, kmalloc() and kfree(), for allocating and releasing this memory (see the sketch below).
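A minimal kernel-module-style sketch of the kmalloc()/kfree() interface described above (the struct sample type and module names are invented for illustration):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

struct sample {
    int id;
    char name[32];
};

static struct sample *obj;

static int __init kmalloc_demo_init(void)
{
    /* kmalloc() hands out physically contiguous memory from the
     * general-purpose slab caches. */
    obj = kmalloc(sizeof(*obj), GFP_KERNEL);
    if (!obj)
        return -ENOMEM;
    obj->id = 1;
    return 0;
}

static void __exit kmalloc_demo_exit(void)
{
    kfree(obj);                 /* return the memory to its cache */
}

module_init(kmalloc_demo_init);
module_exit(kmalloc_demo_exit);
MODULE_LICENSE("GPL");
```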
- The kernel provides a complete set of interfaces for creating and releasing dedicated caches, which allocate slab caches for specific object types according to the parameters passed in.
- kmem_cache_create() creates a cache for a specified object type. It allocates a cache descriptor for the new dedicated cache from the general cache cache_cache and inserts this descriptor into the cache_chain list of cache descriptors.
- kmem_cache_alloc() allocates an object from the cache specified by its parameters; conversely, kmem_cache_free() releases an object back to the cache specified by its parameters. (A usage sketch follows.)
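A minimal kernel-module-style sketch of the dedicated-cache interface (the task_like type and the cache name are invented for illustration):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

struct task_like {
    int pid;
    char comm[16];
};

static struct kmem_cache *task_cache;

static int __init cache_demo_init(void)
{
    struct task_like *t;

    /* Create a dedicated cache for task_like objects. */
    task_cache = kmem_cache_create("task_like_cache",
                                   sizeof(struct task_like),
                                   0, SLAB_HWCACHE_ALIGN, NULL);
    if (!task_cache)
        return -ENOMEM;

    /* Allocate one object from the cache and give it back. */
    t = kmem_cache_alloc(task_cache, GFP_KERNEL);
    if (t) {
        t->pid = 42;
        kmem_cache_free(task_cache, t);
    }
    return 0;
}

static void __exit cache_demo_exit(void)
{
    kmem_cache_destroy(task_cache);
}

module_init(cache_demo_init);
module_exit(cache_demo_exit);
MODULE_LICENSE("GPL");
```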
- First, a number of memory blocks (usually of equal size) are allocated as a reserve.
- When a new memory request arrives, a block is taken from the pool; if the pool does not have enough blocks, new memory is allocated to replenish it.
- A significant advantage of this approach is that it minimizes memory fragmentation and improves allocation efficiency.
- mempool_create creates a memory pool object.
- mempool_alloc allocates an object from the pool.
- mempool_free releases an object back to the pool.
- mempool_destroy destroys the memory pool. (A kernel-side usage sketch follows this list.)
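A minimal kernel-module-style sketch of the mempool interface above, backed by a slab cache (the cache name, object size, and POOL_MIN value are illustrative choices):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/mempool.h>
#include <linux/slab.h>

#define POOL_MIN 4   /* minimum number of elements kept in reserve */

static struct kmem_cache *buf_cache;
static mempool_t *buf_pool;

static int __init pool_demo_init(void)
{
    void *buf;

    buf_cache = kmem_cache_create("buf_cache", 256, 0, 0, NULL);
    if (!buf_cache)
        return -ENOMEM;

    /* The pool pre-allocates POOL_MIN objects from the backing slab cache
     * so that allocations can still succeed under memory pressure. */
    buf_pool = mempool_create(POOL_MIN, mempool_alloc_slab,
                              mempool_free_slab, buf_cache);
    if (!buf_pool) {
        kmem_cache_destroy(buf_cache);
        return -ENOMEM;
    }

    buf = mempool_alloc(buf_pool, GFP_KERNEL);   /* take an object */
    if (buf)
        mempool_free(buf, buf_pool);             /* give it back */
    return 0;
}

static void __exit pool_demo_exit(void)
{
    mempool_destroy(buf_pool);
    kmem_cache_destroy(buf_cache);
}

module_init(pool_demo_init);
module_exit(pool_demo_exit);
MODULE_LICENSE("GPL");
```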


- Direct Memory Access (DMA) is a hardware mechanism that allows peripheral devices to transfer their I/O data directly to and from main memory without involving the system processor.

2) Functions of the DMA Controller
- It can issue a hold (HOLD) signal to the CPU to request control of the bus.
- When the CPU grants the request, the controller takes over the bus and enters DMA mode.
- It can address memory and update address pointers to read from and write to memory.
- It can determine the number of bytes in the current DMA transfer and decide whether the transfer has finished.
- It issues a DMA-end signal, allowing the CPU to resume normal operation.
- DREQ: the DMA request signal, sent from a peripheral device to the DMA controller to request a DMA operation.
- DACK: the DMA acknowledge signal, sent from the DMA controller to the requesting peripheral to indicate that the request has been received and is being serviced.
- HRQ: the signal sent from the DMA controller to the CPU requesting control of the bus.
- HLDA: the signal sent from the CPU to the DMA controller granting control of the bus.

4. Memory Usage Scenarios
- Page management
- Slab (kmalloc, memory pools)
- User-mode memory usage (malloc, realloc, file mapping, shared memory)
- The program's memory map (stack, heap, code, data)
- Data transfer between kernel and user mode (copy_from_user, copy_to_user)
- Memory mapping (hardware registers, reserved memory)
- Memory for DMA
- alloca allocates memory on the stack, so it does not need to be freed explicitly.
- Memory allocated by malloc is uninitialized; a program that relies on its contents may appear to work at first (while the memory has not yet been reused) but fail later (once the memory has been reused and contains stale data).
- calloc initializes every bit of the allocated memory to zero.
- realloc expands (or shrinks) an existing block of memory. (A short usage example follows.)
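A short user-space example contrasting alloca, malloc, calloc, and realloc as described above (buffer sizes are arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloca.h>

int main(void)
{
    /* alloca: stack allocation, released automatically on function return. */
    char *stack_buf = alloca(64);
    snprintf(stack_buf, 64, "on the stack");

    /* malloc: contents are uninitialized -- initialize before relying on them. */
    int *a = malloc(10 * sizeof(*a));
    if (!a)
        return 1;
    memset(a, 0, 10 * sizeof(*a));

    /* calloc: every bit of the returned block is set to zero. */
    int *b = calloc(10, sizeof(*b));
    if (!b) {
        free(a);
        return 1;
    }

    /* realloc: resize an existing block; the pointer may move. */
    int *tmp = realloc(a, 20 * sizeof(*a));
    if (tmp)
        a = tmp;

    printf("%s\n", stack_buf);
    free(a);
    free(b);
    return 0;
}
```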
- When malloc is called, it searches the free chunk list for a memory block large enough to satisfy the request.
- The main job of the free chunk list is to maintain a linked list of free heap buffers.
- If no suitable node is found in that list, the process's heap is extended through the sys_brk system call (a small demonstration using sbrk follows).
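A small user-space demonstration of moving the program break with the sbrk wrapper (purely illustrative; normal programs should rely on malloc rather than calling sbrk directly):

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* sbrk(0) reports the current program break (the top of the heap). */
    void *before = sbrk(0);

    /* Grow the heap by one 4 KB page, as sys_brk would do on behalf of malloc. */
    if (sbrk(4096) == (void *)-1) {
        perror("sbrk");
        return 1;
    }

    void *after = sbrk(0);
    printf("break before: %p\n", before);
    printf("break after : %p (moved %ld bytes)\n",
           after, (long)((char *)after - (char *)before));
    return 0;
}
```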

- One or more physical pages are requested through get_free_pages.
- The address of the pte for addr is calculated from the process's pgd mapping.
- The pte corresponding to addr is set to the starting address of the physical page.
- System calls used: brk for requests of 128 KB or less, and mmap (do_mmap) for requests larger than 128 KB.

- Each user-mode process has its own exclusive virtual address space, so two processes can use the same virtual addresses.
- When a user-mode virtual address is accessed (for example from a system call) and no physical address is mapped to it, a page fault exception is triggered.
- The page fault exception traps into the kernel, which allocates physical memory and establishes a mapping to the user-mode virtual address.

- Shared memory allows multiple unrelated processes to access the same piece of logical memory.
- For transferring data between two running processes, shared memory is a highly efficient solution.
- Sharing memory between two running processes is an efficient form of inter-process communication and greatly reduces the number of data copies.
- shmget creates a shared memory segment.
- shmat attaches the shared memory segment to the current process's address space so it can be accessed.
- shmdt detaches the shared memory segment from the current process. (A minimal usage sketch follows this list.)
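A minimal sketch of the shmget/shmat/shmdt sequence (the key 0x1234 and the 4 KB size are arbitrary example values):

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create (or open) a 4 KB System V shared memory segment. */
    int shmid = shmget((key_t)0x1234, 4096, 0666 | IPC_CREAT);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    /* Attach the segment into this process's address space. */
    char *addr = shmat(shmid, NULL, 0);
    if (addr == (char *)-1) {
        perror("shmat");
        return 1;
    }

    strcpy(addr, "hello from shared memory");
    printf("%s\n", addr);

    /* Detach; the segment itself persists until removed. */
    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}
```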
5. Pitfalls of Memory Usage
- new and delete are not called correctly (in matched pairs) in the class constructor and destructor.
- Nested object pointers are not correctly released.
- The base class destructor is not declared as a virtual function.
- When a base class pointer points to a derived-class object, if the base class destructor is not virtual, the derived class's destructor is not called, and memory leaks because the derived class's resources are never released.
- A copy constructor is missing: passing an object by value invokes the copy constructor, while passing by reference does not.
- An array of pointers to objects is not the same as an array of objects: the array stores pointers to the objects, so both each object's memory and each pointer's memory must be released.
- An overloaded assignment operator is missing: the default assignment also copies member by member, which leaks memory when the class manages a variable amount of memory.
- Pointer variables are not initialized.
- Pointers are not set to NULL after free or delete.
- Pointer operations go beyond the variable's lifetime, such as returning a pointer to stack memory.
- A null pointer is dereferenced (always check for NULL first).
- sizeof cannot obtain the size of an array passed to a function (the parameter decays to a pointer).
- A constant is modified, e.g.: char *p = "1234"; *p = '1';
- Shared variables used by multiple threads are not declared volatile.
- Multiple threads access global variables without locking.
- Global variables are visible only within a single process.
- Multiple threads write shared-memory data without synchronization.
- mmap memory mappings are not safe for concurrent use by multiple processes.
- An iterator becomes invalid after the element it refers to is deleted.
- Adding elements (insert, push_back, etc.) or deleting elements can invalidate the iterators of sequential containers.


- Replace auto_ptr with unique_ptr.
- Use make_shared to initialize a shared_ptr.

- weak_ptr is a helper smart pointer used alongside shared_ptr.
- std::atomic: atomic data types for thread-safe access.
- std::array: a fixed-length array with less overhead than std::vector; unlike std::vector, its length is fixed and cannot be dynamically expanded.
- std::vector slimming: shrink_to_fit() reduces capacity() to match size().
- std::forward_list: a singly linked list.
- std::unordered_map and std::unordered_set are unordered containers implemented with hash tables, with O(1) average time for insertion, deletion, and lookup; when element order does not matter, unordered containers can give better performance.
- System-wide memory usage: /proc/meminfo
- Memory usage of a single process (pid 28040 in this example): /proc/28040/status
- Check total memory usage: free
- Check per-process CPU and memory usage: top
- Virtual memory statistics: vmstat
- Per-process memory consumption, sorted: ps aux --sort -rss
- Release the system memory cache: /proc/sys/vm/drop_caches