Understanding Embedded Operating System Memory Management

Article word count: 5000, Content value index: ☆ ☆ ☆ ☆ ☆

Key content:

☆ Linux memory organization structure and page layout, causes of memory fragmentation and optimization algorithms.

☆ Several memory management methods in the Linux kernel, memory usage scenarios, and pitfalls in memory usage.

☆ Explore the mechanisms and mysteries of memory management from the principles and structures of memory to algorithm optimization and usage scenarios.

Entering Linux memory

1. What is memory?

1) Memory, also known as main memory, is the storage space that the CPU can directly address, made up of semiconductor devices.

2) The characteristic of memory is its fast access speed.

2. The role of memory

1) Temporarily stores the calculation data of the CPU.

2) Data exchanged with external storage devices such as hard drives.

3) Ensures the stability and high performance of CPU calculations.

Linux memory address space

1. Overview of Linux memory address space and management

2. Memory address – User mode & Kernel mode

· User mode: code runs in Ring 3 and is subject to many restrictions imposed by the processor.

· Kernel mode: code runs in Ring 0, the most privileged level in the processor’s protection scheme.

· Three ways to switch from user mode to kernel mode: system calls, exceptions, and peripheral interrupts.

· Difference: Each process has its own complete, independent, and non-interfering memory space; programs in user mode cannot arbitrarily manipulate kernel address space, providing a certain level of security; kernel mode threads share kernel address space.

3. Memory address – MMU address translation

· MMU is a type of hardware circuit that consists of two parts: a segmentation unit and a paging unit.

· The segmentation mechanism converts a logical address into a linear address.

· The paging mechanism converts a linear address into a physical address.

4. Memory address – Segmentation mechanism

1) Segment selector

· To facilitate quick retrieval of segment selectors, the processor provides six segment registers to cache segment selectors: cs, ss, ds, es, fs, and gs.

· Segment base address: The starting address of the segment in the linear address space.

· Segment limit: The maximum offset that can be used within the segment in the virtual address space.

2) Segmentation implementation

· The segment selector held in the segment register of the logical address indexes a segment descriptor, which provides the segment base address and limit; adding the logical address’s offset to the base yields the linear address.

5. Memory address – Paging mechanism (32-bit)

· The paging mechanism is performed after the segmentation mechanism, which further converts the linear address into a physical address.

· The linear address is split into a 10-bit page directory index, a 10-bit page table index, and a 12-bit page offset.

· The size of a single page is 4KB.
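
To make the 10/10/12 split concrete, here is a minimal user-space C++ sketch (the address value is an arbitrary example) that extracts the page directory index, page table index, and page offset from a 32-bit linear address:

    #include <cstdint>
    #include <cstdio>

    int main() {
        uint32_t linear = 0xC0123ABC;              // example linear address
        uint32_t dir    = (linear >> 22) & 0x3FF;  // bits 31..22: page directory index
        uint32_t table  = (linear >> 12) & 0x3FF;  // bits 21..12: page table index
        uint32_t offset =  linear        & 0xFFF;  // bits 11..0 : offset within the 4 KB page
        std::printf("dir=%u table=%u offset=0x%x\n",
                    (unsigned)dir, (unsigned)table, (unsigned)offset);
        return 0;
    }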

6. User mode address space

· TEXT: Code segment for executable code, string literals, read-only variables.

· DATA: Data segment, mapping already initialized global variables in the program.

· BSS segment: Stores uninitialized global variables in the program.

· HEAP: Runtime heap, memory area allocated using malloc during program execution.

· MMAP: Mapping area for shared libraries and anonymous files.

· STACK: User process stack.
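
The layout above can be observed from a small program. The following C++ sketch simply prints the addresses of a function, two globals, a heap allocation, and a local variable; the exact addresses and their relative ordering vary with the system and ASLR:

    #include <cstdio>
    #include <cstdlib>

    int initialized_global = 42;   // DATA segment
    int uninitialized_global;      // BSS segment

    int main() {
        int   local = 0;                        // STACK
        void *heap  = std::malloc(16);          // HEAP (large requests may go to MMAP)
        std::printf("text : %p\n", (void *)&main);
        std::printf("data : %p\n", (void *)&initialized_global);
        std::printf("bss  : %p\n", (void *)&uninitialized_global);
        std::printf("heap : %p\n", heap);
        std::printf("stack: %p\n", (void *)&local);
        std::free(heap);
        return 0;
    }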

7. Kernel mode address space

· Direct mapping area: the linear space starting at 3 GB, spanning up to 896 MB, is mapped directly onto physical memory.

· Dynamic memory mapping area: This area is allocated by the kernel function vmalloc.

· Permanent memory mapping area: used to map high memory (physical pages above 896 MB) into the kernel.

· Fixed mapping area: This area has only a 4k isolation band from the top of 4G, with each address serving a specific purpose, such as ACPI_BASE, etc.

8. Process memory space

· User processes typically can only access virtual addresses in user space and cannot access virtual addresses in kernel space.

· Kernel space is mapped by the kernel and does not change with the process; kernel space addresses have their corresponding page tables, while user processes have different page tables.

Linux memory allocation algorithms

Memory management algorithms are a blessing for those who dislike managing memory themselves.

1. Memory fragmentation

1) Basic principle

· Cause: allocations are small and these small blocks live for a long time; after repeated allocation and freeing, the free memory becomes scattered into fragments.

· Advantages: Increases allocation speed, facilitates memory management, and prevents memory leaks.

· Disadvantages: A large amount of memory fragmentation can slow down the system, lower memory usage efficiency, and cause significant waste.

2) How to avoid memory fragmentation

· Minimize the use of dynamic memory allocation functions (try to use stack space).

· Allocate and free memory within the same function as much as possible.

· Try to request larger memory blocks at once rather than repeatedly requesting small memory segments.

· Whenever possible, request large blocks of memory in powers of 2 size.

· Avoid external fragmentation – buddy system algorithm.

· Avoid internal fragmentation – slab algorithm.

· Conduct memory management work yourself by designing a memory pool.

2. Buddy system algorithm – Organizational structure

1) Concept

· Provides an efficient allocation strategy for the kernel to allocate a set of contiguous pages and effectively solves the external fragmentation problem.

· The allocated memory area is based on page frames.

2) External fragmentation

· External fragmentation refers to memory areas that have not been allocated (do not belong to any process) but are too small to be allocated to new processes requesting memory space.

3) Organizational structure.

· All free pages are grouped into 11 block linked lists, each containing page blocks of sizes 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames. The maximum allocable size is for 1024 contiguous pages, corresponding to 4MB of contiguous memory.

3. Buddy system algorithm – Allocation and recovery

1) Allocation algorithm

· To allocate a block of 2^i pages, first check the block list for order i; if it has a free block, allocate it to the requester.

· If it has none, check whether the block list for 2^(i+1) has a free block; if it does, split it: give 2^i pages to the requester and insert the remaining 2^i pages into the block list for order i.

· If the 2^(i+1) list is also empty, keep moving up to larger block lists and repeat the previous step until a list with a free block is found.

· If no such list is found, the allocation fails.

2) Recovery algorithm

· When freeing a block of 2^i pages, check the block list for order i for a free block (the “buddy”) at a physically contiguous address; if there is none, no merge is needed and the block is simply inserted into that list.

· If there is, merge the two blocks into one block of size 2^(i+1), and keep checking the next-higher block list until no further merge is possible; a simplified sketch of this split/merge bookkeeping is given after the merge conditions below.

3) Conditions

· The two blocks must be of the same size (2^i pages).

· Their physical addresses must be contiguous.

· They must have been split from the same larger block, i.e., they are true buddies (the combined block is aligned to its own size).
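
Below is a simplified user-space C++ sketch of the split-and-merge bookkeeping described above. It is illustrative only, not kernel code; MAX_ORDER, the page-index representation, and the per-order free lists are assumptions made for the example:

    #include <cstdint>
    #include <cstdio>
    #include <set>

    constexpr int MAX_ORDER = 11;                        // orders 0..10 -> 1..1024 pages
    std::set<uint32_t> free_list[MAX_ORDER];             // free block start indices per order

    // Allocate a block of 2^order pages; returns the start page index, or -1 on failure.
    int64_t buddy_alloc(int order) {
        for (int o = order; o < MAX_ORDER; ++o) {
            if (free_list[o].empty()) continue;          // try the next larger order
            uint32_t block = *free_list[o].begin();
            free_list[o].erase(free_list[o].begin());
            while (o > order) {                          // split down to the requested size
                --o;
                free_list[o].insert(block + (1u << o));  // second half stays free
            }
            return block;
        }
        return -1;                                       // no block large enough
    }

    // Free a block of 2^order pages starting at 'block', merging with its buddy
    // whenever the buddy is free, the same size, and properly aligned.
    void buddy_free(uint32_t block, int order) {
        while (order < MAX_ORDER - 1) {
            uint32_t buddy = block ^ (1u << order);      // buddy differs only in bit 'order'
            auto it = free_list[order].find(buddy);
            if (it == free_list[order].end()) break;     // buddy busy: stop merging
            free_list[order].erase(it);
            block = block < buddy ? block : buddy;       // merged block starts at the lower index
            ++order;                                     // merged block is one order larger
        }
        free_list[order].insert(block);
    }

    int main() {
        free_list[MAX_ORDER - 1].insert(0);              // start with one free 1024-page block
        int64_t a = buddy_alloc(0);                      // one page
        int64_t b = buddy_alloc(3);                      // eight pages
        std::printf("a=%lld b=%lld\n", (long long)a, (long long)b);
        buddy_free((uint32_t)a, 0);
        buddy_free((uint32_t)b, 3);                      // everything merges back to one block
        return 0;
    }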

4. How to allocate more than 4M memory?

1) Why is there a limit on large memory allocations?

· The larger the allocated memory, the greater the possibility of failure.

· There are fewer use cases for large memory blocks.

2) Methods for obtaining large memory over 4M in the kernel

· Modify MAX_ORDER and recompile the kernel.

· Pass the “mem=” parameter when starting the kernel (e.g., “mem=80M”) to reserve a portion of memory, then map the reserved memory into the module with request_mem_region and ioremap_nocache. This method does not support the x86 architecture; it only supports ARM, PowerPC, and other non-x86 architectures.

· Call alloc_bootmem before the mem_init() call in start_kernel() to preallocate large blocks of memory; this requires recompiling the kernel.

· Use the vmalloc function, which the kernel code uses to allocate memory that is contiguous in virtual memory but not necessarily contiguous in physical memory.

5. Buddy system – Anti-fragmentation mechanism

1) Non-movable pages

· These pages have fixed positions in memory and cannot be moved or reclaimed.

· Kernel code segment, data segment, memory allocated by kernel kmalloc(), memory occupied by kernel threads, etc.

2) Reclaimable pages

· These pages cannot be moved but can be deleted. The kernel recycles pages when they occupy too much memory or when there is a memory shortage.

3) Movable pages

· These pages can be moved arbitrarily, and pages used by user space applications belong to this category. They are mapped through page tables.

· When they move to a new location, the page table entries are updated accordingly.

6. Slab algorithm – Basic principles

1) Basic concept

· The slab allocator used by Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system.

· Its basic idea is to place frequently used objects in the kernel into a cache and maintain them in an initial usable state. For example, process descriptors are frequently requested and released in the kernel.

2) Internal fragmentation

· The allocated memory space is larger than the requested memory space.

3) Basic goals

· Reduce internal fragmentation caused by the buddy algorithm when allocating small blocks of contiguous memory.

· Cache frequently used objects to reduce the time overhead of allocating, initializing, and releasing objects.

· Adjust objects using coloring techniques to better utilize hardware cache.

7. Structure of the slab allocator

· Since objects are allocated and released from slabs, individual slabs can move between slab lists.

· The slabs_empty list contains slabs that are primary candidates for recycling (reaping).

· Slabs also support the initialization of general objects, avoiding the need to repeatedly initialize an object for the same purpose.

8. Slab cache

1) General cache

· The allocation of small blocks of contiguous memory provided by the slab allocator is achieved through a general cache.

· The objects provided by the general cache have geometrically distributed sizes, ranging from 32 to 131072 bytes.

· The kernel provides two interfaces, kmalloc() and kfree(), for memory allocation and release, respectively.

2) Dedicated cache

· The kernel provides a complete set of interfaces for allocating and releasing dedicated caches, allocating slab caches for specific objects based on the parameters passed.

· kmem_cache_create() is used to create a cache for a specified object. It allocates a cache descriptor for the new dedicated cache from the cache_cache general cache and inserts this descriptor into the cache_chain list formed by cache descriptors.

· kmem_cache_alloc() allocates an object from the cache specified by its parameter; conversely, kmem_cache_free() returns an object to the specified cache.

9. Kernel mode memory pool

1) Basic principle

· First, allocate a certain number of (usually equal-sized) memory blocks in advance as a reserve.

· When new memory is needed, a portion of memory blocks is taken from the memory pool; if the memory blocks are insufficient, new memory is requested.

· A significant advantage of this approach is that it minimizes memory fragmentation, improving memory allocation efficiency.

2) Kernel API

· mempool_create creates a memory pool object.

· mempool_alloc obtains an object from the memory pool.

· mempool_free releases an object.

· mempool_destroy destroys the memory pool.

10. User mode memory pool

1) C++ example

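A minimal fixed-size-block memory pool sketch in C++; the class name, block size, and pool size below are illustrative assumptions, not a definitive implementation:

    #include <cstddef>
    #include <cstdlib>
    #include <new>

    class MemoryPool {
    public:
        MemoryPool(std::size_t block_size, std::size_t block_count)
            : block_size_(block_size < sizeof(void *) ? sizeof(void *) : block_size),
              buffer_(static_cast<char *>(std::malloc(block_size_ * block_count))) {
            if (!buffer_) throw std::bad_alloc();
            // Thread every block onto an intrusive free list.
            for (std::size_t i = 0; i < block_count; ++i) {
                void *block = buffer_ + i * block_size_;
                *static_cast<void **>(block) = free_list_;
                free_list_ = block;
            }
        }
        ~MemoryPool() { std::free(buffer_); }

        void *allocate() {
            if (!free_list_) return nullptr;             // pool exhausted (could fall back to malloc)
            void *block = free_list_;
            free_list_ = *static_cast<void **>(block);
            return block;
        }
        void deallocate(void *block) {
            *static_cast<void **>(block) = free_list_;   // push the block back on the free list
            free_list_ = block;
        }

    private:
        std::size_t block_size_;
        char       *buffer_;
        void       *free_list_ = nullptr;
    };

    int main() {
        MemoryPool pool(64, 128);   // 128 blocks of 64 bytes
        void *a = pool.allocate();
        void *b = pool.allocate();
        pool.deallocate(a);
        pool.deallocate(b);
        return 0;
    }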

11. DMA memory

1) What is DMA

· Direct Memory Access is a hardware mechanism that allows peripheral devices to transfer their I/O data directly to and from main memory without the involvement of the system processor.

2) Functions of the DMA controller.

· Can send a system hold (HOLD) signal to the CPU, requesting to take over the bus.

· After the CPU sends an allow takeover signal, it takes control of the bus and enters DMA mode.

· Can address memory and modify address pointers, performing read and write operations on memory.

· Can determine the number of bytes to be transferred in this DMA transfer and judge whether the DMA transfer has ended.

· Sends a DMA end signal, allowing the CPU to resume its normal working state.

3) DMA signals

· DREQ: DMA request signal. This is the signal from the peripheral to the DMA controller requesting DMA operation.

· DACK: DMA acknowledgment signal. This is the signal from the DMA controller to the peripheral that made the DMA request, indicating that the request has been received and is being processed.

· HRQ: The signal from the DMA controller to the CPU requesting bus takeover.

· HLDA: The signal from the CPU to the DMA controller, allowing bus takeover.

Memory usage scenarios

Has the era of running out of memory passed? No. Even with plenty of memory, it should not be used recklessly.

1. Memory usage scenarios

· Page management

· Slab (kmalloc, memory pool)

· User mode memory usage (malloc, realloc, file mapping, shared memory)

· Memory map of the program (stack, heap, code, data)

· Data transfer between kernel and user mode (copy_from_user, copy_to_user)

· Memory mapping (hardware registers, reserved memory)

· DMA memory

2. User mode memory allocation functions

· alloca allocates memory on the stack, so it is released automatically when the function returns.

· Memory allocated by malloc is uninitialized; programs using malloc() may run normally at first (when the memory space has not been reallocated) but may encounter issues after some time (when the memory space has been reallocated).

· calloc initializes every bit of the allocated memory space to zero.

· realloc resizes an existing allocation.

a) If the current contiguous memory block is large enough, realloc simply expands the space pointed to by p and returns p; in this case q and p point to the same address.

b) If the current block is not large enough, realloc finds a new region that is long enough, allocates a new block q, copies the contents pointed to by p into q, frees the block pointed to by p, and returns q.
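
A short sketch of these functions in use (sizes are arbitrary example values); note that the pointer returned by realloc must always replace the old one:

    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
        int *m = static_cast<int *>(std::malloc(4 * sizeof(int)));   // uninitialized
        int *c = static_cast<int *>(std::calloc(4, sizeof(int)));    // zero-initialized
        if (!m || !c) return 1;

        std::memset(m, 0, 4 * sizeof(int));
        // realloc may grow the block in place (same pointer) or move it to a new
        // location, copy the old contents, and free the old block: always use the
        // returned pointer and never the old one afterwards.
        int *q = static_cast<int *>(std::realloc(m, 1024 * sizeof(int)));
        if (q) m = q;

        std::printf("m=%p c=%p\n", (void *)m, (void *)c);
        std::free(m);
        std::free(c);
        return 0;
    }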

3. Kernel mode memory allocation functions

The common functions, their allocation principles, maximum sizes, and notes:

· __get_free_pages — operates directly on page frames; maximum about 4 MB; suitable for allocating large blocks of contiguous physical memory.

· kmem_cache_alloc — based on the slab mechanism; maximum about 128 KB; suitable for frequently allocating and releasing memory blocks of the same size.

· kmalloc — implemented on top of kmem_cache_alloc; maximum about 128 KB; the most common allocation method.

· vmalloc — establishes a mapping from non-contiguous physical memory to contiguous virtual addresses; physically non-contiguous, suitable when large memory is needed but physical address continuity is not required.

· dma_alloc_coherent — implemented on top of __alloc_pages; suitable for DMA operations.

· ioremap — maps a known physical address to a virtual address; suitable when the physical address is already known, such as in device drivers.

· alloc_bootmem — reserves a segment of memory during kernel startup that normal memory management does not see; must be smaller than physical memory; places high demands on memory management.

4. malloc memory allocation

· When the malloc function is called, it searches the free_chunk_list linked list for a free memory block large enough to satisfy the user’s request.

· The main task of the free_chunk_list linked list is to maintain a list of free chunks of heap space.

· If no suitable chunk is found in the list, the process’s heap space is extended through the sys_brk system call.

5. Page fault exception

· Apply for one or more physical pages through get_free_pages.

· Walk the process’s pgd to find the address of the pte entry that maps addr.

· Set the pte corresponding to addr to the first address of the physical page.

· System calls: brk is used for requests of 128 KB or less; mmap (do_mmap) is used for requests larger than 128 KB.

6. User process memory access analysis

· Each user-mode process has its own virtual address space, so two processes may use the same virtual addresses (backed by different physical pages).

· When a user-mode virtual address that has no physical mapping is accessed, a page fault exception is raised.

· Page fault exceptions enter the kernel, allocate physical address space, and establish a mapping with the user mode virtual address.

7. Shared memory

1) Principle

· It allows multiple unrelated processes to access the same part of logical memory.

· Sharing memory between two running processes is a highly efficient way to transfer data; as a method of inter-process communication, it greatly reduces the number of data copies.

2) shm interface

· shmget creates shared memory.

· shmat starts access to the shared memory and connects the shared memory to the current process’s address space.

· shmdt detaches the shared memory from the current process.
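
A minimal sketch of the shm interface in use (C++ calling the System V API; the key and segment size are example values, and error handling is kept to a minimum):

    #include <cstdio>
    #include <cstring>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main() {
        key_t key = 0x1234;                                   // example key
        int shmid = shmget(key, 4096, IPC_CREAT | 0666);      // create/open a 4 KB segment
        if (shmid < 0) { std::perror("shmget"); return 1; }

        void *addr = shmat(shmid, nullptr, 0);                // attach to this process
        if (addr == reinterpret_cast<void *>(-1)) { std::perror("shmat"); return 1; }

        std::strcpy(static_cast<char *>(addr), "hello");      // another process attaching with
                                                              // the same key sees this data
        shmdt(addr);                                          // detach from this process
        shmctl(shmid, IPC_RMID, nullptr);                     // remove the segment when done
        return 0;
    }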

Memory usage pitfalls

1. C Memory Leak

· Mismatched calls of new and delete in class constructors and destructors.

· Not properly clearing nested object pointers.

· Not defining the base class destructor as a virtual function.

· When a base class pointer points to a subclass object and the base class destructor is not virtual, deleting through the base pointer does not call the subclass destructor, so the subclass’s resources are never released, causing a memory leak (see the sketch after this list).

· Missing copy constructor: passing an object by value invokes the copy constructor (the compiler-generated shallow copy can lead to leaks or double frees), while passing by reference does not.

· An array of pointers to objects is not the same as an array of objects; the array stores pointers, so both the objects and the pointer array itself must be released.

· Missing overloaded assignment operator: the default assignment copies member by member (a shallow copy); if the class manages variable-sized resources, this results in memory leaks.
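
A sketch of the non-virtual-destructor case referenced above (the class names are made up for the example):

    #include <cstdio>

    class Base {
    public:
        ~Base() { std::puts("~Base"); }          // NOT virtual: the derived destructor is skipped
    };

    class Derived : public Base {
    public:
        Derived()  : data_(new int[1024]) {}
        ~Derived() { delete[] data_; std::puts("~Derived"); }
    private:
        int *data_;
    };

    int main() {
        Base *p = new Derived();
        delete p;    // undefined behavior per the standard; in practice only ~Base runs
                     // and data_ leaks. Declaring 'virtual ~Base()' fixes this.
        return 0;
    }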

2. C Wild Pointer

· Pointer variables not initialized.

· Pointers not set to NULL after free or delete.

· Pointer use that outlives the variable’s scope; for example, returning a pointer to a local (stack) variable yields a dangling pointer (see the sketch after this list).

· Accessing a null pointer (null checks are necessary).

· sizeof cannot obtain the size of an array passed to a function (the array decays to a pointer, so sizeof returns the pointer size).

· Attempting to modify a string literal, e.g., char *p = "1234"; p[0] = '1';
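
Two of the cases above, sketched in code (illustrative only):

    #include <cstdlib>

    // Returning the address of a local variable: the pointer dangles as soon as
    // the function returns and its stack frame is reclaimed (compilers usually warn here).
    int *bad_return() {
        int local = 42;
        return &local;          // dangling pointer: never dereference the result
    }

    int main() {
        int *p = static_cast<int *>(std::malloc(sizeof(int)));
        std::free(p);
        // p still holds the old address ("wild"); set it to nullptr after free
        // so later null checks can catch accidental reuse.
        p = nullptr;

        int *q = bad_return();  // must not be dereferenced
        (void)q;
        return 0;
    }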

3. C Resource Access Conflicts

· Multi-threaded shared variables not declared with volatile.

· Multi-threaded access to global variables without locks.

· Global variables are only effective for a single process.

· Multi-process writing of shared memory data without synchronization processing.

· mmap memory mappings accessed by multiple processes without synchronization are unsafe.

4. STL Iterator Invalidity

· Using an iterator after the element it refers to has been erased.

· Adding elements (insert/push_back, etc.) or deleting elements causes iterator invalidity in sequence containers.

· Incorrect example: Deleting the current iterator will invalidate the iterator.

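A hedged reconstruction of the incorrect pattern (the container and predicate are example choices):

    #include <vector>

    int main() {
        std::vector<int> v = {1, 2, 3, 4, 5};
        for (auto it = v.begin(); it != v.end(); ++it) {   // WRONG
            if (*it % 2 == 0)
                v.erase(it);    // 'it' is invalidated here; the following ++it is undefined behavior
        }
        return 0;
    }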

· Correct example: when erasing an element, save the next iterator (erase() returns it).

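A hedged reconstruction of the correct pattern, using the iterator returned by erase():

    #include <vector>

    int main() {
        std::vector<int> v = {1, 2, 3, 4, 5};
        for (auto it = v.begin(); it != v.end(); ) {
            if (*it % 2 == 0)
                it = v.erase(it);   // erase() returns the next valid iterator
            else
                ++it;
        }
        return 0;
    }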

5. C++11 Smart Pointer

· Replace auto_ptr with unique_ptr.

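A short illustrative sketch of unique_ptr as the auto_ptr replacement: sole ownership, movable but not copyable.

    #include <memory>
    #include <utility>

    int main() {
        std::unique_ptr<int> p(new int(10));
        // std::unique_ptr<int> q = p;          // does not compile: copying is forbidden
        std::unique_ptr<int> q = std::move(p);  // ownership is transferred explicitly
        return 0;                               // q frees the int automatically
    }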

· Use make_shared to initialize a shared_ptr.

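A short illustrative sketch of make_shared, which allocates the object and its control block in a single step:

    #include <memory>
    #include <string>

    int main() {
        auto sp1 = std::make_shared<std::string>("hello");        // preferred: one allocation
        std::shared_ptr<std::string> sp2(new std::string("hi"));  // two allocations
        return 0;
    }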

· weak_ptr: a helper smart pointer that observes an object managed by shared_ptr without owning it.

1) Principle analysis: weak_ptr holds a non-owning (“weak”) reference to an object managed by shared_ptr; it does not increase the strong reference count, so it can break shared_ptr reference cycles.

2) Data structure: a weak_ptr shares the control block of the shared_ptr it observes; the control block maintains separate strong and weak reference counts.

3) Usage: a. lock() obtains a strong reference (a shared_ptr) to the managed object; b. expired() checks whether the managed object has already been released; c. the object is then accessed through the shared_ptr returned by lock().
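
A small sketch of this usage (values are arbitrary):

    #include <cstdio>
    #include <memory>

    int main() {
        auto sp = std::make_shared<int>(7);
        std::weak_ptr<int> wp = sp;              // non-owning observer

        if (auto locked = wp.lock())             // strong reference while in scope
            std::printf("value=%d\n", *locked);

        sp.reset();                              // last owner gone: the object is released
        std::printf("expired=%d\n", wp.expired() ? 1 : 0);
        return 0;
    }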

6. C++11 Smaller, Faster, Safer

· std::atomic for multi-thread safety.

· std::array has lower overhead (no heap allocation); unlike std::vector, the length of std::array is fixed and cannot be expanded dynamically.

· std::vector shrink_to_fit() reduces capacity to the same size as size().

· std::forward_list

forward_list is a singly linked list (std::list is doubly linked); it suits cases where only forward traversal is needed, uses less memory than list, and has better insertion and deletion performance.

· std::unordered_map and std::unordered_set are hash-based unordered containers with average O(1) insertion, deletion, and lookup; when the order of elements does not matter, they give better performance.
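
A quick sketch exercising the facilities listed above (values are arbitrary):

    #include <array>
    #include <atomic>
    #include <cstdio>
    #include <forward_list>
    #include <unordered_map>
    #include <vector>

    int main() {
        std::atomic<int> counter{0};               // safe to increment from multiple threads
        counter.fetch_add(1);

        std::array<int, 4> fixed = {1, 2, 3, 4};   // fixed length, no heap allocation

        std::vector<int> v(1000, 0);
        v.resize(10);
        v.shrink_to_fit();                         // capacity trimmed toward size()

        std::forward_list<int> fl = {1, 2, 3};     // singly linked, lower overhead than list
        fl.push_front(0);

        std::unordered_map<int, int> m;            // hash map, average O(1) lookup
        m[1] = 100;

        std::printf("%d %zu %zu %d\n", counter.load(), fixed.size(), v.capacity(), m[1]);
        return 0;
    }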

How to check memory:

· System memory usage: /proc/meminfo.

· Process memory usage: /proc/<pid>/status (e.g., /proc/28040/status).

· Query total memory usage: free.

· Query process CPU and memory usage ratio: top.

· Virtual memory statistics: vmstat.

· Process memory consumption ratio and sorting: ps aux --sort -rss.

· Release the system memory cache: write to /proc/sys/vm/drop_caches (e.g., echo 3 > /proc/sys/vm/drop_caches).

Disclaimer: This article is sourced from the internet, and copyright belongs to the original author. If there are copyright issues, please contact for deletion.
