Understanding Memory in Embedded Operating Systems


Source: EDN Electronic Technology Design

Linux memory is a computer resource that backend developers need to understand deeply. Proper use of memory helps to improve machine performance and stability. This article mainly introduces the organization structure and page layout of Linux memory, the reasons for memory fragmentation and optimization algorithms, several memory management methods in the Linux kernel, memory usage scenarios, and the pitfalls of memory usage. From the principles and structure of memory to memory algorithm optimization and usage scenarios, we explore the mechanisms and mysteries of memory management.

1. Entering Linux Memory

1. What is memory?
1) Memory, also known as main memory, is the storage space directly addressable by the CPU, made of semiconductor devices.
2) The characteristic of memory is fast access speed.
2. The role of memory
1) Temporarily stores the computation data of the CPU.
2) Data exchanged with external storage devices such as hard drives.
3) Ensures the stability and high performance of CPU calculations.

2. Linux Memory Address Space

1. Overview of Linux memory management.
2. Memory Address – User Mode & Kernel Mode
  • User Mode: Code running in user mode is subject to many restrictions from the processor.
  • Kernel Mode: the privileged mode in the processor's protection scheme; code running here can access all memory and hardware.
  • Three ways to switch from user mode to kernel mode: system calls, exceptions, and device interrupts.
  • Differences: Each process has its own independent, non-interfering memory space; user-mode programs cannot arbitrarily operate on kernel address space, providing a certain level of security; kernel threads share the kernel address space.
3. Memory Address – MMU Address Translation
  • MMU is a type of hardware circuit that contains two parts: a segmentation unit and a paging unit.
  • The segmentation mechanism converts a logical address into a linear address.
  • The paging mechanism converts a linear address into a physical address.
4. Memory Address – Segmentation Mechanism
1) Segment Selector
  • To facilitate fast retrieval of segment selectors, processors provide six segment registers to cache segment selectors: cs, ss, ds, es, fs, and gs.
  • Segment Base Address: The starting address of the segment in the linear address space.
  • Segment Limit: The maximum offset that can be used within the segment in the virtual address space.
2) Segmentation Implementation
  • The segment selector held in the segment register locates the segment descriptor; the segment base address and limit are read from that descriptor, and adding the logical address's offset to the base (after checking it against the limit) yields the linear address.
5. Memory Address – Paging Mechanism (32-bit)
  • The paging mechanism is performed after the segmentation mechanism, which further converts the linear address into a physical address.
  • 10-bit page directory, 10-bit page table entry, 12-bit page offset.
  • The size of a single page is 4KB.
6. User Mode Address Space
  • TEXT: Executable code, string literals, read-only variables.
  • DATA: Data segment, mapping already initialized global variables in the program.
  • BSS Segment: Stores uninitialized global variables in the program.
  • HEAP: Runtime heap, memory area allocated using malloc during program execution.
  • MMAP: Memory mapping area for shared libraries and anonymous files.
  • STACK: User process stack.
7. Kernel Mode Address Space
  • Direct Mapping Area: A linear space starting from 3G with a maximum interval of 896M, serving as the direct memory mapping area.
  • Dynamic Memory Mapping Area: This area is allocated by the kernel function vmalloc.
  • Permanent Memory Mapping Area: This area can access high-end memory.
  • Fixed Mapping Area: This area has only a 4k isolation band from the top of 4G, with each address item serving a specific purpose, such as ACPI_BASE.
8. Process Memory Space
  • User processes usually can only access the virtual addresses of user space and cannot access the virtual addresses of kernel space.
  • The kernel space is mapped by the kernel and does not change with the process; kernel space addresses have their own corresponding page tables, while user processes each have different page tables.

3. Linux Memory Allocation Algorithms

Memory management algorithms are a godsend for those who dislike managing memory themselves.
1. Memory Fragmentation
1) Basic Principles
  • Causes: repeated requests for small blocks of memory, some of which are long-lived; over many allocation and release cycles the free space becomes scattered into fragments.
  • Advantages: Increases allocation speed, facilitates memory management, and prevents memory leaks.
  • Disadvantages: A large amount of memory fragmentation can slow down the system, lowering memory utilization and causing waste.
2) How to Avoid Memory Fragmentation
  • Avoid using dynamic memory allocation functions (try to use stack space).
  • Allocate and free memory in the same function whenever possible.
  • Try to allocate larger memory blocks at once, rather than repeatedly allocating small memory.
  • Whenever possible, request large blocks of memory that are powers of two.
  • External fragmentation avoidance – Buddy System Algorithm.
  • Internal fragmentation avoidance – Slab Algorithm.
  • Manage memory yourself by designing a memory pool.
2. Buddy System Algorithm – Organizational Structure
1) Concept
  • Provides an efficient allocation strategy for the kernel to allocate a group of contiguous pages and effectively solves the external fragmentation problem.
  • The allocated memory area is based on page frames.
2) External Fragmentation
  • External fragmentation refers to memory areas that have not yet been allocated (not belonging to any process) but are too small to be allocated to new processes requesting memory space.
3) Organizational Structure
  • All free pages are grouped into 11 block linked lists, each containing page blocks of sizes 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames. A maximum of 1024 contiguous pages can be requested, corresponding to a contiguous memory size of 4MB.
3. Buddy System Algorithm – Allocation and Reclamation
1) Allocation Algorithm
  • To request storage of 2^i contiguous page frames: if the block list for order i contains a free page block, allocate it to the requester.
  • If not, check whether the 2^(i+1) block list has a free page block; if so, split it into two 2^i blocks, allocate one to the requester, and insert the other into the 2^i block list.
  • If the 2^(i+1) block list is also empty, repeat the previous step at the next order up until a block list with a free page block is found.
  • If still none, return memory allocation failure.
2) Reclamation Algorithm
  • When releasing a 2^i page block, check the corresponding block list for a free block that is its buddy (contiguous with it in physical address); if there is none, no merging is needed.
  • If there is, merge the two into a 2^(i+1) page block, and continue checking at the next order up until no further merging is possible.
3) Conditions
  • The two blocks are the same size, 2^i.
  • Their physical addresses are contiguous.
  • The first page frame of the combined block is aligned to a 2^(i+1) boundary (i.e., the two blocks are true buddies).
4. How to allocate memory over 4M?
1) Why is there a limit on large memory allocations?
  • The larger the allocated memory, the higher the likelihood of failure.
  • Large memory usage scenarios are rare.
2) Methods to obtain large memory over 4M in the kernel.
  • Modify MAX_ORDER and recompile the kernel.
  • Pass the "mem=" parameter at kernel startup, e.g. "mem=80M", to reserve part of physical memory; then map the reserved memory into the module with request_mem_region and ioremap_nocache. This method does not require recompiling the kernel, but it is not supported on x86; it works only on ARM, PowerPC, and other non-x86 architectures.
  • Call alloc_bootmem to pre-allocate large memory before the mem_init function in start_kernel; this requires recompiling the kernel.
  • The vmalloc function: kernel code uses it to allocate memory that is contiguous in virtual address space but not necessarily contiguous in physical memory.
5. Buddy System – Anti-fragmentation Mechanism
1) Non-movable Pages
  • These pages have fixed positions in memory, cannot be moved, and cannot be reclaimed.
  • Kernel code segments, data segments, memory allocated by kernel kmalloc(), memory occupied by kernel threads, etc.
2) Reclaimable Pages
  • These pages cannot be moved but can be deleted. The kernel recycles pages when they occupy too much memory or when memory is scarce.
3) Movable Pages
  • These pages can be moved freely; pages used by user-space applications fall into this category. They are mapped through page tables.
  • When they are moved to a new location, the page table entries are also updated accordingly.
6. Slab Algorithm – Basic Principles
1) Basic Concept
  • The slab allocator used in Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system.
  • Its basic idea is to place frequently used objects in the kernel into a cache maintained in an initial usable state. For example, process descriptors, which are frequently allocated and released in the kernel.
2) Internal Fragmentation
  • The allocated memory space is larger than the requested memory space.
3) Basic Objectives
  • Reduce internal fragmentation caused by the buddy algorithm when allocating small blocks of contiguous memory.
  • Cache frequently used objects to reduce the time overhead of allocating, initializing, and releasing objects.
  • Adjust objects using coloring techniques for better use of hardware caches.
7. Structure of Slab Allocator
  • Since objects are allocated and released from slabs, individual slabs can move between slab lists.
  • Slabs in the slabs_empty list are the main candidates for recycling (reaping).
  • Slab also supports the initialization of generic objects, avoiding repeated initialization of the same object.
8. Slab Cache
1) General Cache
  • The allocation of small blocks of contiguous memory provided by the slab allocator is implemented through a general cache.
  • The objects provided by the general cache have geometrically distributed sizes, ranging from 32 to 131072 bytes.
  • The kernel provides two interfaces, kmalloc() and kfree(), for memory allocation and release, respectively.
2) Dedicated Cache
  • The kernel provides a complete interface for the allocation and release of dedicated caches, allocating slab caches for specific objects based on the parameters passed in.
  • kmem_cache_create() is used to create a cache for a specified object. It allocates a cache descriptor for the new exclusive cache from the general cache cache_cache and inserts this descriptor into the cache_chain list formed by cache descriptors.
  • kmem_cache_alloc() allocates an object from the cache specified by its parameter; conversely, kmem_cache_free() releases an object back to the cache specified by its parameter.
9. Kernel Mode Memory Pool
1) Basic Principles
  • First, pre-allocate a certain number of memory blocks, generally of equal size, as a reserve.
  • When there is a new memory demand, a portion of memory blocks is taken from the memory pool; if there are not enough memory blocks, continue to allocate new memory.
  • A significant advantage of this approach is that it minimizes memory fragmentation, improving memory allocation efficiency.
2) Kernel API
  • mempool_create creates a memory pool object.
  • mempool_alloc allocates an object from the memory pool.
  • mempool_free releases an object.
  • mempool_destroy destroys the memory pool.
10. User Mode Memory Pool
1) C++ Example
11. DMA Memory
1) What is DMA?
  • Direct Memory Access is a hardware mechanism that allows peripheral devices to directly transfer their I/O data to and from main memory without the involvement of the system processor.
2) Functions of the DMA Controller
  • Can issue a system hold (HOLD) signal to the CPU, requesting bus takeover.
  • When the CPU issues a permit takeover signal, it takes control of the bus, entering DMA mode.
  • Can address memory and modify address pointers to perform read and write operations on memory.
  • Can determine the number of bytes for this DMA transfer and judge whether the DMA transfer has ended.
  • Issues a DMA end signal, allowing the CPU to resume normal operation.
3) DMA Signals
  • DREQ: DMA request signal. It is the signal from a peripheral device to the DMA controller requesting a DMA operation.
  • DACK: DMA acknowledge signal. It is the signal from the DMA controller to the requesting peripheral device indicating that the request has been received and is being processed.
  • HRQ: The signal sent from the DMA controller to the CPU requesting bus takeover.
  • HLDA: The signal sent from the CPU to the DMA controller, allowing the takeover of the bus.

4. Memory Usage Scenarios

Has the era of out of memory passed? No, even with ample memory, it should not be used recklessly.
1. Memory Usage Scenarios
  • Page management
  • Slab (kmalloc, memory pool)
  • User mode memory usage (malloc, realloc, file mapping, shared memory)
  • Memory map of the program (stack, heap, code, data)
  • Data transfer between kernel and user mode (copy_from_user, copy_to_user)
  • Memory mapping (hardware registers, reserved memory)
  • DMA memory
2. User Mode Memory Allocation Functions
  • alloca allocates memory on the stack; it is released automatically when the function returns, so it does not need to be freed explicitly.
  • Memory allocated by malloc is not initialized, so a program that reads it before writing may appear to work at first (while the memory has not yet been reused) but fail later (once the memory has been reallocated and contains stale data).
  • calloc initializes every bit of the allocated memory space to zero.
  • realloc expands the size of existing memory space.
a) If the current contiguous memory block is sufficient for realloc, it simply enlarges the space pointed to by p and returns the pointer address of p. At this point, q and p point to the same address.
b) If the current contiguous memory block is insufficient, it finds a new sufficiently long location, allocates a new memory block q, copies the contents pointed to by p to q, returns q, and deletes the memory space pointed to by p.
3. Kernel Mode Memory Allocation Functions
For each kernel allocation function: its allocation principle, maximum size, and typical use.
  • _get_free_pages: operates directly on page frames; up to 4MB; for larger amounts of contiguous physical memory.
  • kmem_cache_alloc: based on the slab mechanism; up to 128KB; suitable for frequently allocating and releasing memory blocks of the same size.
  • kmalloc: implemented on top of kmem_cache_alloc; up to 128KB; the most common allocation method, usable when the requested memory is smaller than a page frame.
  • vmalloc: establishes a mapping from non-contiguous physical memory to contiguous virtual addresses; suitable for scenarios requiring large memory without physical address continuity.
  • dma_alloc_coherent: based on __alloc_pages; up to 4MB; suitable for DMA operations.
  • ioremap: maps a known physical address to a virtual address; suitable when the physical address is known, such as in device drivers.
  • alloc_bootmem: reserves a portion of memory during kernel startup that is invisible to the kernel's normal allocator; requires careful manual management.
4. Memory Allocation with malloc
  • When the malloc function is called, it searches the free_chunk_list linked list for a memory block large enough to satisfy the user's request.
  • The main task of the free_chunk_list linked list is to maintain a list of free heap space buffers.
  • If no suitable node is found in the buffer list, the process's heap space must be extended through the system call sys_brk.
5. Page Fault Exception
  • Requests one or more physical pages through get_free_pages.
  • Calculates the pte address for addr within the process's pgd mapping.
  • Sets the pte corresponding to addr to the physical page’s starting address.
  • System calls: brk for requests of 128KB or less, mmap (do_mmap) for requests larger than 128KB.
6. User Process Memory Access Analysis
  • User mode processes have exclusive virtual address spaces; the same virtual address in two processes can map to different physical addresses.
  • When accessing user mode virtual addresses, if there is no mapped physical address, a page fault exception is triggered through a system call.
  • The page fault exception enters the kernel, allocates physical address space, and establishes a mapping with the user mode virtual address.
7. Shared Memory
1) Principle
  • It allows multiple unrelated processes to access the same region of physical memory through mappings in their own address spaces.
  • For transferring data between two running processes, shared memory is a highly efficient solution.
  • Because the processes access the shared data directly, the number of data copies is greatly reduced, making it one of the most efficient inter-process communication methods.
2) shm Interface
  • shmget creates shared memory.
  • shmat starts access to the shared memory and connects it to the current process’s address space.
  • shmdt detaches the shared memory from the current process.

5. Pitfalls of Memory Usage

1. C Memory Leak
  • Did not pair new in the constructor with delete in the destructor.
  • Did not correctly clear nested object pointers.
  • Did not define the base class destructor as a virtual function.
  • When a base class pointer points to a subclass object, if the base class destructor is not virtual, the subclass destructor will not be called, leading to memory leaks as subclass resources are not correctly released.
  • Missing a proper copy constructor: passing an object by value invokes the copy constructor, and the default memberwise copy shares raw pointers between the copies; passing by reference does not.
  • Pointer arrays pointing to objects are not equivalent to object arrays; the array stores pointers to objects, requiring the release of each object’s space and each pointer’s space.
  • Missing an overloaded assignment operator: the default one also copies the object member by member, which leads to leaks or double-frees if the class manages dynamically allocated resources.
2. C Wild Pointer
  • Pointer variables are not initialized.
  • Pointer is not set to NULL after being freed or deleted.
  • Pointer operations exceed the variable’s scope, such as returning a pointer to stack memory.
  • Accessing null pointers (need to check for null).
  • sizeof cannot get the size of an array passed as a function parameter (the array decays to a pointer).
  • Attempting to modify a string literal, e.g.: char *p = "1234"; p[0] = '1';
3. C Resource Access Conflict
  • Multithreaded shared variables not marked with volatile (note that volatile alone does not guarantee atomicity; atomics or locks are still required for synchronization).
  • Multithreading accessing global variables without locks.
  • Global variables are only effective for a single process.
  • Multithreading writing shared memory data without synchronization.
  • mmap memory mapping is not safe for multiple processes.
4. STL Iterator Invalidity
  • Deleted iterators become invalid.
  • Adding elements (insert/push_back, etc.) or deleting elements can invalidate iterators of sequential containers (e.g., when a vector reallocates).
Incorrect Example: Deleting the current iterator will invalidate the iterator.
Correct Example: When erasing an iterator, save the next iterator.
5. C++11 Smart Pointer
  • Replace auto_ptr with unique_ptr.
  • Use make_shared to initialize a shared_ptr.
  • weak_ptr is a helper for shared_ptr:
    (1) Principle: it observes an object managed by shared_ptr without owning it, so it does not affect the reference count and can break shared_ptr reference cycles.
    (2) Data structure: it shares the control block with shared_ptr, incrementing only the weak reference count.
    (3) Usage: a. lock() obtains a strong reference (a shared_ptr) to the managed object. b. expired() checks whether the managed object has been released. c. use_count() returns the number of shared_ptr instances currently managing the object.
6. C++11 Smaller, Faster, Safer
  • std::atomic atomic data type for multithreading safety.
  • std::array is a fixed-length array with less overhead than std::vector; unlike std::vector, its length is fixed and cannot be dynamically expanded.
  • std::vector vector slimming shrink_to_fit(): reduces capacity to the same size as size().
  • std::forward_list is a singly linked list (std::list is doubly linked); when only sequential traversal is needed, forward_list saves memory, and its insertion and deletion perform better than list's.
  • std::unordered_map, std::unordered_set are unordered containers implemented with hash, with time complexity of O(1) for insertion, deletion, and lookup; using unordered containers can achieve better performance when order of elements is not a concern.
6. How to Check Memory
  • Memory usage in the system: /proc/meminfo
  • Process memory usage: /proc/28040/status
  • Check total memory usage: free
  • Check process CPU and memory usage ratio: top
  • Virtual memory statistics: vmstat
  • Process memory consumption ratio and sorting: ps aux --sort -rss
  • Release system memory cache: /proc/sys/vm/drop_caches
To free pagecache: echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes: echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries, and inodes: echo 3 > /proc/sys/vm/drop_caches