Source: EDN Electronic Technology Design
1. Entering Linux Memory


2. Linux Memory Address Space

- User Mode: code running in user mode is subject to many restrictions imposed by the processor.
- Kernel Mode: the privileged (core) mode in the processor's storage-protection scheme.
- There are three ways to switch from user mode to kernel mode: system calls, exceptions, and device interrupts.
- Differences: each process has its own independent, non-interfering memory space; user-mode programs cannot arbitrarily access the kernel address space, which provides a degree of protection; kernel threads share the kernel address space.

- The MMU is a hardware circuit that contains two parts: a segmentation unit and a paging unit.
- The segmentation mechanism converts a logical address into a linear address.
- The paging mechanism converts a linear address into a physical address.

- To allow fast retrieval of segment selectors, the processor provides six segment registers that cache them: cs, ss, ds, es, fs, and gs.
- Segment base address: the starting address of the segment in the linear address space.
- Segment limit: the maximum offset that can be used within the segment in the virtual address space.
- The segment selector held in the segment register locates the segment descriptor; the segment base address and limit are read from that descriptor, and adding the logical address's offset to the base yields the linear address.
- The paging mechanism runs after the segmentation mechanism and further converts the linear address into a physical address.
- A 32-bit linear address is split into a 10-bit page directory index, a 10-bit page table index, and a 12-bit page offset, as illustrated in the sketch below.
- The size of a single page is 4 KB.
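A minimal user-space sketch of this 10/10/12 split (the example address and the names pgd_index/pt_index are chosen here purely for illustration):

```c
#include <stdio.h>

int main(void)
{
    unsigned linear = 0x08049f2cu;                   /* an arbitrary example address */

    /* Two-level x86 paging: 10-bit directory index, 10-bit table index,
       12-bit offset inside the 4 KB page. */
    unsigned pgd_index = (linear >> 22) & 0x3ff;     /* top 10 bits  */
    unsigned pt_index  = (linear >> 12) & 0x3ff;     /* next 10 bits */
    unsigned offset    =  linear        & 0xfff;     /* low 12 bits  */

    printf("linear 0x%08x -> directory %u, table %u, offset 0x%03x\n",
           linear, pgd_index, pt_index, offset);
    return 0;
}
```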


- TEXT: executable code, string literals, and read-only variables.
- DATA: the data segment, which maps the program's initialized global variables.
- BSS: stores the program's uninitialized global variables.
- HEAP: the runtime heap, the region allocated with malloc during program execution.
- MMAP: the memory-mapping region used for shared libraries and anonymous file mappings.
- STACK: the user process stack.

- Direct mapping area: the linear region starting at 3 GB, at most 896 MB in size, used for direct memory mapping.
- Dynamic memory mapping area: allocated by the kernel function vmalloc.
- Permanent memory mapping area: used to access high memory.
- Fixed mapping area: separated from the top of the 4 GB space by only a 4 KB isolation band; each address entry serves a specific purpose, such as ACPI_BASE.
- A user process can normally access only user-space virtual addresses and cannot access kernel-space virtual addresses.
- Kernel space is mapped by the kernel and does not change from process to process; kernel-space addresses have their own page tables, while each user process has its own separate page tables.

3. Linux Memory Allocation Algorithms
- Causes: allocations are small and these small allocations live for a long time, so repeated requests gradually fragment memory.
- Advantages: faster allocation, easier memory management, and protection against memory leaks.
- Disadvantages: a large amount of fragmentation slows the system down, lowers memory utilization, and wastes memory.
- Avoid dynamic memory allocation functions where possible (prefer stack space).
- Allocate and free memory in the same function whenever possible.
- Allocate a larger block at once rather than repeatedly allocating small blocks.
- Whenever possible, request memory blocks whose sizes are powers of two.
- External fragmentation avoidance: the buddy system algorithm.
- Internal fragmentation avoidance: the slab algorithm.
- Manage memory yourself by designing a memory pool.
- The buddy system provides the kernel with an efficient strategy for allocating groups of contiguous pages and effectively solves the external fragmentation problem.
- The allocated memory areas are made up of page frames.
- External fragmentation refers to memory areas that have not yet been allocated (belong to no process) but are too small to satisfy a new process's request for memory.

3) Organizational Structure
- All free pages are grouped into 11 linked lists of blocks, holding page blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames. At most 1024 contiguous pages can be requested at once, corresponding to 4 MB of contiguous memory.

- To request storage of 2^i pages: if the free list for 2^i-page blocks has a free block, allocate it to the requester.
- If it has no free block, check whether the free list for 2^(i+1)-page blocks has one; if so, split that block, hand one half (a 2^i-page block) to the requester, and insert the other half back into the free list for 2^i-page blocks.
- If the 2^(i+1) list has no free block either, repeat the previous step at the next higher order until a list with a free block is found.
- If none is found, return a memory allocation failure.
- To release a block of 2^i pages: check the corresponding free list for a free block that is physically contiguous with it (its buddy); if there is none, no merging is needed.
- If there is, merge the two into a 2^(i+1)-page block and continue checking at the next higher order until no further merging is possible. (A minimal sketch of the allocation walk follows this list.)
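A minimal user-space sketch of the allocation walk above, using per-order free-block counters as a stand-in for the kernel's per-order free lists (the names free_count and buddy_alloc are invented here for illustration; the real kernel keeps struct free_area lists per zone):

```c
#include <stdio.h>

#define MAX_ORDER 11                  /* orders 0..10 -> blocks of 1..1024 pages */

static int free_count[MAX_ORDER];     /* stand-in for the length of each free list */

/* Try to allocate a block of 2^order pages; return 0 on success, -1 on failure. */
static int buddy_alloc(int order)
{
    int cur;

    /* Walk up until a non-empty free list is found. */
    for (cur = order; cur < MAX_ORDER; cur++)
        if (free_count[cur] > 0)
            break;
    if (cur == MAX_ORDER)
        return -1;                    /* no block large enough: allocation fails */

    free_count[cur]--;                /* take one block of order 'cur' */

    /* Split it down; every split leaves one unused buddy on a lower-order list. */
    while (cur > order) {
        cur--;
        free_count[cur]++;
    }
    return 0;
}

int main(void)
{
    int i;

    free_count[10] = 1;               /* start with one free 1024-page (4 MB) block */
    printf("alloc 8 pages (order 3): %s\n",
           buddy_alloc(3) == 0 ? "ok" : "failed");

    /* After the splits, orders 3..9 each hold one free buddy block. */
    for (i = 0; i < MAX_ORDER; i++)
        printf("order %2d: %d free block(s)\n", i, free_count[i]);
    return 0;
}
```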

- The two blocks must be the same size.
- Their physical addresses must be contiguous.
- The physical address of the first page frame of the first block must be a multiple of the merged block's size, so that the two blocks are true buddies.
- The larger the allocated memory, the higher the likelihood of failure.
- Scenarios that use very large memory are rare.
- Modify MAX_ORDER and recompile the kernel.
- Pass the "mem=" parameter at kernel startup (for example "mem=80M") to reserve part of the memory, then map the reserved memory into a module with request_mem_region and ioremap_nocache. This does not require recompiling the kernel, but it is not supported on x86; it works only on ARM, PowerPC, and other non-x86 architectures.
- Call alloc_bootmem to pre-allocate a large block of memory before mem_init() runs in start_kernel(); this requires recompiling the kernel.
- The vmalloc function: used by kernel code to allocate memory that is contiguous in virtual address space but not necessarily contiguous in physical memory (see the sketch below).
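A minimal kernel-module-style sketch of vmalloc usage (the module name, function names, and buffer size are illustrative, not from the original article):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

static void *big_buf;

static int __init vmalloc_demo_init(void)
{
    /* vmalloc() returns memory that is contiguous in the kernel's virtual
     * address space but may be assembled from scattered physical pages. */
    big_buf = vmalloc(4 * 1024 * 1024);        /* 4 MB */
    if (!big_buf)
        return -ENOMEM;
    return 0;
}

static void __exit vmalloc_demo_exit(void)
{
    vfree(big_buf);                            /* release the virtual mapping */
}

module_init(vmalloc_demo_init);
module_exit(vmalloc_demo_exit);
MODULE_LICENSE("GPL");
```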
- Non-movable pages: these pages have fixed positions in memory and can be neither moved nor reclaimed. Examples: kernel code and data segments, memory allocated with kmalloc(), and memory occupied by kernel threads.
- Reclaimable pages: these pages cannot be moved, but they can be deleted; the kernel reclaims them when they occupy too much memory or when memory is scarce.
- Movable pages: these pages can be moved freely; pages used by user-space applications fall into this category. They are reached through page-table mappings, so when a page is moved to a new location the corresponding page table entries are updated accordingly.
- The slab allocator used in Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system.
- Its basic idea is to keep frequently used kernel objects in a cache, maintained in an initialized, ready-to-use state; process descriptors, for example, are allocated and released very frequently in the kernel.
- Internal fragmentation: the memory actually allocated is larger than the memory that was requested.
- The slab allocator reduces the internal fragmentation that the buddy algorithm causes when small blocks of contiguous memory are allocated.
- It caches frequently used objects, reducing the time spent allocating, initializing, and releasing them.
- Through slab coloring, objects are offset so that they make better use of the hardware caches.
- Because objects are allocated from and released back to slabs, individual slabs move between the slab lists.
- Slabs on the slabs_empty list are the main candidates for reaping (recycling).
- The slab allocator also supports the initialization of generic objects, avoiding repeated initialization of the same object.

- The slab allocator's allocation of small blocks of contiguous memory is implemented through general-purpose caches.
- The objects provided by the general-purpose caches have geometrically distributed sizes, ranging from 32 bytes to 131072 bytes.
- The kernel provides two interfaces, kmalloc() and kfree(), for allocating and releasing this memory (see the sketch below).
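A minimal kernel-module-style sketch of the kmalloc()/kfree() interface described above (the struct sample type and module names are invented for illustration):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

struct sample {
    int id;
    char name[32];
};

static struct sample *obj;

static int __init kmalloc_demo_init(void)
{
    /* kmalloc() hands out physically contiguous memory from the
     * general-purpose slab caches. */
    obj = kmalloc(sizeof(*obj), GFP_KERNEL);
    if (!obj)
        return -ENOMEM;
    obj->id = 1;
    return 0;
}

static void __exit kmalloc_demo_exit(void)
{
    kfree(obj);                 /* return the memory to its cache */
}

module_init(kmalloc_demo_init);
module_exit(kmalloc_demo_exit);
MODULE_LICENSE("GPL");
```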
- The kernel provides a complete set of interfaces for creating and releasing dedicated caches, which allocate slab caches for specific object types according to the parameters passed in.
- kmem_cache_create() creates a cache for a specified object type. It allocates a cache descriptor for the new dedicated cache from the general cache cache_cache and inserts this descriptor into the cache_chain list of cache descriptors.
- kmem_cache_alloc() allocates an object from the cache specified by its parameters; conversely, kmem_cache_free() releases an object back to the cache specified by its parameters. (A usage sketch follows.)
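A minimal kernel-module-style sketch of the dedicated-cache interface (the task_like type and the cache name are invented for illustration):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>

struct task_like {
    int pid;
    char comm[16];
};

static struct kmem_cache *task_cache;

static int __init cache_demo_init(void)
{
    struct task_like *t;

    /* Create a dedicated cache for task_like objects. */
    task_cache = kmem_cache_create("task_like_cache",
                                   sizeof(struct task_like),
                                   0, SLAB_HWCACHE_ALIGN, NULL);
    if (!task_cache)
        return -ENOMEM;

    /* Allocate one object from the cache and give it back. */
    t = kmem_cache_alloc(task_cache, GFP_KERNEL);
    if (t) {
        t->pid = 42;
        kmem_cache_free(task_cache, t);
    }
    return 0;
}

static void __exit cache_demo_exit(void)
{
    kmem_cache_destroy(task_cache);
}

module_init(cache_demo_init);
module_exit(cache_demo_exit);
MODULE_LICENSE("GPL");
```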
- First, a number of memory blocks (usually of equal size) are allocated as a reserve.
- When a new memory request arrives, a block is taken from the pool; if the pool does not have enough blocks, new memory is allocated to replenish it.
- A significant advantage of this approach is that it minimizes memory fragmentation and improves allocation efficiency.
- mempool_create creates a memory pool object.
- mempool_alloc allocates an object from the pool.
- mempool_free releases an object back to the pool.
- mempool_destroy destroys the memory pool. (A kernel-side usage sketch follows this list.)
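A minimal kernel-module-style sketch of the mempool interface above, backed by a slab cache (the cache name, object size, and POOL_MIN value are illustrative choices):

```c
#include <linux/init.h>
#include <linux/module.h>
#include <linux/mempool.h>
#include <linux/slab.h>

#define POOL_MIN 4   /* minimum number of elements kept in reserve */

static struct kmem_cache *buf_cache;
static mempool_t *buf_pool;

static int __init pool_demo_init(void)
{
    void *buf;

    buf_cache = kmem_cache_create("buf_cache", 256, 0, 0, NULL);
    if (!buf_cache)
        return -ENOMEM;

    /* The pool pre-allocates POOL_MIN objects from the backing slab cache
     * so that allocations can still succeed under memory pressure. */
    buf_pool = mempool_create(POOL_MIN, mempool_alloc_slab,
                              mempool_free_slab, buf_cache);
    if (!buf_pool) {
        kmem_cache_destroy(buf_cache);
        return -ENOMEM;
    }

    buf = mempool_alloc(buf_pool, GFP_KERNEL);   /* take an object */
    if (buf)
        mempool_free(buf, buf_pool);             /* give it back */
    return 0;
}

static void __exit pool_demo_exit(void)
{
    mempool_destroy(buf_pool);
    kmem_cache_destroy(buf_cache);
}

module_init(pool_demo_init);
module_exit(pool_demo_exit);
MODULE_LICENSE("GPL");
```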


- Direct Memory Access (DMA) is a hardware mechanism that allows peripheral devices to transfer their I/O data directly to and from main memory without involving the system processor.

2) Functions of the DMA Controller
- It can issue a hold (HOLD) signal to the CPU to request control of the bus.
- When the CPU grants the request, the controller takes over the bus and enters DMA mode.
- It can address memory and update address pointers to read from and write to memory.
- It can determine the number of bytes in the current DMA transfer and decide whether the transfer has finished.
- It issues a DMA-end signal, allowing the CPU to resume normal operation.
- DREQ: the DMA request signal, sent from a peripheral device to the DMA controller to request a DMA operation.
- DACK: the DMA acknowledge signal, sent from the DMA controller to the requesting peripheral to indicate that the request has been received and is being serviced.
- HRQ: the signal sent from the DMA controller to the CPU requesting control of the bus.
- HLDA: the signal sent from the CPU to the DMA controller granting control of the bus.

4. Memory Usage Scenarios
- Page management
- Slab (kmalloc, memory pools)
- User-mode memory usage (malloc, realloc, file mapping, shared memory)
- The program's memory map (stack, heap, code, data)
- Data transfer between kernel and user mode (copy_from_user, copy_to_user)
- Memory mapping (hardware registers, reserved memory)
- Memory for DMA
- alloca allocates memory on the stack, so it does not need to be freed explicitly.
- Memory allocated by malloc is uninitialized; a program that relies on its contents may appear to work at first (while the memory has not yet been reused) but fail later (once the memory has been reused and contains stale data).
- calloc initializes every bit of the allocated memory to zero.
- realloc expands (or shrinks) an existing block of memory. (A short usage example follows.)
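A short user-space example contrasting alloca, malloc, calloc, and realloc as described above (buffer sizes are arbitrary):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <alloca.h>

int main(void)
{
    /* alloca: stack allocation, released automatically on function return. */
    char *stack_buf = alloca(64);
    snprintf(stack_buf, 64, "on the stack");

    /* malloc: contents are uninitialized -- initialize before relying on them. */
    int *a = malloc(10 * sizeof(*a));
    if (!a)
        return 1;
    memset(a, 0, 10 * sizeof(*a));

    /* calloc: every bit of the returned block is set to zero. */
    int *b = calloc(10, sizeof(*b));
    if (!b) {
        free(a);
        return 1;
    }

    /* realloc: resize an existing block; the pointer may move. */
    int *tmp = realloc(a, 20 * sizeof(*a));
    if (tmp)
        a = tmp;

    printf("%s\n", stack_buf);
    free(a);
    free(b);
    return 0;
}
```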
- When malloc is called, it searches the free chunk list for a memory block large enough to satisfy the request.
- The main job of the free chunk list is to maintain a linked list of free heap buffers.
- If no suitable node is found in that list, the process's heap is extended through the sys_brk system call (a small demonstration using sbrk follows).
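A small user-space demonstration of moving the program break with the sbrk wrapper (purely illustrative; normal programs should rely on malloc rather than calling sbrk directly):

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* sbrk(0) reports the current program break (the top of the heap). */
    void *before = sbrk(0);

    /* Grow the heap by one 4 KB page, as sys_brk would do on behalf of malloc. */
    if (sbrk(4096) == (void *)-1) {
        perror("sbrk");
        return 1;
    }

    void *after = sbrk(0);
    printf("break before: %p\n", before);
    printf("break after : %p (moved %ld bytes)\n",
           after, (long)((char *)after - (char *)before));
    return 0;
}
```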

- One or more physical pages are requested through get_free_pages.
- The address of the pte for addr is calculated from the process's pgd mapping.
- The pte corresponding to addr is set to the starting address of the physical page.
- System calls used: brk for requests of 128 KB or less, and mmap (do_mmap) for requests larger than 128 KB.

- Each user-mode process has its own exclusive virtual address space, so two processes can use the same virtual addresses.
- When a user-mode virtual address is accessed (for example from a system call) and no physical address is mapped to it, a page fault exception is triggered.
- The page fault exception traps into the kernel, which allocates physical memory and establishes a mapping to the user-mode virtual address.

- Shared memory allows multiple unrelated processes to access the same piece of logical memory.
- For transferring data between two running processes, shared memory is a highly efficient solution.
- Sharing memory between two running processes is an efficient form of inter-process communication and greatly reduces the number of data copies.
- shmget creates a shared memory segment.
- shmat attaches the shared memory segment to the current process's address space so it can be accessed.
- shmdt detaches the shared memory segment from the current process. (A minimal usage sketch follows this list.)
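A minimal sketch of the shmget/shmat/shmdt sequence (the key 0x1234 and the 4 KB size are arbitrary example values):

```c
#include <stdio.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create (or open) a 4 KB System V shared memory segment. */
    int shmid = shmget((key_t)0x1234, 4096, 0666 | IPC_CREAT);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    /* Attach the segment into this process's address space. */
    char *addr = shmat(shmid, NULL, 0);
    if (addr == (char *)-1) {
        perror("shmat");
        return 1;
    }

    strcpy(addr, "hello from shared memory");
    printf("%s\n", addr);

    /* Detach; the segment itself persists until removed. */
    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}
```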
5. Pitfalls of Memory Usage
- new and delete are not called correctly (in matched pairs) in the class constructor and destructor.
- Nested object pointers are not correctly released.
- The base class destructor is not declared as a virtual function.
- When a base class pointer points to a derived-class object, if the base class destructor is not virtual, the derived class's destructor is not called, and memory leaks because the derived class's resources are never released.
- A copy constructor is missing: passing an object by value invokes the copy constructor, while passing by reference does not.
- An array of pointers to objects is not the same as an array of objects: the array stores pointers to the objects, so both each object's memory and each pointer's memory must be released.
- An overloaded assignment operator is missing: the default assignment also copies member by member, which leaks memory when the class manages a variable amount of memory.
- Pointer variables are not initialized.
- Pointers are not set to NULL after free or delete.
- Pointer operations go beyond the variable's lifetime, such as returning a pointer to stack memory.
- A null pointer is dereferenced (always check for NULL first).
- sizeof cannot obtain the size of an array passed to a function (the parameter decays to a pointer).
- A constant is modified, e.g.: char *p = "1234"; *p = '1';
- Shared variables used by multiple threads are not declared volatile.
- Multiple threads access global variables without locking.
- Global variables are visible only within a single process.
- Multiple threads write shared-memory data without synchronization.
- mmap memory mappings are not safe for concurrent use by multiple processes.
- An iterator becomes invalid after the element it refers to is deleted.
- Adding elements (insert, push_back, etc.) or deleting elements can invalidate the iterators of sequential containers.


- Replace auto_ptr with unique_ptr.
- Use make_shared to initialize a shared_ptr.

- weak_ptr is a helper smart pointer used alongside shared_ptr.
- std::atomic: atomic data types for thread-safe access.
- std::array: a fixed-length array with less overhead than std::vector; unlike std::vector, its length is fixed and cannot be dynamically expanded.
- std::vector slimming: shrink_to_fit() reduces capacity() to match size().
- std::forward_list: a singly linked list.
- std::unordered_map and std::unordered_set are unordered containers implemented with hash tables, with O(1) average time for insertion, deletion, and lookup; when element order does not matter, unordered containers can give better performance.
- System-wide memory usage: /proc/meminfo
- Memory usage of a single process (pid 28040 in this example): /proc/28040/status
- Check total memory usage: free
- Check per-process CPU and memory usage: top
- Virtual memory statistics: vmstat
- Per-process memory consumption, sorted: ps aux --sort -rss
- Release the system memory cache: /proc/sys/vm/drop_caches