Table of Contents
- Introduction
- 1. Understanding Linux Memory
- 2. Linux Memory Address Space
- 3. Linux Memory Allocation Algorithms
- 4. Memory Usage Scenarios
- 5. Pitfalls of Memory Usage
1. Understanding Linux Memory


2. Linux Memory Address Space

- User Mode: Ring 3. Code running in user mode is subject to many restrictions imposed by the processor
- Kernel Mode: Ring 0, the privileged state in the processor's storage-protection scheme
- Three ways to switch from user mode to kernel mode: system calls, exceptions, and external device interrupts
- Differences: each process has its own independent memory space that other processes cannot disturb; user-mode programs cannot arbitrarily manipulate the kernel address space, which provides a degree of protection; kernel-mode threads share the kernel address space

- The MMU is a hardware circuit made up of two components: a segmentation unit and a paging unit
- The segmentation mechanism converts a logical address into a linear address
- The paging mechanism converts a linear address into a physical address

- To allow segment selectors to be retrieved quickly, the processor provides six segment registers to cache them: cs, ss, ds, es, fs, and gs
- Segment Base Address: the starting address of the segment in the linear address space
- Segment Limit: the maximum offset that can be used within the segment in the virtual address space
- The segment-register part of the logical address selects a segment descriptor, from which the segment base and limit are obtained; adding the offset part of the logical address to the base yields the linear address
5. Memory Address – Paging Mechanism (32-bit)
- The paging mechanism runs after the segmentation mechanism and further converts the linear address into a physical address
- A 32-bit linear address is split into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit page offset
- The size of a single page is 4KB
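As a concrete illustration, a 32-bit linear address can be decomposed into these three fields with simple bit operations; the sketch below uses an arbitrary example address (the value is made up):

```cpp
// Illustration of the 10/10/12 split of a 32-bit linear address.
#include <cstdint>
#include <cstdio>

int main() {
    std::uint32_t linear = 0x08048a5c;                  // an arbitrary 32-bit linear address
    std::uint32_t dir    = (linear >> 22) & 0x3ff;      // top 10 bits: page-directory index
    std::uint32_t table  = (linear >> 12) & 0x3ff;      // next 10 bits: page-table index
    std::uint32_t offset =  linear        & 0xfff;      // low 12 bits: offset inside the 4KB page
    std::printf("dir=%u table=%u offset=0x%x\n", dir, table, offset);
}
```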


- text: code segment, containing executable code, string literals, and read-only variables
- data: data segment, mapping the program's initialized global variables
- bss: stores the program's uninitialized global variables
- heap: runtime heap, the memory area allocated with malloc during program execution
- mmap: mapping area for shared libraries and anonymous file mappings
- stack: user process stack

- Direct Mapping Area: the region starting at 3G, at most 896MB, is directly mapped to physical memory
- Dynamic Memory Mapping Area: allocated by the kernel function vmalloc
- Permanent Memory Mapping Area: used to access high memory
- Fixed Mapping Area: separated from the top of the 4G space by only a 4KB isolation gap; each address entry serves a specific purpose, such as ACPI_BASE
- User processes can normally only access user-space virtual addresses and cannot access kernel-space virtual addresses
- Kernel space is mapped by the kernel and does not change across processes; kernel-space addresses have their own page tables, while each user process has its own separate page tables

3. Linux Memory Allocation Algorithms
Linux Memory Management Framework
- Causes: allocations are small and these small allocations have long lifetimes; after repeated requests, memory fragmentation appears
- Advantages: faster allocation, easier memory management, and protection against memory leaks
- Disadvantages: heavy memory fragmentation slows the system down, lowers memory utilization, and wastes a large amount of memory
- Avoid dynamic memory allocation functions where possible (prefer stack space)
- Allocate and free memory in the same function whenever possible
- Allocate a larger block at once instead of repeatedly allocating small amounts
- Request large blocks of memory in powers of 2 where possible
- Avoid external fragmentation: the buddy system algorithm
- Avoid internal fragmentation: the slab algorithm
- Manage memory manually with a self-designed memory pool
- Provides an efficient allocation strategy for the kernel to allocate groups of contiguous pages, and effectively solves the external fragmentation problem
- Memory is allocated in units of page frames
- External fragmentation refers to memory areas that are free (not belonging to any process) but are too small to satisfy new allocation requests
3) Organizational Structure
- All free page frames are grouped into 11 linked lists of blocks, each list holding blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous page frames respectively. Up to 1024 contiguous page frames, corresponding to 4MB of contiguous memory, can be requested at once

- To request a block of 2^i pages: if the free list for 2^i blocks contains a free block, allocate it to the requester
- If it does not, check whether the free list for 2^(i+1) blocks has a free block; if so, split it in two, hand one 2^i block to the requester, and insert the other 2^i block into the 2^i free list
- If the 2^(i+1) list has no free blocks either, repeat the previous step on ever larger lists until a list with a free block is found
- If none is found, return a memory allocation failure
- To release a block of 2^i pages: check the 2^i free list for the block's buddy, i.e. a free block of the same size whose physical address is contiguous with it; if there is none, no merge is needed
- If there is, merge the two into a 2^(i+1) block and continue checking the next-higher list, until no further merge is possible
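The split-and-merge behaviour described above can be illustrated with a small user-space toy; this is not the kernel implementation, and the free lists, page numbers, and helper names are simplified for the example:

```cpp
// Toy buddy-style allocator: free_lists[order] holds the starting page
// number of every free block of 2^order pages.
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>

constexpr int kMaxOrder = 10;  // up to 2^10 = 1024 contiguous pages, as in the text

std::vector<std::set<unsigned>> free_lists(kMaxOrder + 1);

long alloc(int order) {                    // returns the first page of a 2^order block, or -1
    int cur = order;
    while (cur <= kMaxOrder && free_lists[cur].empty()) ++cur;   // walk up to a non-empty list
    if (cur > kMaxOrder) return -1;                              // allocation failure
    unsigned start = *free_lists[cur].begin();
    free_lists[cur].erase(free_lists[cur].begin());
    while (cur > order) {                  // split: keep the low half, put the high half back
        --cur;
        free_lists[cur].insert(start + (1u << cur));
    }
    return start;
}

void release(unsigned start, int order) {
    while (order < kMaxOrder) {
        unsigned buddy = start ^ (1u << order);                  // buddy of a 2^order block
        auto it = free_lists[order].find(buddy);
        if (it == free_lists[order].end()) break;                // buddy is busy: stop merging
        free_lists[order].erase(it);
        start = std::min(start, buddy);                          // merged block starts at the lower address
        ++order;
    }
    free_lists[order].insert(start);
}

int main() {
    free_lists[kMaxOrder].insert(0);       // start with one free 1024-page block
    long a = alloc(0);                     // one page
    long b = alloc(3);                     // an 8-page block
    std::cout << "a starts at page " << a << ", b at page " << b << '\n';
    release(static_cast<unsigned>(a), 0);
    release(static_cast<unsigned>(b), 3);
    std::cout << "fully merged back? " << !free_lists[kMaxOrder].empty() << '\n';
}
```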

- The two blocks must have the same size
- Their physical addresses must be contiguous
- They must have been split from the same larger block, i.e. the first block's starting page frame number is aligned to twice the block size
- The larger the requested block, the more likely the allocation is to fail
- Scenarios that need very large contiguous allocations are rare
- Modify MAX_ORDER and recompile the kernel
- Pass the "mem=" parameter on the kernel command line, e.g. "mem=80M", to reserve part of physical memory, then use request_mem_region and ioremap_nocache to map the reserved memory into a module. This only requires changing the kernel boot parameters, not recompiling the kernel; however, it does not work on x86 and only supports ARM, PowerPC, and other non-x86 architectures
- Call alloc_bootmem before mem_init in start_kernel to preallocate a large memory block; this requires recompiling the kernel
- Use the vmalloc function, which kernel code uses to allocate memory that is contiguous in virtual address space but not necessarily contiguous in physical memory
1) Unmovable Pages
- These pages have fixed positions in memory and can be neither moved nor reclaimed
- Examples: the kernel code segment and data segment, memory allocated by kmalloc(), memory occupied by kernel threads, etc.
2) Reclaimable Pages
- These pages cannot be moved but can be deleted; the kernel reclaims them when they occupy too much memory or when memory is scarce
3) Movable Pages
- These pages can be moved freely; pages used by user-space applications belong to this category. They are mapped through page tables, so when a page moves to a new location only the corresponding page-table entries need to be updated
- The slab allocator used by Linux is based on an algorithm first introduced by Jeff Bonwick for the SunOS operating system
- Its basic idea is to keep frequently used kernel objects in a cache, ready in an initialized state; for example, process descriptors are frequently allocated and released in the kernel
- Internal fragmentation: the memory actually allocated is larger than the memory requested
3) Basic Goals
- Reduce the internal fragmentation caused by the buddy algorithm when allocating small blocks of contiguous memory
- Cache frequently used objects to reduce the overhead of allocating, initializing, and releasing objects
- Use coloring techniques to align objects so as to make better use of the hardware cache
- Because objects are allocated from and released back to slabs, an individual slab can move between the slab lists
- Slabs on the slabs_empty list are the primary candidates for reclamation (reaping)
- The slab allocator also supports initialization of generic objects, avoiding the need to re-initialize an object for the same purpose each time it is allocated


- The small-block contiguous memory allocation provided by the slab allocator is implemented through general caches
- The objects provided by the general caches come in geometrically distributed sizes, ranging from 32 to 131072 bytes
- The kernel provides two interfaces, kmalloc() and kfree(), for allocating and releasing this memory
- For dedicated caches, the kernel provides a complete set of interfaces that allocate slab caches for specific objects according to the parameters passed in
- kmem_cache_create() creates a cache for a specified object type. It allocates a cache descriptor for the new dedicated cache from the cache_cache general cache and inserts this descriptor into the cache_chain list of cache descriptors
- kmem_cache_alloc() allocates an object from the cache named in its parameters; conversely, kmem_cache_free() releases an object back to the specified cache
- First, a certain number of memory blocks of equal size (in the usual case) are allocated and kept as a reserve
- When there is new demand for memory, blocks are handed out from the pool; if the reserved blocks run out, new memory is allocated to refill it
- A significant advantage of this approach is that it greatly reduces memory fragmentation and improves allocation efficiency (a user-space sketch of the idea follows this list)
- mempool_create creates a memory pool object
- mempool_alloc allocates an object from the pool
- mempool_free releases an object back to the pool
- mempool_destroy destroys the memory pool
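The mempool_* functions above are kernel C interfaces; as a purely user-space illustration of the memory-pool idea described in this section, a minimal fixed-size block pool might look like the following (the class and its methods are invented for the example):

```cpp
// A user-space sketch of the memory-pool idea: pre-allocate equal-sized
// blocks once, then hand them out and take them back without touching
// the system allocator again.
#include <cstddef>
#include <vector>

class FixedPool {
public:
    FixedPool(std::size_t block_size, std::size_t count)
        : storage_(block_size * count) {
        for (std::size_t i = 0; i < count; ++i)
            free_.push_back(storage_.data() + i * block_size);   // build the free list
    }
    void* alloc() {                        // grab a reserved block; nullptr if the pool is empty
        if (free_.empty()) return nullptr;
        void* p = free_.back();
        free_.pop_back();
        return p;
    }
    void free(void* p) { free_.push_back(static_cast<char*>(p)); }  // return the block to the pool
private:
    std::vector<char> storage_;            // one contiguous reservation, so no fragmentation
    std::vector<char*> free_;              // free list of block start addresses
};

int main() {
    FixedPool pool(64, 128);               // 128 blocks of 64 bytes each
    void* a = pool.alloc();
    void* b = pool.alloc();
    pool.free(a);
    pool.free(b);
}
```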


- Direct Memory Access (DMA) is a hardware mechanism that allows peripheral devices and main memory to exchange I/O data directly, without involving the system processor
2) Functions of the DMA Controller
- It can issue a hold (HOLD) signal to the CPU to request control of the bus
- When the CPU grants the request, the controller takes control of the bus and enters DMA mode
- It can address memory and update the address pointer, performing read and write operations on memory
- It can determine the number of bytes in the current DMA transfer and detect whether the transfer has finished
- It issues a DMA end signal, allowing the CPU to resume normal operation
- DREQ: DMA request signal, sent from the peripheral device to the DMA controller to request a DMA operation
- DACK: DMA acknowledge signal, sent from the DMA controller to the requesting peripheral, indicating that the request has been received and is being serviced
- HRQ: signal from the DMA controller to the CPU requesting control of the bus
- HLDA: acknowledge signal from the CPU to the DMA controller, granting control of the bus

4. Memory Usage Scenarios
- Page management
- Slab (kmalloc, memory pools)
- User-mode memory usage (malloc, realloc, file mapping, shared memory)
- Memory map of a program (stack, heap, code, data)
- Data transfer between kernel and user mode (copy_from_user, copy_to_user)
- Memory mapping (hardware registers, reserved memory)
- DMA memory
- alloca allocates memory on the stack, so it does not need to be freed explicitly
- malloc returns uninitialized memory; a program using malloc() may appear to run correctly at first but run into problems later, once freed memory is reallocated and reused
- calloc initializes every byte of the allocated memory to zero
- realloc changes the size of an existing memory block
- mmap maps a file or other object into memory, where it can be accessed by multiple processes
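A minimal user-space sketch exercising these calls; the sizes and the anonymous-mapping flags are chosen only for illustration, and error handling is kept to a minimum:

```cpp
// User-mode allocation calls: malloc, calloc, realloc, and an anonymous mmap.
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>

int main() {
    int* a = static_cast<int*>(std::malloc(16 * sizeof(int)));   // uninitialized memory
    int* b = static_cast<int*>(std::calloc(16, sizeof(int)));    // zero-initialized memory
    a = static_cast<int*>(std::realloc(a, 32 * sizeof(int)));    // grow the malloc'd block

    // Anonymous private mapping: one page obtained directly from the kernel via mmap.
    void* m = mmap(nullptr, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m != MAP_FAILED) {
        std::memset(m, 0, 4096);
        munmap(m, 4096);
    }
    std::free(a);
    std::free(b);
    return 0;
}
```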
3. Kernel Mode Memory Allocation Functions
- __get_free_pages operates directly on page frames and is suitable for allocating large amounts of contiguous physical memory
- kmem_cache_alloc allocates memory via the slab mechanism and is suitable for frequently allocating and releasing memory blocks of the same size; kmalloc is built on kmem_cache_alloc and is the most common way to allocate blocks smaller than a page frame
- vmalloc maps non-contiguous physical memory to contiguous virtual addresses and is suitable when large memory is needed but physical contiguity is not required
- dma_alloc_coherent is implemented on top of __alloc_pages and is suitable for DMA operations; ioremap maps a known physical address to a virtual address and is suitable when the physical address is already known, as in device drivers; alloc_bootmem reserves a region of memory during kernel startup that the kernel's memory manager does not see afterwards (it must be smaller than physical memory), and it places high demands on how that memory is managed
4. Memory Allocation with malloc
- When malloc is called, it searches the free_chunk_list for a free block large enough to satisfy the user's request

- The main task of the free_chunk_list is to maintain a linked list of free heap-space buffers
- If no suitable node is found in the free list, the process's heap space is extended through the sys_brk system call

- Via __get_free_pages, one or more physical pages are requested; addr is converted to the corresponding pte address in the process's pgd mapping, and the pte for addr is set to the first address of the physical page
- System calls: brk for requests of 128KB or less, mmap (do_mmap) for requests larger than 128KB

- Each user-mode process has its own virtual address space, so two processes can use the same virtual address
- When a user-mode virtual address is accessed and no physical page is mapped to it, a page-fault exception is raised
- The page-fault exception traps into the kernel, which allocates physical memory and establishes a mapping to the user-mode virtual address

- It allows multiple unrelated processes to access the same region of logical memory
- Shared memory is an extremely efficient way to transfer data between two running processes
- Sharing data this way between two running processes is an efficient form of inter-process communication that effectively reduces the number of data copies

- shmget creates a shared memory segment
- shmat attaches the shared memory segment to the current process's address space so it can be accessed
- shmdt detaches the shared memory segment from the current process
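A minimal System V shared-memory sketch built from these three calls; the segment size is arbitrary, and shmctl is added only to remove the segment afterwards:

```cpp
// Create, attach, use, detach, and remove a System V shared-memory segment.
#include <cstdio>
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>

int main() {
    int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0666);   // create a 4KB segment
    if (shmid < 0) { std::perror("shmget"); return 1; }

    void* addr = shmat(shmid, nullptr, 0);                     // attach it to this process
    if (addr == reinterpret_cast<void*>(-1)) { std::perror("shmat"); return 1; }

    std::strcpy(static_cast<char*>(addr), "hello from shared memory");
    std::printf("%s\n", static_cast<char*>(addr));

    shmdt(addr);                                               // detach from this process
    shmctl(shmid, IPC_RMID, nullptr);                          // destroy the segment
    return 0;
}
```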
5. Pitfalls of Memory Usage
- Calls to new and delete in a class's constructor and destructor do not match
- Nested object pointers are not correctly released
- The base-class destructor is not defined as a virtual function
- If a base-class pointer points to a derived-class object and the base-class destructor is not virtual, the derived-class destructor will not be called, so the derived class's resources are not released correctly, causing a memory leak (see the sketch after this list)
- Missing copy constructor: passing by value calls the copy constructor, while passing by reference does not; a class that owns raw resources needs its own copy constructor, otherwise the default member-wise copy leads to double frees or leaks
- An array of pointers to objects is not the same as an array of objects: the array stores pointers to objects, so the space for each object must be released as well as the space for each pointer
- Missing overloaded assignment operator: like the missing copy constructor, this can leak memory when the class manages a variable amount of memory, because the default assignment copies the object member by member
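A short sketch of the virtual-destructor pitfall above, with invented class names:

```cpp
// Deleting a derived object through a base pointer only runs ~Derived
// if the base destructor is virtual; otherwise 'buf' would leak.
#include <iostream>

struct Base {
    virtual ~Base() { std::cout << "~Base\n"; }   // without 'virtual', ~Derived never runs
};

struct Derived : Base {
    Derived() : buf(new char[1024]) {}
    ~Derived() override { delete[] buf; std::cout << "~Derived\n"; }
    char* buf;
};

int main() {
    Base* p = new Derived();   // base-class pointer to a derived-class object
    delete p;                  // with a virtual destructor both destructors run
}
```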
- Pointer variables are not initialized
- Pointers are not set to NULL after being freed or deleted
- Pointer operations go beyond the variable's lifetime, for example returning a pointer to stack memory, which produces a dangling pointer
- A null pointer is dereferenced (a null check is required)
- sizeof cannot obtain the size of an array once it has decayed to a pointer (e.g. when passed as a function parameter)
- Attempting to modify a string constant, e.g. char *p = "1234"; p[0] = '1';
- Variables shared between threads are not qualified with volatile
- Global variables are accessed from multiple threads without locking
- Global variables are only visible within a single process
- Multiple processes write to shared-memory data without synchronization
- mmap memory mappings are not safe across multiple processes without coordination
- Using an iterator that has been invalidated by a deletion
- Adding elements (insert/push_back, etc.) or deleting elements can invalidate iterators of sequence containers
Incorrect example: erasing through the current iterator invalidates it
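A minimal sketch of that incorrect pattern (the container contents and the erase condition are made up):

```cpp
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5};
    for (auto it = v.begin(); it != v.end(); ++it) {
        if (*it % 2 == 0)
            v.erase(it);      // erase() invalidates 'it'; the following ++it is undefined behavior
    }
}
```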

Correct example: when erasing an iterator, save the next iterator
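And a sketch of the corresponding correct pattern, using the iterator returned by erase() as the saved "next" iterator:

```cpp
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5};
    for (auto it = v.begin(); it != v.end(); /* no increment here */) {
        if (*it % 2 == 0)
            it = v.erase(it); // erase() returns the next valid iterator
        else
            ++it;
    }
}
```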

- Replace auto_ptr with unique_ptr
- Use make_shared to initialize a shared_ptr

- weak_ptr as a smart-pointer helper
(1) Principle Analysis:
(2) Data Structure:
(3) Usage:
- lock() obtains a strong (shared_ptr) reference to the managed object
- expired() checks whether the managed object has already been released
- use_count() reports how many shared_ptr instances currently own the managed object
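A brief sketch combining the smart-pointer advice in this section, with an invented Node type; it shows make_shared, unique_ptr, and weak_ptr's lock()/expired():

```cpp
#include <iostream>
#include <memory>

struct Node {
    std::weak_ptr<Node> peer;     // weak reference: does not keep the peer alive
};

int main() {
    auto a = std::make_shared<Node>();        // one allocation for object + control block
    auto b = std::make_shared<Node>();
    a->peer = b;
    b->peer = a;                              // with shared_ptr here this would be a leaky cycle

    std::unique_ptr<int> u(new int(42));      // sole owner; moved rather than copied

    if (auto p = a->peer.lock())              // lock(): promote to shared_ptr if still alive
        std::cout << "peer alive, use_count=" << p.use_count() << '\n';
    std::cout << "expired? " << std::boolalpha << a->peer.expired() << '\n';
}
```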
6. C++11 Smaller, Faster, Safer
- std::atomic: atomic data types for multi-thread safety
- std::array: a fixed-length array with low overhead (no heap allocation); unlike std::vector, its length is fixed and cannot be expanded dynamically
- std::vector slimming: shrink_to_fit() asks the implementation to reduce capacity() to match size()
- std::forward_list: a singly linked list (std::list is doubly linked); when only forward traversal is needed, forward_list uses less memory and has better insertion and deletion performance than list
- std::unordered_map and std::unordered_set: unordered containers implemented with hash tables, with average O(1) insertion, deletion, and lookup; when the order of elements does not matter, the unordered containers usually give better performance
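A few quick illustrations of these points (the values are arbitrary, and shrink_to_fit() is only a non-binding request):

```cpp
#include <array>
#include <forward_list>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> v(1000, 0);
    v.resize(10);
    std::cout << "capacity before: " << v.capacity();
    v.shrink_to_fit();                        // request capacity == size
    std::cout << ", after: " << v.capacity() << '\n';

    std::array<int, 4> fixed{1, 2, 3, 4};     // fixed length, no heap allocation
    std::forward_list<int> fl{1, 2, 3};       // singly linked: one pointer per node

    std::unordered_map<std::string, int> m;   // hash map, O(1) average operations
    m["pages"] = 4096;
    std::cout << fixed.size() + m["pages"] + fl.front() << '\n';
}
```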
6. How to View Memory
- Memory usage in the system: /proc/meminfo

$ cat /proc/meminfo
MemTotal:        8052444 kB   # Total size of all memory (RAM), minus some reserved space and the kernel size
MemFree:         2754588 kB   # Completely unused physical memory, lowFree + highFree
MemAvailable:    3934252 kB   # Maximum memory available for starting a new application without using swap, calculated as MemFree + Active(file) + Inactive(file) - (watermark + min(watermark, Active(file) + Inactive(file)/2))
Buffers:          137128 kB   # Cache pages occupied by block devices, including direct block-device I/O and file-system metadata such as superblock cache pages
Cached:          1948128 kB   # Cache pages occupied by normal file data
SwapCached:            0 kB   # Anonymous pages chosen for swap-out but not yet written to the physical swap area; these pages (e.g. memory allocated by user processes) are not associated with any file, and if swapping occurs they are written to the swap area
Active:          3650920 kB   # Active = active anon + active file
Inactive:        1343420 kB   # Inactive = inactive anon + inactive file
Active(anon):    2913304 kB   # Anonymous pages. User process pages are either associated with files (program files, data files), called file or mapped pages, or unrelated to files (process stack, malloc'd memory), called anonymous pages
Inactive(anon):   727808 kB   # See above
Active(file):     737616 kB   # See above
Inactive(file):   615612 kB   # See above
SwapTotal:       8265724 kB   # Total size of available swap space (hard-disk space used in place of RAM when physical memory is insufficient)
SwapFree:        8265724 kB   # Current remaining swap space
Dirty:               104 kB   # Memory that needs to be written back to disk
Writeback:             0 kB   # Memory currently being written back
AnonPages:       2909332 kB   # Size of unmapped (anonymous) pages
Mapped:           815524 kB   # Size of mapped devices and files
Shmem:            732032 kB   # Size of shared memory
Slab:             153096 kB   # Size of kernel slab data structures
SReclaimable:      99684 kB   # Reclaimable slab memory
SUnreclaim:        53412 kB   # Unreclaimable slab memory
KernelStack:       14288 kB
PageTables:        62192 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12291944 kB
Committed_AS:   11398920 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1380352 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      201472 kB
DirectMap2M:     5967872 kB
DirectMap1G:     3145728 kB
- Query total memory usage: free

- Query process CPU and memory usage: top
- Virtual memory statistics: vmstat

- Process memory consumption and sorting: ps aux --sort -rss
- View buddy system information
The current state of the buddy system can be viewed with the cat /proc/buddyinfo command

$ cat /proc/buddyinfo
Node 0, zone      DMA     23  15   4   5   2   3   3   2   3   1    0
Node 0, zone   Normal    149 100  52  33  23   5  32   8  12   2   59
Node 0, zone  HighMem     11  21  23  49  29  15   8  16  12   2  142
- View slab information
Slab information can be viewed with the cat /proc/slabinfo command

$ cat /proc/slabinfo
slabinfo - version: 2.1
# name  <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
bridge_fdb_cache                0    0    64  59  1 : tunables 120  60  0 : slabdata  0  0  0
nf_conntrack_expect             0    0   240  16  1 : tunables 120  60  0 : slabdata  0  0  0
nf_conntrack_ffffffff81f6f600   0    0   304  13  1 : tunables  54  27  0 : slabdata  0  0  0
iser_descriptors                0    0   128  30  1 : tunables 120  60  0 : slabdata  0  0  0
ib_mad                          0    0   448   8  1 : tunables  54  27  0 : slabdata  0  0  0
fib6_nodes                     22   59    64  59  1 : tunables 120  60  0 : slabdata  1  1  0
ip6_dst_cache                  13   24   320  12  1 : tunables  54  27  0 : slabdata  2  2  0
ndisc_cache                     1   10   384  10  1 : tunables  54  27  0 : slabdata  1  1  0
ip6_mrt_cache                   0    0   128  30  1 : tunables 120  60  0 : slabdata  0  0  0
- Release the system memory cache
Caches can be released through /proc/sys/vm/drop_caches

# To free the page cache:
echo 1 > /proc/sys/vm/drop_caches
# To free dentries and inodes:
echo 2 > /proc/sys/vm/drop_caches
# To free the page cache, dentries, and inodes:
echo 3 > /proc/sys/vm/drop_caches
—Copyright Statement—
Source: Geek Rebirth, Edited by: nhyilin
For academic sharing only, copyright belongs to the original author.
