In-Depth Analysis of Linux pstore: Mechanisms and Implementation of Persistent Storage

1 Overview and Background of pstore

In computing systems, capturing kernel logs and crash information has always been a critical problem. When a system crashes, freezes, or loses power, the logs that could reveal the root cause are often lost for good because they live in volatile memory. The situation resembles an airplane crash in which the black box is destroyed too: investigators have nothing to work from. To address this, the Linux kernel provides pstore (Persistent Storage), a mechanism for capturing data during crashes and other exceptional events.

The core design goal of pstore is to save critical log information to non-volatile storage when a fatal error (such as a panic or oops) occurs, and to present that information as files on the next normal boot so that developers and system administrators can analyze the fault. pstore is not tied to battery-backed persistent RAM: it supports several kinds of backend storage, including persistent memory regions, block devices, and ACPI ERST (Error Record Serialization Table).

Architecturally, pstore is a platform-independent persistent storage framework that abstracts media-specific operations into pluggable backend drivers. Other kernel facilities (logging, tracing, and so on) can therefore use persistent storage in a uniform way without knowing the details of the underlying medium. pstore sits between the file system layer and the storage driver layer, and it gives the kernel a storage abstraction designed to keep working in crash scenarios.

The evolution of pstore reflects the kernel’s continuing work on reliability and debuggability. Early backends relied on platform-provided persistent storage such as ACPI ERST or a reserved persistent RAM region; support for block devices (pstore/blk) came later. This growth lets devices without dedicated persistent RAM (most embedded systems and low-cost servers, for example) benefit from crash log capture.

pstore is used widely. On embedded devices and servers, it captures kernel logs at crash time and helps developers diagnose stability problems. In cloud environments, it can work with virtualization to give virtual machines crash capture capabilities. On mobile devices, Android uses the pmsg frontend to store user space logs for crash analysis. In each case, pstore acts as a “system forensic expert” that supplies evidence for post-failure analysis.

2 Architecture and Core Components of pstore

2.1 Overall Architecture

The design of pstore adopts a typical layered architecture pattern, separating general functionality from specific storage media operations. This layered design allows pstore to flexibly support various storage backends while providing a consistent interface to the upper layers. As shown in Figure 1, the overall architecture of pstore can be divided into four main layers: frontend interface layer, core framework layer, backend driver layer, and storage media layer.

Frontend Interface Layer
  • dmesg/console
  • ftrace
  • pmsg
  • Other Loggers

Core Framework Layer
  • pstore_info Registration
  • Record Allocation and Management
  • File System Interface
  • Compression Processing

Backend Driver Layer
  • RAM Backend
  • Block Device Backend
  • ERST Backend
  • Other Backends

Storage Media Layer
  • Persistent Memory
  • Block Devices
  • ACPI ERST
  • Other Storage

Figure 1: Layered Architecture Diagram of pstore

In the frontend interface layer, pstore exposes saved records to user space as files in the pstore file system (usually mounted at /sys/fs/pstore), so user space tools can read stored crash information with ordinary file operations. This layer also defines the record types, each corresponding to a different kind of information:

  • PSTORE_TYPE_DMESG: Kernel logs, including log_buf contents during panic/oops
  • PSTORE_TYPE_CONSOLE: Console output information
  • PSTORE_TYPE_FTRACE: Function tracing information
  • PSTORE_TYPE_PMSG: User space logs, commonly used in Android systems

The core framework layer is the “brain” of pstore. It coordinates the frontends and the backend and provides record management, resource allocation, and file system integration. This layer defines the pstore_info structure (analyzed in detail later), which serves as the contract between the core and a backend. It also compresses records when compression is configured, with algorithms such as zlib, lzo, and lz4.

The backend driver layer contains the drivers for specific storage media. Each backend implements the operations declared in pstore_info, including read, write, and erase. Backends differ in performance, reliability, and capacity. For example, the RAM backend is fast but small, while the block device backend offers more capacity but may be less dependable in crash scenarios.

The storage media layer is the foundation of the architecture and consists of the physical storage devices. These devices must retain data across crashes and reboots, which is a prerequisite for pstore to work at all.

2.2 Frontend Interface and Record Types

The frontend interface of pstore defines which kinds of information the system can store and in what format. Each record type has its own purpose and serves the needs of a different subsystem at crash time.

The dmesg type is the most important and most commonly used. It saves the contents of the kernel ring buffer (log_buf). When the system panics or oopses, the kernel’s kmsg_dump mechanism invokes pstore, which copies the log contents to persistent storage.

The console type captures console output. Console records are written as messages are printed rather than only at crash time, so they are particularly useful for debugging early startup problems and failures that never reach a regular crash dump. They offer a second, more raw view of system state alongside dmesg.

The ftrace type reflects the integration of pstore with the kernel tracing mechanism. Ftrace is the kernel’s built-in function tracing framework and can capture function call relationships and timing at runtime. By preserving ftrace data across a crash, developers can reconstruct the sequence of function calls that preceded it, which helps when the failing execution path is hard to follow.

The pmsg type provides support for user space logs. On Android, processes write log data through the /dev/pmsg0 device, and the data is read and analyzed after the next boot. This extends pstore beyond kernel crash information.

To manage these record types, the pstore core maintains a unified record management scheme. Each record carries metadata such as a timestamp, a reason identifier, and a sequence number. This metadata is used when records are retrieved and displayed, and it helps users understand the context in which each record was produced. pstore manages records on a “best effort” basis: when storage space is limited, older records may be overwritten to make room for new ones.

2.3 Storage Backends and Platform Integration

Much of pstore’s flexibility comes from the variety of its backend drivers. A backend adapts specific storage hardware to the abstract operations that the core framework expects.

The RAM backend (ramoops) stores records in a region of RAM whose contents survive a reboot. It offers very fast reads and writes, which matters in crash scenarios: any delay may mean that the log is never saved.

The block device backend (pstore/blk) serves devices without persistent RAM. According to the kernel documentation, pstore/blk addresses two common cases that previously ruled out pstore/ram: not every device has a battery that can keep regular RAM alive across power failures, and most embedded devices omit persistent RAM because of its cost and prefer cheaper options such as block devices. The block device backend works with storage such as SSDs, eMMC, and SD cards, which greatly widens the applicability of pstore.

The ERST backend uses the Error Record Serialization Table (ERST) defined in the ACPI specification. ERST is common on server hardware and provides a standardized store for error records. The ERST backend lets the kernel use this facility for crash information, alongside the hardware’s own error recording.

Backends differ in implementation, but they all talk to the core framework through the same pstore_info structure, which declares the basic operations a backend must implement, such as read, write, and erase. Zone-based backends such as pstore/blk may additionally supply a panic_write callback that bypasses the normal I/O stack during a crash and performs the storage operation directly, which improves the chance of saving logs in a highly unstable environment.
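The choice between the normal and the crash-time write path can be illustrated with a small user space sketch. Everything here (struct demo_backend, demo_pick_writer, the in-memory “medium”) is hypothetical and only models the idea; it is not kernel code. In particular, falling back to the normal routine when no panic_write is provided is an assumption of this sketch, and real backends decide their own policy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified write routine type (not a kernel type). */
typedef long (*demo_write_fn)(const char *buf, size_t len);

/* Hypothetical backend descriptor with an optional crash-time path. */
struct demo_backend {
    demo_write_fn write;        /* normal path, may rely on the I/O stack */
    demo_write_fn panic_write;  /* optional direct path, may be NULL */
};

/* Prefer panic_write while crashing; otherwise use the normal routine. */
static demo_write_fn demo_pick_writer(const struct demo_backend *be, int in_panic)
{
    assert(be != NULL);
    if (in_panic && be->panic_write)
        return be->panic_write;
    return be->write;
}

static char demo_medium[64];   /* stands in for the storage medium */
static int demo_last_path;     /* 1 = normal path, 2 = panic path */

static long demo_normal_write(const char *buf, size_t len)
{
    if (len > sizeof(demo_medium))
        return -1;
    memcpy(demo_medium, buf, len);
    demo_last_path = 1;
    return (long)len;
}

static long demo_panic_write(const char *buf, size_t len)
{
    if (len > sizeof(demo_medium))
        return -1;
    memcpy(demo_medium, buf, len);  /* no locks, no queues: copy directly */
    demo_last_path = 2;
    return (long)len;
}
```

A caller would obtain the routine with demo_pick_writer(&backend, in_panic) and invoke it immediately, so the decision is made once per record.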

2.4 Core Data Structures and Relationships

The core of the pstore architecture consists of a few key data structures that define how its components interact and what data they exchange. The most important is pstore_info, which defines the contract for storage backends and bridges frontend and backend. The listing below shows the layout used by older kernel releases, in which the callbacks take individual arguments; more recent kernels bundle most of these arguments into a struct pstore_record.

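struct pstore_info {
    struct module *owner;       /* module that provides this backend */
    char *name;                 /* backend name */
    spinlock_t buf_lock;        /* serializes access to buf */
    char *buf;                  /* staging buffer for record data */
    size_t bufsize;             /* size of buf in bytes */
    struct mutex read_mutex;    /* serializes open/read/close */
    int flags;                  /* frontends supported by this backend */
    /* Begin and end one pass over the stored records. */
    int (*open)(struct pstore_info *psi);
    int (*close)(struct pstore_info *psi);
    /* Fetch the next stored record; called repeatedly until none remain. */
    ssize_t (*read)(u64 *id, enum pstore_type_id *type, int *count,
            struct timespec *time, char **buf, bool *compressed,
            ssize_t *ecc_notice_size, struct pstore_info *psi);
    /* Store a record whose data is already in psi->buf. */
    int (*write)(enum pstore_type_id type, enum kmsg_dump_reason reason,
            u64 *id, unsigned int part, int count, bool compressed,
            size_t size, struct pstore_info *psi);
    /* Store a record from a caller-supplied buffer. */
    int (*write_buf)(enum pstore_type_id type, enum kmsg_dump_reason reason,
            u64 *id, unsigned int part, const char *buf, bool compressed,
            size_t size, struct pstore_info *psi);
    /* Remove a record from the storage medium. */
    int (*erase)(enum pstore_type_id type, u64 id, int count,
            struct timespec time, struct pstore_info *psi);
    void *data;                 /* backend-private data */
};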

Code 1: Definition of pstore_info Structure

Each field of pstore_info has a specific purpose. The name field identifies the backend, such as “ram” or “blk”. The buf and bufsize fields describe the buffer used for temporary storage and its size. The flags field indicates which record types the backend supports. Most important is the set of function pointers, such as read, write, and erase: the backend driver implements them, and they define how records are read from and written to the storage medium. Another key data structure is pstore_ftrace_record, which holds the contents of an ftrace record:

struct pstore_ftrace_record {
    unsigned long ip;
    unsigned long parent_ip;
    u64 ts;
};

Code 2: Definition of pstore_ftrace_record Structure

This structure captures the key information for function tracing: ip is the address of the traced function, parent_ip is the address of its caller, and ts is the timestamp. This is enough to reconstruct function call relationships, which is crucial for understanding code execution paths. pstore also uses the pstore_type_id enumeration to distinguish record types. As mentioned earlier, this enumeration lists the record types pstore supports (DMESG, MCE, CONSOLE, FTRACE, and others), each corresponding to a different category of information. The relationships between these data structures are summarized in Figure 2:

pstore_info and pstore_ftrace_record carry the fields listed in Code 1 and Code 2. The diagram connects the three types with the relationship labels “uses” and “may contain”. The pstore_type_id enumeration has the following members:

  • PSTORE_TYPE_DMESG
  • PSTORE_TYPE_MCE
  • PSTORE_TYPE_CONSOLE
  • PSTORE_TYPE_FTRACE
  • PSTORE_TYPE_PPC_RTAS
  • PSTORE_TYPE_PPC_OF
  • PSTORE_TYPE_PPC_COMMON
  • PSTORE_TYPE_PMSG
  • PSTORE_TYPE_PPC_OPAL
  • PSTORE_TYPE_UNKNOWN

Figure 2: Relationship Diagram of Core Data Structures in pstore
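To make the role of the ftrace record fields concrete, the following user space sketch orders a set of records by timestamp and prints each one as a “callee <- caller” pair. The structure mirrors the fields of Code 2 with uint64_t standing in for u64; the names demo_ftrace_record, demo_sort_by_time, and demo_format_ftrace, as well as the output format, are assumptions made for illustration and are not the kernel’s own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* User space mirror of the fields shown in Code 2 (hypothetical name). */
struct demo_ftrace_record {
    unsigned long ip;         /* address of the traced function */
    unsigned long parent_ip;  /* address of its caller */
    uint64_t ts;              /* timestamp */
};

/* qsort comparator: ascending timestamp. */
static int demo_cmp_ts(const void *a, const void *b)
{
    const struct demo_ftrace_record *x = a;
    const struct demo_ftrace_record *y = b;
    return (x->ts > y->ts) - (x->ts < y->ts);
}

/* Put the records into chronological order to recover the call sequence. */
static void demo_sort_by_time(struct demo_ftrace_record *recs, size_t n)
{
    assert(recs != NULL || n == 0);
    qsort(recs, n, sizeof(*recs), demo_cmp_ts);
}

/*
 * Format one record as "ts:<ts> <ip> <- <parent_ip>" (addresses in hex).
 * Returns the number of characters written, or -1 if out is too small.
 */
static int demo_format_ftrace(const struct demo_ftrace_record *rec,
                              char *out, size_t outlen)
{
    assert(rec != NULL && out != NULL);
    int n = snprintf(out, outlen, "ts:%llu %lx <- %lx",
                     (unsigned long long)rec->ts, rec->ip, rec->parent_ip);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}
```

Resolving the raw addresses to symbol names is left out; a real tool would need the matching kernel symbol table for that step.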

3 Workflow and Lifecycle of pstore

3.1 Initialization and Registration Process

pstore initialization is a multi-stage process that involves the core framework, a backend driver, and the frontends. Backends try to register as early in boot as they can, so that early crashes can still be captured.

During startup, the pstore core creates its internal data structures and registers the pstore file system type, so that it is ready when a backend registers.

Backend registration is the critical step. A backend driver (RAM, block device, ERST, or another) calls the pstore_register function when it finds usable hardware and passes in a populated pstore_info instance. Registration is effectively a “handshake” that establishes the operational contract between the backend and the core. The core checks what the backend provides, including its buffer size, its callbacks, and the record types it supports. If the checks pass, the backend becomes the active one. Several backend drivers can be built into the same kernel, but generally only one is active at a time, and a later registration attempt is refused while another backend is in place.

Once a backend is registered, pstore reads back any records that were saved before the last reboot. These records are held in kernel memory and appear as files when the pstore file system is mounted. This “read-cache-expose” sequence makes the data from the previous crash available on the next normal boot.

The final stage is the user space interface. pstore presents records as files in the pstore file system, usually mounted at /sys/fs/pstore, and user space tools (such as the systemd-pstore service) can read and archive them automatically at startup. The process is fault tolerant: if a backend fails to initialize, the rest of the system still boots normally, only without crash capture through that backend.
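The registration “handshake” can be modeled in a few lines of user space C. This is a simplified sketch under stated assumptions: struct demo_psinfo, demo_register, and demo_unregister are hypothetical stand-ins, the validation rules are reduced to the checks described above, and the error codes (-EINVAL, -EBUSY) are chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for a backend description. */
struct demo_psinfo {
    const char *name;
    size_t bufsize;
    long (*read)(char *buf, size_t len);
    long (*write)(const char *buf, size_t len);
};

static struct demo_psinfo *demo_active;  /* the single active backend */

/* Validate the backend and make it active if no other backend is in place. */
static int demo_register(struct demo_psinfo *psi)
{
    if (!psi || !psi->name || !psi->read || !psi->write || psi->bufsize == 0)
        return -EINVAL;     /* incomplete contract */
    if (demo_active)
        return -EBUSY;      /* another backend already holds the slot */
    demo_active = psi;
    return 0;
}

/* Release the slot, but only if psi is the backend that holds it. */
static void demo_unregister(struct demo_psinfo *psi)
{
    assert(psi != NULL);
    if (demo_active == psi)
        demo_active = NULL;
}

/* Trivial callbacks so that a test backend satisfies the contract. */
static long demo_stub_read(char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return 0;               /* no stored records */
}

static long demo_stub_write(const char *buf, size_t len)
{
    (void)buf;
    return (long)len;       /* pretend everything was stored */
}
```

Registering a second backend while the first is active returns -EBUSY in this model, which matches the single-active-backend behaviour described in the text.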

3.2 Runtime Operations and Recording Process

During normal operation pstore takes no active part in the system’s work, but it stays ready to respond to any event that requires persistent storage. Its runtime operations fall into three categories: active recording, passive capture, and space management. Active recording is triggered explicitly by other kernel facilities. For example, when the ftrace frontend is enabled, function call records are written into pstore as the calls happen. This kind of recording is used to capture specific execution paths or performance data for later analysis. Passive capture is the primary function of pstore and is triggered by system exceptions. When the kernel detects a serious error (such as an oops) or a fatal failure (such as a panic), the kmsg_dump mechanism runs and calls the registered pstore backend to save critical information. Several kernel components cooperate in this process, as the timing diagram in Figure 3 illustrates.

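Participants: User Process, Kernel Subsystem, kmsg_dump, pstore Core, pstore Backend, Hardware Storage

Normal Operation
  • pstore_register(): the backend registers with the pstore core
  • Successful registration is reported back

Exception Occurrence
  • kmsg_dump(): the kernel subsystem starts the dump
  • pstore_write(): the pstore core allocates record space and optionally compresses the data
  • write() operation: the backend stores the record in persistent storage
  • The write is confirmed and the result is returned

Next Normal Startup
  • The user process opens the record files in the pstore file system (usually /sys/fs/pstore)
  • read() operation: the backend reads the record from storage and returns the record data
  • The pstore core returns the record and provides the data to the user process
  • The user process saves or processes the record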

Figure 3: Timing Diagram of pstore Record Writing and Retrieval

Space management is an important part of pstore’s runtime behaviour. Persistent storage is usually limited, so pstore has to manage it carefully. Common strategies include circular overwriting and static partitioning.

Circular overwriting replaces the oldest records when space runs out, much like a log-structured store. It guarantees that the latest records are always kept, but it may lose important history. Static partitioning reserves a fixed amount of space for each record type. It prevents record types from interfering with one another, but it may leave space unused.

Runtime operation also balances stability against performance. During normal operation, pstore may use the standard I/O paths for storage operations, which keeps it interoperable with other subsystems. During a crash, pstore may switch to a direct or bare-metal I/O path that bypasses the standard kernel I/O stack, to improve the chance of saving logs in a highly unstable environment.
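Circular overwriting can be sketched as a small fixed-size table of slots in which the oldest entry is replaced once every slot is occupied. This is a user space illustration only: struct demo_ring, demo_ring_store, and the slot sizes are hypothetical and do not correspond to any kernel structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_SLOTS 3
#define DEMO_SLOT_SIZE 32

/* Hypothetical record store with a fixed number of equally sized slots. */
struct demo_ring {
    char slot[DEMO_SLOTS][DEMO_SLOT_SIZE];
    unsigned long seq[DEMO_SLOTS];  /* 0 means the slot is empty */
    unsigned long next_seq;         /* last sequence number handed out */
};

/*
 * Store msg in the first empty slot; if none is empty, overwrite the slot
 * with the smallest sequence number (the oldest record).
 * Returns the index of the slot that was used.
 */
static int demo_ring_store(struct demo_ring *r, const char *msg)
{
    int victim = 0;

    assert(r != NULL && msg != NULL);
    for (int i = 0; i < DEMO_SLOTS; i++) {
        if (r->seq[i] == 0) {           /* empty slot: use it right away */
            victim = i;
            break;
        }
        if (r->seq[i] < r->seq[victim]) /* remember the oldest so far */
            victim = i;
    }
    /* snprintf truncates long messages and always NUL-terminates. */
    snprintf(r->slot[victim], DEMO_SLOT_SIZE, "%s", msg);
    r->seq[victim] = ++r->next_seq;
    return victim;
}
```

With three slots, the fourth record lands in slot 0 and replaces the first one, which is exactly the trade-off described above: the newest data always fits, at the cost of the oldest.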

3.3 Crash Capture Process

Crash capture is the core function of pstore and the reason it exists. When the system hits an unrecoverable error, pstore acts as the “system’s last record keeper” and saves critical information before the system goes down. The time window is short and the execution environment is unstable, so the capture path must be as simple, reliable, and fast as possible.

Capture usually starts when the kernel detects a panic or oops. The kernel then notifies the registered dumpers, pstore among them, through the kmsg_dumper mechanism. The dumpers are called one after another, so pstore’s capture path has to finish quickly and must not depend on services that may already be broken.

A key decision is what to save. Storage space and time are limited, so pstore cannot save everything. For the dmesg type, pstore saves the most recent part of the kernel ring buffer, split into parts that fit the backend’s record size and bounded by a configurable byte limit. This tail normally contains the error messages that led to the crash plus some preceding context. For the ftrace type, only the most recent function call records are kept.

Where a backend provides it, the panic_write callback plays a crucial role. Unlike the normal write operation, panic_write tries to bypass the conventional kernel I/O stack and talk to the storage hardware directly. This “shortcut” removes steps that could fail in an unstable system and raises the probability that the record is saved. It is comparable to an emergency room that skips registration and triage and goes straight to life-saving treatment.

The last challenge is storage consistency. In normal operation, a backend can rely on the usual kernel locking and I/O completion mechanisms to keep storage operations consistent. In a crash, these may not be available. pstore therefore takes a “best effort” approach and writes records to the medium without elaborate consistency checks. Occasional record corruption is possible, but for crash records, having some data is better than having none, so the trade-off is reasonable.
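The “save only what fits” decision for dmesg can be illustrated by computing which byte ranges of a log would be written, newest first, given a per-record size and an overall byte budget. This is a sketch, not the kernel’s algorithm: demo_split_tail and struct demo_part are hypothetical names, and the exact budget handling is an assumption of this example.

```c
#include <assert.h>
#include <stddef.h>

/* One part of the log that will become a separate stored record. */
struct demo_part {
    size_t offset;  /* start of the part within the log */
    size_t len;     /* length of the part in bytes */
};

/*
 * Split the newest `budget` bytes of a log of `log_len` bytes into parts of
 * at most `part_size` bytes, newest part first.
 * Returns the number of parts written to `parts` (at most `max_parts`).
 */
static size_t demo_split_tail(size_t log_len, size_t part_size, size_t budget,
                              struct demo_part *parts, size_t max_parts)
{
    size_t remaining = budget < log_len ? budget : log_len;
    size_t end = log_len;   /* one past the last byte not yet assigned */
    size_t n = 0;

    assert(part_size > 0 && parts != NULL);
    while (remaining > 0 && n < max_parts) {
        size_t len = remaining < part_size ? remaining : part_size;

        parts[n].offset = end - len;
        parts[n].len = len;
        end -= len;
        remaining -= len;
        n++;
    }
    return n;
}
```

For a 10000-byte log, 4096-byte records, and a 6000-byte budget, this yields two parts: bytes 5904 to 9999 first, then bytes 4000 to 5903. Everything older than offset 4000 is dropped.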

4 Data Storage and Retrieval Mechanisms of pstore

4.1 Storage Space Management and Partitioning Strategies

How pstore manages persistent storage space directly affects its reliability and efficiency. Persistent media usually have limited capacity and varying access characteristics, so pstore has to balance storage capacity, access speed, reliability, and wear leveling.

Zone management is a core concept in pstore’s space management, especially in pstore/blk and other zone-based backends. The storage is divided into zones, each reserved for a specific record type. This has two advantages. First, record types do not interfere with one another, so critical records (such as dmesg) are not overwritten by less important ones. Second, space management and retrieval logic become simpler, because each record type is managed independently in its own zone.

Compression affects how far the available space goes. In older kernel releases, the pstore_zbackend structure describes a compression algorithm available to the pstore core by defining its compress and decompress operations:

struct pstore_zbackend {
    int (*compress)(const void *in, void *out, size_t inlen, size_t outlen);
    int (*decompress)(void *in, void *out, size_t inlen, size_t outlen);
    void (*allocate)(void);
    void (*free)(void);
    const char *name;
};

Code 3: Definition of pstore_zbackend Structure

Zone sizes and layout are usually determined while the backend driver initializes. The driver derives the partitioning from the total capacity of the medium and from configuration parameters. For example, on a system with 64KB of persistent storage, pstore may be configured with 32KB for dmesg records, 16KB for console records, and 16KB for ftrace records. The proportions reflect the relative importance and expected volume of each record type.

Storage block management is another aspect of space management. For block device backends, the storage space is divided into fixed-size blocks, typically aligned with the block size of the underlying device. A block can hold one or more records, depending on record size and compression ratio. The block management strategy has to deal with fragmentation and wear leveling, especially on flash-based media.

pstore uses metadata to track the attributes and location of each record. The metadata is usually stored next to the record data and is used at retrieval time to reconstruct the record’s context. Key fields include the record type, timestamp, size, compression flag, and sequence number. Handling this metadata correctly is essential for retrieving and interpreting records.

When space is insufficient, pstore must decide which existing records to overwrite. Common strategies are:

  • Circular overwriting: overwrite the oldest records. It is simple and predictable but may discard high-importance records by accident.
  • Priority overwriting: keep high-importance records in preference to others. It protects critical records but is complex to implement and may “starve” low-priority records.
  • Type-specific overwriting: overwrite only records of the same type. It isolates record types from one another but may leave space unused.
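The 64KB example above corresponds to a 2:1:1 split between dmesg, console, and ftrace. The sketch below computes such a layout for an arbitrary capacity and rounds each zone down to an alignment boundary. The names (demo_partition, struct demo_layout) and the fixed weights are assumptions taken from that example, not values defined by the kernel.

```c
#include <assert.h>
#include <stddef.h>

enum demo_zone {
    DEMO_ZONE_DMESG,
    DEMO_ZONE_CONSOLE,
    DEMO_ZONE_FTRACE,
    DEMO_ZONE_COUNT
};

/* Resulting layout: where each zone starts and how large it is. */
struct demo_layout {
    size_t offset[DEMO_ZONE_COUNT];
    size_t size[DEMO_ZONE_COUNT];
};

/*
 * Split `total` bytes between the zones with weights 2:1:1, rounding every
 * zone down to a multiple of `align`.
 * Returns 0 on success, or -1 if a zone would end up empty.
 */
static int demo_partition(size_t total, size_t align, struct demo_layout *out)
{
    static const size_t weight[DEMO_ZONE_COUNT] = { 2, 1, 1 };
    const size_t weight_sum = 4;
    size_t offset = 0;

    assert(out != NULL && align > 0);
    for (int i = 0; i < DEMO_ZONE_COUNT; i++) {
        size_t size = total * weight[i] / weight_sum;

        size -= size % align;       /* round down to the alignment */
        if (size == 0)
            return -1;              /* medium too small for this layout */
        out->offset[i] = offset;
        out->size[i] = size;
        offset += size;
    }
    return 0;
}
```

Calling demo_partition(65536, 4096, &layout) reproduces the 32KB, 16KB, 16KB split from the text; an 8KB medium with the same alignment is rejected because the console zone would round down to zero.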

4.2 Data Read and Write Process and Compression Handling

The read and write paths of pstore are tuned for crash capture. They have to serve both normal operation and crash conditions, which means balancing performance against reliability.

The write path differs significantly between normal operation and a crash. During normal operation, pstore can use the standard kernel I/O interfaces, such as the block layer or file system interfaces. These provide caching, buffering, and error handling, but they may be unreliable while the system is crashing. During crash capture, pstore therefore switches to a direct write mode that bypasses the higher-level abstractions and talks to the storage hardware directly.

Atomicity is one important consideration. Ideally, each record is written atomically: either completely or not at all. True atomic writes may not be achievable in a crash, so pstore again works on a “best effort” basis. It tries to keep records intact but accepts partially written records in extreme cases. At retrieval time, such partial records can be identified and filtered out by means of magic numbers or checksums.

Compression matters most on media with limited capacity. pstore supports several algorithms, including zlib, lzo, and lz4, each with a different trade-off between compression ratio and speed: zlib usually compresses better but is slower, lzo compresses less but is very fast, and lz4 sits between the two. The algorithm is typically chosen at build time through Kconfig options such as CONFIG_PSTORE_ZLIB_COMPRESS, CONFIG_PSTORE_LZO_COMPRESS, and CONFIG_PSTORE_LZ4_COMPRESS (the option names vary between kernel versions). At runtime, pstore uses the configured algorithm if the working memory it needs is available. In a crash, an expensive algorithm may consume too much CPU time and reduce the chance of saving the record, so a simpler algorithm may be the better choice.

The read path is more straightforward but still needs care. On a normal boot, pstore scans the persistent storage area and reads all valid records. This involves parsing the metadata, verifying integrity (through checksums or magic numbers), decompressing the data (if the record was compressed), and caching the record in kernel memory.

The order of retrieval also matters. pstore typically returns records in the order in which they were generated, so user space tools can process crash information chronologically. When several records were produced by the same event (for example, dmesg and ftrace data saved during one crash), sequence numbers or type priorities keep related records together.

Error handling during data transformation is the last component. On write, if compression fails, pstore can store the original uncompressed data instead. On read, if decompression fails, pstore tries to mark the record as corrupted rather than reject it outright. This lenient handling fits the “best effort” design philosophy: partial data can still be valuable.
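The two error-handling ideas above, falling back to uncompressed storage and detecting damaged records with a magic number and checksum, can be combined in a short user space sketch. The record layout, the magic value, the checksum function, and all demo_ names are invented for this illustration; actual backends define their own on-media formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 0x50535452u   /* arbitrary marker chosen for this sketch */
#define DEMO_MAX_PAYLOAD 128

/* Hypothetical on-media record: small header followed by the payload. */
struct demo_record {
    uint32_t magic;
    uint32_t checksum;
    uint16_t len;
    uint8_t compressed;
    uint8_t payload[DEMO_MAX_PAYLOAD];
};

/* Compressor signature: returns the compressed length, or <= 0 on failure. */
typedef int (*demo_compress_fn)(const uint8_t *in, size_t inlen,
                                uint8_t *out, size_t outlen);

/* Simple multiplicative checksum; a real format would likely use a CRC. */
static uint32_t demo_sum(const uint8_t *p, size_t n)
{
    uint32_t s = 0;

    while (n--)
        s = s * 31 + *p++;
    return s;
}

/* Try to compress; on failure store the data as-is. Returns 0 or -1. */
static int demo_pack(struct demo_record *rec, const uint8_t *data, size_t len,
                     demo_compress_fn compress)
{
    int clen;

    assert(rec != NULL && data != NULL);
    clen = compress ? compress(data, len, rec->payload, sizeof(rec->payload))
                    : -1;
    if (clen > 0) {
        rec->compressed = 1;
        rec->len = (uint16_t)clen;
    } else {
        if (len > sizeof(rec->payload))
            return -1;                   /* does not fit even uncompressed */
        memcpy(rec->payload, data, len); /* fallback: raw copy */
        rec->compressed = 0;
        rec->len = (uint16_t)len;
    }
    rec->magic = DEMO_MAGIC;
    rec->checksum = demo_sum(rec->payload, rec->len);
    return 0;
}

/* A record is accepted only if magic, length, and checksum all agree. */
static int demo_is_valid(const struct demo_record *rec)
{
    return rec->magic == DEMO_MAGIC &&
           rec->len <= sizeof(rec->payload) &&
           rec->checksum == demo_sum(rec->payload, rec->len);
}

/* A compressor that always fails, used to exercise the fallback path. */
static int demo_failing_compress(const uint8_t *in, size_t inlen,
                                 uint8_t *out, size_t outlen)
{
    (void)in; (void)inlen; (void)out; (void)outlen;
    return -1;
}
```

A reader that finds demo_is_valid() returning 0 can skip the record or flag it as corrupted, in line with the lenient handling described above.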

4.3 File System View and User Space Interface

pstore exposes stored records to user space through a file system interface, so tools can read crash information with standard file operations and need no special system calls or driver interfaces.

The pstore file system is conventionally mounted at <span>/sys/fs/pstore</span>. Each record appears there as a separate file whose name encodes key metadata. In current kernels the name has the form <span>TYPE-BACKEND-ID</span>, optionally followed by <span>.enc.z</span> for a record that is still compressed. For example, a dmesg record saved by the ramoops backend may be named <span>dmesg-ramoops-0</span>, which gives the record type (dmesg), the backend (ramoops), and the record ID (0). The time at which the record was written is carried by the file's modification time rather than by its name.

The content of the file is the actual data of the record, which may be plain text (such as dmesg and console records) or binary data (such as ftrace records). For binary records, user space tools need to understand the format to parse it correctly. For example, ftrace records consist of a sequence of <span>pstore_ftrace_record</span> structures, each describing one traced function call.

The user space interface is designed for simplicity and consistency. Tools only need to open files in the <span>/sys/fs/pstore</span> directory, read the contents, and optionally delete the files to free up space. Because the interface follows standard UNIX file semantics, existing text processing tools (such as grep, awk, and sed) can be used directly on text records.

A typical use case is processing pstore records automatically at system startup. Modern Linux distributions often include a systemd service (<span>systemd-pstore</span>) that runs early during boot. This service checks the <span>/sys/fs/pstore</span> directory, archives any records it finds (by default under <span>/var/lib/systemd/pstore</span>, optionally also to the journal), and then removes them to free up space for future crashes. The process requires no administrator intervention.

Another important aspect of the pstore interface is access control. Crash records may contain sensitive information (such as memory addresses, kernel symbols, and possibly user data), so access is controlled through standard file permissions. By default only privileged users (root) can reach the records, which keeps potentially security-sensitive information away from ordinary users.

The record deletion semantics are a subtle but important aspect of the user space interface. Deleting a pstore file not only removes it from the file system view but also instructs the underlying backend driver to erase the corresponding persistent storage area. This ensures that storage space can be reused while the file system abstraction stays consistent. Without erasure, the storage space would eventually run out and new crash records could not be saved.
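As a small illustration of the naming scheme, the sketch below splits a record file name into its fields using bash parameter expansion. It assumes names of the form TYPE-BACKEND-ID with an optional <span>.enc.z</span> suffix, as produced by current kernels; <span>parse_pstore_name</span> is a hypothetical helper, not part of any pstore tool.

```shell
#!/bin/bash

# Split a pstore file name of the assumed form TYPE-BACKEND-ID[.enc.z]
# and print "type backend id compressed" on one line.
parse_pstore_name() {
    local name=$1 compressed=no
    # Records that are still compressed carry an ".enc.z" suffix
    if [[ $name == *.enc.z ]]; then
        compressed=yes
        name=${name%.enc.z}
    fi
    local type=${name%%-*}      # text before the first '-'
    local id=${name##*-}        # text after the last '-'
    local backend=${name#*-}    # drop "TYPE-"
    backend=${backend%-*}       # drop "-ID"
    printf '%s %s %s %s\n' "$type" "$backend" "$id" "$compressed"
}

parse_pstore_name "dmesg-ramoops-0"
parse_pstore_name "console-erst-42.enc.z"
```

Taking the type before the first dash and the ID after the last dash keeps the split correct even if a backend name itself contains a dash.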

5 Application Examples and Practical Guide for pstore

5.1 Configuration and Usage Guide

Correctly configuring pstore is the first step to ensuring its proper operation. The kernel configuration options related to pstore are located under "File systems" → "Miscellaneous filesystems" → "Persistent store support". Depending on system requirements and hardware capabilities, administrators need to select an appropriate combination of frontends and backends. The options fall into several categories: frontend selection, backend selection, compression options, and debugging support. Frontend selection determines what types of information pstore can capture, with common options including:

  • <span>CONFIG_PSTORE</span>: Core support; kernel log (dmesg) capture on oops and panic is part of it and has no separate option
  • <span>CONFIG_PSTORE_CONSOLE</span>: Enable console output capture
  • <span>CONFIG_PSTORE_FTRACE</span>: Enable function tracing capture
  • <span>CONFIG_PSTORE_PMSG</span>: Enable user space log capture

The backend selection depends on the available hardware resources. For devices with persistent RAM, <span>CONFIG_PSTORE_RAM</span> is the best choice; for standard block devices, <span>CONFIG_PSTORE_BLK</span> provides general support; for server hardware supporting ACPI ERST, the ERST backend is built as part of ACPI APEI support (<span>CONFIG_ACPI_APEI</span>) rather than through a dedicated pstore option.

Compression options have included <span>CONFIG_PSTORE_ZLIB_COMPRESS</span> (deflate), <span>CONFIG_PSTORE_LZO_COMPRESS</span>, and <span>CONFIG_PSTORE_LZ4_COMPRESS</span>, depending on the kernel version. Where a choice exists, compression ratio must be weighed against speed: for systems with very limited storage space, algorithms with high compression ratios (such as zlib) are more suitable; for systems that prioritize fast saving during crashes, faster algorithms (such as lzo) are better choices.

In addition to compile-time configuration, pstore also supports various runtime parameters, which can be specified on the kernel command line or as module parameters. For the pstore/blk backend, important parameters include:

  • <span>blkdev</span>: Specify block device identifier
  • <span>kmsg_size</span>: Size of each dmesg record area, in KB (a multiple of 4)
  • <span>max_reason</span>: Highest kmsg dump reason code that is still recorded

For example, the following kernel command line parameters configure pstore to use a specified block device and reserve 64KB for each dmesg record (the size is given in KB, without a unit suffix):

pstore_blk.blkdev=/dev/sdb1 pstore_blk.kmsg_size=64

Verifying the configuration is a key step to ensure pstore operates correctly. After the system starts, administrators can check the kernel logs for initialization information related to pstore. Successful initialization typically displays the detected backends and allocated resource sizes. For example:

[ 0.009879] ACPI: Reserving ERST table memory at [mem 0x6db7c000-0x6db7c22f]
[ 23.571326] ERST: Error Record Serialization Table (ERST) support is initialized

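Another verification method is to check whether the pstore file system is mounted, conventionally at <span>/sys/fs/pstore</span>, and to look at its contents. On a correctly configured system the mount point exists even if no crash has occurred, although it may be empty; systemd mounts it automatically, and on other systems it can be mounted with <span>mount -t pstore pstore /sys/fs/pstore</span>. Additionally, administrators can check the kernel configuration through <span>/proc/config.gz</span> (available when the kernel was built with <span>CONFIG_IKCONFIG_PROC</span>) or verify at runtime that a backend has registered by checking <span>dmesg | grep pstore</span>.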
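The mount check can be scripted. The sketch below looks for a filesystem of type pstore in <span>/proc/mounts</span> and prints wherever it is mounted, so it does not depend on a particular path. <span>pstore_mountpoint</span> is a hypothetical helper name; the optional argument exists only so the function can be pointed at a different mounts table.

```shell
#!/bin/bash

# Print the mount point of the first pstore filesystem listed in a
# mounts table (default: /proc/mounts). Returns non-zero if none is found.
pstore_mountpoint() {
    local table=${1:-/proc/mounts} dev dir fstype rest
    while read -r dev dir fstype rest; do
        if [ "$fstype" = "pstore" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done < "$table"
    return 1
}

if mp=$(pstore_mountpoint); then
    echo "pstore is mounted at $mp"
    ls -l "$mp"
else
    echo "pstore is not mounted; try: mount -t pstore pstore /sys/fs/pstore"
fi
```

An empty listing after a clean boot is normal; records only appear after a crash has been captured.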

5.2 Simple Implementation Example

To gain a deeper understanding of how pstore works, we can demonstrate its core implementation mechanism through a simplified pstore backend example. This example is modeled on the RAM backend but has been significantly simplified to highlight key concepts. In particular, its buffer is ordinary kernel memory, so unlike a real backend nothing it stores survives a reboot. Below is a minimal pstore backend implementation:

/*
 * Sketch of a minimal pstore backend. It uses the struct pstore_record
 * based callbacks of current kernels and has not been tested against a
 * specific kernel release.
 */
#include <linux/module.h>
#include <linux/pstore.h>
#include <linux/slab.h>
#include <linux/string.h>

#define SIMPLE_PSTORE_SIZE 65536  /* 64KB record area */

/*
 * NOTE: kzalloc() memory does not survive a reboot. It stands in for a
 * real persistent region (reserved RAM, flash, ...) to keep the example short.
 */
static char *simple_pstore_buffer;
static size_t simple_pstore_len;       /* bytes of valid data in the buffer */
static bool simple_pstore_compressed;  /* was the stored record compressed? */
static bool simple_pstore_read_done;   /* has the single record been returned? */
static struct pstore_info simple_psi;

/* Called by the pstore core before a read pass: rewind the iterator */
static int simple_pstore_open(struct pstore_info *psi)
{
    simple_pstore_read_done = false;
    return 0;
}

/* Backend read operation: one record per call, 0 when none are left */
static ssize_t simple_pstore_read(struct pstore_record *record)
{
    if (simple_pstore_read_done || !simple_pstore_len)
        return 0;
    simple_pstore_read_done = true;

    record->type = PSTORE_TYPE_DMESG;
    record->id = 1;
    record->compressed = simple_pstore_compressed;
    /* The pstore core frees record->buf, so hand it a private copy */
    record->buf = kmemdup(simple_pstore_buffer, simple_pstore_len, GFP_KERNEL);
    if (!record->buf)
        return -ENOMEM;
    return simple_pstore_len;
}

/* Backend write operation: keep only the most recent record */
static int simple_pstore_write(struct pstore_record *record)
{
    if (record->size > SIMPLE_PSTORE_SIZE)
        return -ENOSPC;

    /* In actual implementation, there would be more complex storage management */
    memcpy(simple_pstore_buffer, record->buf, record->size);
    simple_pstore_len = record->size;
    simple_pstore_compressed = record->compressed;
    record->id = 1;
    return 0;
}

/* Backend erase operation: zero out the buffer and forget the record */
static int simple_pstore_erase(struct pstore_record *record)
{
    memset(simple_pstore_buffer, 0, SIMPLE_PSTORE_SIZE);
    simple_pstore_len = 0;
    return 0;
}

/* Initialization function */
static int __init simple_pstore_init(void)
{
    int ret;

    /* Allocate the (non-persistent, see note above) storage buffer */
    simple_pstore_buffer = kzalloc(SIMPLE_PSTORE_SIZE, GFP_KERNEL);
    if (!simple_pstore_buffer)
        return -ENOMEM;

    /* Set up pstore info structure */
    simple_psi.owner = THIS_MODULE;
    simple_psi.name = "simple_pstore";
    simple_psi.buf = kmalloc(4096, GFP_KERNEL); /* Staging buffer used by the core */
    if (!simple_psi.buf) {
        ret = -ENOMEM;
        goto free_buffer;
    }
    simple_psi.bufsize = 4096;
    simple_psi.flags = PSTORE_FLAGS_DMESG;
    simple_psi.open = simple_pstore_open;
    simple_psi.read = simple_pstore_read;
    simple_psi.write = simple_pstore_write;
    simple_psi.erase = simple_pstore_erase;

    /* Register to pstore core */
    ret = pstore_register(&simple_psi);
    if (ret) {
        pr_err("Failed to register simple pstore backend\n");
        goto free_buf;
    }

    pr_info("Simple pstore backend registered\n");
    return 0;

free_buf:
    kfree(simple_psi.buf);
free_buffer:
    kfree(simple_pstore_buffer);
    return ret;
}

/* Cleanup function */
static void __exit simple_pstore_exit(void)
{
    pstore_unregister(&simple_psi);
    kfree(simple_psi.buf);
    kfree(simple_pstore_buffer);
    pr_info("Simple pstore backend unregistered\n");
}

module_init(simple_pstore_init);
module_exit(simple_pstore_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Simple pstore backend example");

Code 4: Simplified pstore Backend Implementation Example

This simplified example demonstrates the basic structure of a pstore backend: allocating storage space, populating the <span>pstore_info</span> structure, implementing the necessary operations (read, write, erase), and then registering with the pstore core. While actual backend implementations are more complex (handling multiple records, metadata management, error recovery, etc.), this example captures the essential design pattern.

Testing and validation are important aspects of pstore implementation. To test pstore functionality, administrators can deliberately crash a test system through the magic SysRq interface (which must be enabled) to generate test records:

echo c > /proc/sysrq-trigger

This command crashes the kernel immediately, so it must only be run on a machine where losing all unsaved state is acceptable. The resulting panic causes the kernel to generate a crash record and (if configured correctly) save it through pstore. After the system restarts, the <span>/sys/fs/pstore</span> directory should contain the newly generated crash record files.

User space integration is the final step in making pstore practical. Below is a simple user space script example for automatically extracting and processing pstore records:

#!/bin/bash

PSTORE_DIR="/sys/fs/pstore"
LOG_DIR="/var/log/pstore"
DATE=$(date +%Y%m%d-%H%M%S)

# Create log directory; give up if that is not possible
mkdir -p "$LOG_DIR" || exit 1

# Process all pstore records
for record in "$PSTORE_DIR"/*; do
    if [ -f "$record" ]; then
        filename=$(basename "$record")
        # Delete the original only after the copy succeeded: removing a
        # pstore file also erases the record in the backend
        if cp "$record" "$LOG_DIR/${DATE}-${filename}"; then
            echo "Saved record: $filename"
            rm -f "$record"
        else
            echo "Failed to save record: $filename" >&2
        fi
    fi
done

# Optional: Add record information to system logs
logger -t pstore "Processed pstore records at $DATE"

Code 5: Example of a pstore Record Processing Script

This script demonstrates the basic process of handling pstore records: copying records from <span>/sys/fs/pstore</span> to permanent storage (such as <span>/var/log/pstore</span>), then deleting the original records to free up space, and finally logging the processing event to the system logs. In actual deployments, similar functionality is typically provided by the <span>systemd-pstore</span> service.

5.3 Debugging Tips and Common Issues

Debugging pstore often needs to be conducted under the extreme conditions of system crashes, which adds complexity to the debugging process. Mastering effective debugging techniques and understanding common issues are crucial for ensuring pstore operates reliably. Debugging techniques can be categorized into several areas: configuration verification, runtime monitoring, and post-analysis. Configuration verification techniques include:

  • Check kernel configuration to ensure all necessary pstore options are enabled
  • Verify that kernel command line parameters are correctly passed to the pstore backend
  • Check system logs for pstore initialization-related messages
  • Confirm that hardware devices (such as persistent RAM or block devices) have been correctly detected and configured
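The command line check in particular is easy to script. The following sketch extracts the value of a name=value parameter from the running kernel's command line; <span>cmdline_value</span> is a hypothetical helper, and the two pstore/blk parameter names are used only as examples.

```shell
#!/bin/bash

# Print the value of a "name=value" parameter found in a kernel command
# line string. Returns non-zero if the parameter is absent.
cmdline_value() {
    local want=$1 cmdline=$2 word
    local -a words
    read -r -a words <<< "$cmdline"
    for word in "${words[@]}"; do
        if [ "${word%%=*}" = "$want" ]; then
            printf '%s\n' "${word#*=}"
            return 0
        fi
    done
    return 1
}

cmdline=$(cat /proc/cmdline 2>/dev/null)
for param in pstore_blk.blkdev pstore_blk.kmsg_size; do
    if value=$(cmdline_value "$param" "$cmdline"); then
        echo "$param=$value"
    else
        echo "$param is not set on the command line"
    fi
done
```

A parameter that is missing here may still have been set as a module option or as a Kconfig default, so an absent entry is a hint rather than proof of misconfiguration.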

Runtime monitoring is valuable for understanding pstore’s behavior. Although pstore is primarily active during crashes, monitoring its runtime status can help identify potential issues. Useful monitoring points include:

  • Check the reservation status of persistent storage areas through <span>/proc/iomem</span>
  • Monitor kernel logs for messages related to pstore
  • Check system memory usage to ensure pstore buffers are correctly allocated

Common issues and their solutions include:

  1. pstore did not save crash records: This is often due to configuration errors or hardware issues. Check steps include: confirming that the pstore backend has been correctly registered; verifying that the storage area is writable; checking if records were rejected due to insufficient storage space.
  2. Records are incomplete or corrupted: This may indicate storage medium issues or concurrent access conflicts. Solutions include: increasing storage area size; checking hardware error logs; verifying that locking mechanisms are correctly implemented.
  3. Performance issues: During normal operation, pstore should not introduce significant performance overhead. If performance degradation is observed, possible causes include: excessively frequent ftrace records; compression algorithms consuming too much CPU time; high access latency of storage media.
  4. Records not correctly retrieved at startup: This is usually related to user space tools. Check that the pstore file system is mounted and that the <span>/sys/fs/pstore</span> directory permissions are as expected; verify that the systemd-pstore service is running correctly; check system logs for errors during record processing.
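When triaging the first and last of these issues, it helps to see at a glance how many records of each type are currently held. The sketch below counts files per record type in a pstore directory, assuming file names start with the record type followed by a dash; <span>count_pstore_records</span> is a hypothetical helper and the default path is the conventional mount point.

```shell
#!/bin/bash

# Count records per type in a pstore directory and print "type=count"
# lines sorted by type. Prints nothing if the directory has no records.
count_pstore_records() {
    local dir=${1:-/sys/fs/pstore} f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        f=${f##*/}                  # strip the directory part
        printf '%s\n' "${f%%-*}"    # keep the text before the first '-'
    done | sort | uniq -c | awk '{print $2 "=" $1}'
}

count_pstore_records
```

A store that always shows many old records suggests that nothing is archiving and deleting them, which in turn can leave no room for the next crash.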

An effective method for debugging pstore is to use dynamic debugging and tracepoints. For example, on a kernel built with <span>CONFIG_DYNAMIC_DEBUG</span> and with debugfs mounted, the debug messages in the pstore sources can be enabled with the following command:

echo -n 'file fs/pstore/* +p' > /sys/kernel/debug/dynamic_debug/control

For deeper issues, it may be necessary to analyze the raw contents of the storage medium. This usually requires backend-specific tools and methods. For example, for RAM backends, the reserved region (its address and size are visible in the backend's module parameters and in <span>/proc/iomem</span>) can be dumped with platform-specific means such as a bootloader memory dump; for block device backends, the <span>hexdump</span> or <span>dd</span> commands can be used to directly inspect the relevant areas of the block device.

Performance optimization is an advanced topic in pstore debugging. For systems requiring high performance, the following optimization measures can be considered:

  • Adjust the storage area layout, placing frequently accessed record types in faster storage areas
  • Choose more suitable compression algorithms, balancing between compression ratio and speed
  • Adjust buffer sizes to reduce memory allocation overhead
  • Weigh the trade-offs between reliability and performance, deciding whether to use more aggressive but potentially unreliable write methods during crashes
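For the raw inspection of a block device backend mentioned earlier in this section, a minimal sketch is shown below. It uses <span>od</span> in place of <span>hexdump</span> because its skip and length options are widely available; <span>dump_region</span> is a hypothetical helper, and the device path in the usage comment is only an example.

```shell
#!/bin/bash

# Hex-dump LENGTH bytes starting at byte OFFSET of a file or block device.
# Runs of identical lines are collapsed to "*" by od, which keeps large
# zero-filled areas readable.
dump_region() {
    local src=$1 offset=$2 length=$3
    od -A x -t x1 -j "$offset" -N "$length" "$src"
}

# Example (requires read access to the device):
#   dump_region /dev/sdb1 0 256
```

Reading a device this way does not modify it, but it should still be done on a system where the backend is not actively writing, otherwise the dump may show a half-written record.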

6 Conclusion

As a persistent storage mechanism in the Linux kernel, pstore has become a cornerstone of reliability and debuggability in modern Linux systems. Through an in-depth analysis of pstore, we can draw several key conclusions and assess its technical value.

From an architectural design perspective, the success of pstore stems from its clear layered model and flexible backend abstraction. This design allows pstore to adapt to diverse hardware environments, from resource-constrained embedded devices to high-performance server systems. The separation of frontend interfaces from backend implementations enables pstore to handle various types of records uniformly while providing customized support for different storage media.

The core value of pstore is reflected in multiple dimensions. In terms of system reliability, it provides the last line of diagnostic defense, preserving critical information before a system goes down completely. In terms of development efficiency, it significantly shortens the diagnosis time for complex system issues by providing direct evidence of what happened. In operations and maintenance, it enables problem diagnosis in production environments, even for sporadic crashes that are difficult to reproduce.

Compared to other kernel debugging mechanisms (such as kdump and netconsole), pstore has unique advantages. Compared to kdump, pstore is more lightweight: it needs no second crash kernel and at most a small storage region instead of a large memory reservation, and it can operate without remote storage. Compared to netconsole, pstore does not rely on network stability and can still function when the network subsystem itself crashes. Table 1 compares the characteristics of pstore with related technologies:

Table 1: Comparison of pstore with Related Technologies

Feature                       | pstore                   | kdump                  | netconsole
------------------------------|--------------------------|------------------------|---------------
Setup Complexity              | Low                      | Medium-High            | Medium
Resource Requirements         | Low                      | High (reserved memory) | Low
Network Dependency            | None                     | Optional               | Yes
Types of Captured Information | Flexibly Configurable    | Complete Memory Dump   | Console Output
Storage Location              | Local Persistent Storage | Local or Remote        | Remote

The applicability scenarios of pstore include but are not limited to: on-site fault diagnosis of embedded devices, headless management of server systems (without displays and keyboards), debugging of virtual machines in cloud computing environments, and any scenario requiring the capture of early startup issues or difficult-to-reproduce kernel anomalies. From an implementation perspective, pstore demonstrates best practices in extreme condition programming within the Linux kernel. Its “best effort” design philosophy, minimal dependency principles during crashes, and pragmatic attitude towards storage consistency provide valuable references for similar infrastructure components.
