In high-performance programming, data copying is one of the main bottlenecks limiting system throughput. In traditional IO operations, data is often transferred several times between user space and kernel space, and each transfer adds redundant copying overhead. The core goal of zero-copy technology is to reduce or eliminate unnecessary data copying, improving the performance of IO-intensive programs by reusing memory directly and shortening memory access paths. This article systematically reviews zero-copy techniques in C++, dividing them into two main categories: kernel space and user space.
1. Concept of Zero-Copy Technology
Zero-copy does not mean that no copying occurs at all; rather, it means avoiding redundant data copying between user space and kernel space and reducing the CPU’s involvement in the data transfer. The traditional data transfer process (for example, reading a file with read and sending it over the network with write) typically involves 4 copies and 2 system calls, each of which switches from user mode to kernel mode and back (4 mode switches in total):
- Disk → Kernel space buffer (DMA copy);
- Kernel space buffer → User space buffer (CPU copy);
- User space buffer → Kernel space Socket buffer (CPU copy);
- Kernel space Socket buffer → Network adapter (DMA copy).
Zero-copy technology optimizes the memory access mechanism by omitting the CPU copy steps in the above process, retaining only the necessary DMA copies (which do not require CPU involvement), thereby reducing CPU load, minimizing memory bandwidth usage, and enhancing program responsiveness. Its core value is particularly prominent in scenarios such as large file transfers and high-concurrency network communications.
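For contrast, the traditional path described above can be sketched as a plain read/write loop. This is a minimal sketch, not a reference implementation: the function name copy_via_user_buffer and the 64 KiB buffer size are assumptions chosen for illustration. Every byte crosses the user/kernel boundary twice, which is exactly the pair of CPU copies that the techniques in the following sections try to remove.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: copies everything from in_fd to out_fd through a
// user-space buffer. Returns the number of bytes copied, or -1 on error.
ssize_t copy_via_user_buffer(int in_fd, int out_fd) {
    assert(in_fd >= 0 && out_fd >= 0);
    char buf[64 * 1024];                  // user-space staging buffer (size is arbitrary)
    ssize_t total = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));   // kernel -> user (CPU copy)
        if (n == 0) break;                           // end of input
        if (n < 0) {
            if (errno == EINTR) continue;            // interrupted, retry
            return -1;
        }
        ssize_t off = 0;
        while (off < n) {                            // write() may be partial
            ssize_t w = write(out_fd, buf + off, static_cast<std::size_t>(n - off)); // user -> kernel (CPU copy)
            if (w < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

A caller would pass any two open descriptors, for example a file opened with open and a connected socket.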
2. Kernel-Space Zero-Copy Techniques
Kernel-space zero-copy techniques rely on system calls provided by the operating system to complete data transfers directly in kernel space, avoiding data copying between user space and kernel space. C++ programs implement zero-copy functionality by encapsulating these system calls.
2.1 mmap (Memory Mapping)
mmap (Memory Mapping) directly maps disk files or device spaces into the process’s virtual address space, allowing the process to operate on file data as if accessing ordinary memory, without copying data through read/write system calls. Its core mechanism is:
- The operating system keeps the file’s data in a kernel space buffer (the page cache) and maps the pages of this buffer into the process’s virtual address space;
- When the process reads or writes the virtual address, page table translation leads directly to the pages of the kernel buffer, without any data copying between user space and kernel space;
- Data synchronization is managed by the operating system (e.g., dirty page write-back), and can also be triggered explicitly through msync.
Advantages and Disadvantages
- Advantages: Supports random access, suitable for frequent read/write of large files; reduces copying overhead, improves IO efficiency;
- Disadvantages: Creating and tearing down the mapping has a cost, so the advantage is small for small files; the first access to a page that is not yet in physical memory triggers a page fault; simultaneous writes by multiple processes may lead to data races.
C++ Example
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>   // perror
#include <cstring>  // memcpy
#include <iostream>
#include <string>
int main() {
const char* filename = "large_file.dat";
int fd = open(filename, O_RDWR);
if (fd == -1) {
perror("open failed");
return -1;
}
// Get file size
off_t file_size = lseek(fd, 0, SEEK_END);
lseek(fd, 0, SEEK_SET);
// Memory mapping: file fd → process virtual address, readable and writable
void* mapped_addr = mmap(nullptr, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (mapped_addr == MAP_FAILED) {
perror("mmap failed");
close(fd);
return -1;
}
// Directly operate on mapped memory (no copying)
char* data = static_cast<char*>(mapped_addr);
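if (file_size >= 10) {
std::cout << "First 10 bytes: " << std::string(data, 10) << std::endl;
}
// Directly modify file content; the write must stay inside the mapped range
const char msg[] = "Modified by mmap";
if (file_size >= static_cast<off_t>(100 + sizeof(msg))) {
memcpy(data + 100, msg, sizeof(msg)); // sizeof(msg) includes the terminating '\0'
}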
// Synchronize mapped area to disk
msync(mapped_addr, file_size, MS_SYNC);
// Unmap and close file
munmap(mapped_addr, file_size);
close(fd);
return 0;
}
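The map/unmap pairing in the example above can also be wrapped in a small RAII class so that munmap is never forgotten. The following is a minimal sketch under stated assumptions: the class name MappedFile is hypothetical, the mapping is read-only and private, and empty files are simply reported as "not mapped".

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <string_view>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: read-only, private mapping of a whole file.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        int fd = open(path, O_RDONLY);
        if (fd == -1) return;
        struct stat st{};
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                addr_ = p;
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (addr_ != nullptr) munmap(addr_, size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return addr_ != nullptr; }

    // View over the mapped bytes; no data is copied out of the page cache.
    std::string_view view() const {
        assert(ok());
        return {static_cast<const char*>(addr_), size_};
    }

private:
    void* addr_ = nullptr;
    std::size_t size_ = 0;
};
```

The returned std::string_view must not outlive the MappedFile object, since the destructor unmaps the memory it points to.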
2.2 sendfile System Call
sendfile is a zero-copy system call specifically designed for “file to network” data transfer, completing the transfer of file data to the Socket buffer directly in kernel space, without user space involvement. Its process is as follows:
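- Disk data is copied to the kernel space file buffer via DMA;
- The kernel hands the data from the file buffer to the Socket buffer. When the network adapter supports scatter-gather DMA, only descriptors (buffer address and length) are appended to the Socket buffer and no CPU copy occurs; otherwise one CPU copy inside the kernel remains;
- Data is copied to the network adapter via DMA.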
sendfile only moves data in one direction, from a file to a socket (on Linux 2.6.33 and later the destination may also be a regular file), and the data cannot be modified in user space along the way. This makes it a core optimization in scenarios such as HTTP servers that serve static files.
Advantages and Disadvantages
- Advantages: The transfer completes in kernel space with no user space copying, so performance is very high; one system call replaces a read/write pair, reducing the number of mode switches;
- Disadvantages: The source must be a file that supports mmap-like operations (it cannot be a socket), and the data cannot be processed in user space on the way; sendfile is not standardized by POSIX, so its signature differs between Linux, FreeBSD, and macOS, and Windows provides TransmitFile instead.
C++ Example
#include <sys/sendfile.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio> // perror
#include <iostream>
#include <netinet/in.h>
#include <sys/socket.h>
int main() {
// 1. Open file
int file_fd = open("large_file.dat", O_RDONLY);
if (file_fd == -1) {
perror("open file failed");
return -1;
}
// 2. Create Socket and bind
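int sock_fd = socket(AF_INET, SOCK_STREAM, 0);
if (sock_fd == -1) {
perror("socket failed");
close(file_fd);
return -1;
}
struct sockaddr_in addr{}; // Zero-initialize all fields, including padding
addr.sin_family = AF_INET;
addr.sin_port = htons(8080);
addr.sin_addr.s_addr = htonl(INADDR_ANY);
if (bind(sock_fd, reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr)) == -1 ||
listen(sock_fd, 1) == -1) {
perror("bind/listen failed");
close(file_fd);
close(sock_fd);
return -1;
}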
// 3. Accept client connection
int client_fd = accept(sock_fd, nullptr, nullptr);
if (client_fd == -1) {
perror("accept failed");
close(file_fd);
close(sock_fd);
return -1;
}
// 4. Use sendfile to transfer file (zero-copy)
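off_t offset = 0;
off_t file_size = lseek(file_fd, 0, SEEK_END);
lseek(file_fd, 0, SEEK_SET);
// sendfile may transfer fewer bytes than requested, so loop until done;
// each call advances offset by the number of bytes it sent
while (offset < file_size) {
ssize_t sent = sendfile(client_fd, file_fd, &offset, file_size - offset);
if (sent <= 0) {
if (sent == -1) perror("sendfile failed");
break;
}
}
std::cout << "Sent " << offset << " bytes" << std::endl;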
// Close resources
close(client_fd);
close(sock_fd);
close(file_fd);
return 0;
}
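The retry logic needed around sendfile can be factored into a reusable helper. The sketch below is Linux-specific and makes a few assumptions: the names sendfile_all and send_path are hypothetical, the descriptors are blocking, and only EINTR is retried (a non-blocking server would instead wait for writability before retrying).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: sends the whole of in_fd to out_fd, retrying on short
// transfers. Returns the number of bytes sent, or -1 on error.
ssize_t sendfile_all(int out_fd, int in_fd) {
    assert(out_fd >= 0 && in_fd >= 0);
    struct stat st{};
    if (fstat(in_fd, &st) == -1) return -1;
    off_t offset = 0;
    while (offset < st.st_size) {
        // sendfile() advances `offset` by the number of bytes it transferred
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             static_cast<std::size_t>(st.st_size - offset));
        if (n > 0) continue;
        if (n == 0) break;               // file shrank: nothing left to send
        if (errno == EINTR) continue;    // interrupted by a signal, retry
        return -1;
    }
    return static_cast<ssize_t>(offset);
}

// Hypothetical convenience wrapper: opens `path` and sends it to out_fd.
ssize_t send_path(int out_fd, const char* path) {
    int in_fd = open(path, O_RDONLY);
    if (in_fd == -1) return -1;
    ssize_t n = sendfile_all(out_fd, in_fd);
    close(in_fd);
    return n;
}
```

In a server, out_fd would be the descriptor returned by accept; on Linux 2.6.33 and later a regular file also works as the destination, which is convenient for testing without a network.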
2.3 splice System Call
splice is a more general kernel-space zero-copy technique than sendfile, supporting data transfer between “two file descriptors” without requiring a user space buffer. Its core features are:
- Data always flows in kernel space, without going through user space;
- Moves data between two file descriptors, one of which must be a pipe in each call; chaining two calls (e.g., file → pipe, then pipe → Socket) connects arbitrary descriptors;
- Relies on pipes as intermediate buffers, where data is “moved” (page references are passed along) rather than copied during the transfer.
splice addresses the limited applicability of sendfile, making it a more flexible kernel-space zero-copy solution.
Advantages and Disadvantages
- Advantages: Supports data transfer in multiple scenarios, high flexibility; no user space copying, performance close to sendfile;
- Disadvantages: Relies on pipes, so it is more complex to use than sendfile; it is Linux-specific, and each call moves at most what the pipe buffer can hold (64 KiB by default on Linux).
C++ Example
#include <fcntl.h> // Declares splice() on Linux (needs _GNU_SOURCE, which g++ defines by default)
#include <unistd.h>
#include <cstdio> // perror
#include <iostream>
int main() {
// Open source and destination files
int src_fd = open("source.dat", O_RDONLY);
int dest_fd = open("dest.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
if (src_fd == -1 || dest_fd == -1) {
perror("open failed");
return -1;
}
// Create pipe (for splice transfer)
int pipefd[2];
if (pipe(pipefd) == -1) {
perror("pipe failed");
close(src_fd);
close(dest_fd);
return -1;
}
off_t total = 0;
off_t file_size = lseek(src_fd, 0, SEEK_END);
lseek(src_fd, 0, SEEK_SET);
// Use splice to transfer data: src_fd → pipe → dest_fd
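while (total < file_size) {
// Source file → pipe write end (moves at most one pipe buffer per call)
ssize_t len = splice(src_fd, nullptr, pipefd[1], nullptr,
file_size - total, SPLICE_F_MOVE);
if (len == -1) {
perror("splice src to pipe failed");
break;
}
if (len == 0) break; // Unexpected end of file
// Pipe read end → destination file: drain everything just queued in the pipe
ssize_t remaining = len;
while (remaining > 0) {
ssize_t written = splice(pipefd[0], nullptr, dest_fd, nullptr, remaining,
SPLICE_F_MOVE);
if (written <= 0) {
perror("splice pipe to dest failed");
break;
}
remaining -= written;
total += written;
}
if (remaining > 0) break; // Stop after an error in the inner loop
}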
std::cout << "Total bytes transferred: " << total << std::endl;
// Close resources
close(pipefd[0]);
close(pipefd[1]);
close(src_fd);
close(dest_fd);
return 0;
}
3. User-Space Zero-Copy Techniques
User-space zero-copy techniques do not rely on the operating system kernel but instead use C++ language features, memory management strategies, and other methods to avoid redundant data copying within user space. The core concept is “data reuse” rather than “cross-state transfer optimization”.
3.1 Using Copy-on-Write (COW)
Copy-on-Write is a delayed copy technique: when multiple objects share the same data, a copy of the data is only created when one of the objects needs to modify it; otherwise, the original data is reused. In C++, COW manages the lifecycle of shared data through reference counting, ensuring that copying is triggered only when modifications occur, thus reducing unnecessary copying overhead.
Typical Applications
- std::string in some pre-C++11 standard library implementations (e.g., GCC’s libstdc++ before GCC 5);
- Custom shared data structures (e.g., shared configurations, read-only caches).
Notes
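C++11 tightened the requirements on std::string (for example, the rules on when references and iterators may be invalidated), so a COW implementation is no longer conforming. Standard libraries therefore moved to “short string optimization (SSO)”; the main motivations were thread safety issues and the cost of atomic reference counting in multithreaded scenarios. However, COW still has application value for custom types in read-mostly multithreaded scenarios.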
C++ Example (Custom COW String)
#include <iostream>
#include <atomic>
#include <cstring>
class CowString {
private:
struct SharedData {
std::atomic<int> ref_count; // Reference count
char* data;
size_t size;
SharedData(const char* str) : ref_count(1) {
size = strlen(str);
data = new char[size + 1];
strcpy(data, str);
}
~SharedData() { delete[] data; }
};
SharedData* shared_data;
// Copy data (triggered on write)
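void copy() {
if (shared_data->ref_count == 1) return; // Sole owner: write in place
SharedData* new_data = new SharedData(shared_data->data);
// Decrement and test in one atomic step; free the old block if we were the last owner
if (--shared_data->ref_count == 0) delete shared_data;
shared_data = new_data;
}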
public:
CowString(const char* str = "") : shared_data(new SharedData(str)) {}
// Copy constructor: share data, increment reference count
CowString(const CowString& other) : shared_data(other.shared_data) {
shared_data->ref_count++;
}
// Assignment operator: share the target's data, no copy
CowString& operator=(const CowString& other) {
if (this == &other) return *this;
other.shared_data->ref_count++; // Acquire the new data first
// Release current shared data (decrement and test in one atomic step)
if (--shared_data->ref_count == 0) {
delete shared_data;
}
shared_data = other.shared_data;
return *this;
}
// Write operation: trigger copy
void append(const char* str) {
copy(); // Copy on write
size_t new_size = shared_data->size + strlen(str);
char* new_data = new char[new_size + 1];
strcpy(new_data, shared_data->data);
strcat(new_data, str);
delete[] shared_data->data;
shared_data->data = new_data;
shared_data->size = new_size;
}
const char* c_str() const { return shared_data->data; }
~CowString() {
// Decrement and test in one atomic step to avoid a double delete
if (--shared_data->ref_count == 0) {
delete shared_data;
}
}
};
int main() {
CowString s1("Hello");
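CowString s2 = s1; // Share data, no copy
std::cout << s2.c_str() << std::endl;
s2.append(", world"); // First write: s2 gets its own copy, s1 is untouched
std::cout << s1.c_str() << " / " << s2.c_str() << std::endl;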
return 0;
}
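The same copy-on-write idea can be expressed more compactly by letting std::shared_ptr do the reference counting. This is a minimal sketch, not a drop-in replacement for the class above: the Cow template and its member names are hypothetical, and the use_count() check is only reliable when a given handle is used from one thread at a time.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical generic copy-on-write wrapper built on std::shared_ptr.
template <typename T>
class Cow {
public:
    explicit Cow(T value) : data_(std::make_shared<T>(std::move(value))) {}

    // Read access never copies.
    const T& read() const { return *data_; }

    // Write access clones the payload first if other handles still share it.
    T& write() {
        assert(data_ != nullptr);
        if (data_.use_count() > 1) {
            data_ = std::make_shared<T>(*data_);  // detach: the only deep copy
        }
        return *data_;
    }

    // True while both handles point at the same payload.
    bool shares_with(const Cow& other) const { return data_ == other.data_; }

private:
    std::shared_ptr<T> data_;
};
```

Copying a Cow<std::string> only bumps the reference count; the string itself is duplicated on the first call to write() made while another handle still exists.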
3.2 Using Move Semantics (C++11 and Above)
Move semantics is a core feature introduced in C++11, with the primary goal of transferring ownership of an object’s resources rather than copying the resources themselves. When an object is moved, the source object “gives up” its managed resources (such as memory or file handles), and the target object takes them over directly without copying the data. This happens entirely in user space and copies only a few pointers and sizes, not the payload, making it one of the key technologies for user-space zero-copy.
The implementation of move semantics relies on:
- Rvalue references (T&&): to identify temporary or movable objects;
- Move constructors (T(T&& other)) and move assignment operators (T& operator=(T&& other)): to define the logic for resource transfer.
Advantages and Disadvantages
- Advantages: Avoids copying the resource itself, so the overhead is extremely low; suitable for transferring container elements, passing large objects, etc.;
- Disadvantages: After a move, the source object is left in a “valid but unspecified” state, so it should only be destroyed or assigned a new value, not read; only movable types benefit (move constructors/assignment operators must be implemented manually or generated by the compiler).
C++ Example
#include <iostream>
#include <utility> // std::move
#include <vector>
// Custom movable large object class
class LargeObject {
private:
int* data;
size_t size;
public:
// Constructor: allocate memory
explicit LargeObject(size_t s) : data(new int[s]), size(s) { // Initializers listed in member declaration order
std::cout << "LargeObject constructed (allocated " << s << " ints)\n";
}
// Move constructor: transfer resource ownership
LargeObject(LargeObject&& other) noexcept
: data(other.data), size(other.size) {
other.data = nullptr; // Source object gives up resource
other.size = 0;
std::cout << "LargeObject moved\n";
}
// Move assignment operator: transfer resource ownership
LargeObject& operator=(LargeObject&& other) noexcept {
if (this == &other) return *this;
delete[] data; // Release current resource
data = other.data;
size = other.size;
other.data = nullptr; // Source object gives up resource
other.size = 0;
std::cout << "LargeObject moved (assignment)\n";
return *this;
}
// Disable copy (avoid accidental copies)
LargeObject(const LargeObject&) = delete;
LargeObject& operator=(const LargeObject&) = delete;
// Destructor: only release resources that have not been moved
~LargeObject() {
if (data != nullptr) {
delete[] data;
std::cout << "LargeObject destroyed (freed " << size << " ints)\n";
} else {
std::cout << "LargeObject destroyed (no resource to free)\n";
}
}
size_t getSize() const { return size; }
};
int main() {
std::vector<LargeObject> vec;
// Method 1: Directly construct a temporary object (triggers move constructor)
vec.emplace_back(LargeObject(1000000));
// Method 2: Use std::move to transfer lvalue object (triggers move constructor)
LargeObject obj(2000000);
vec.push_back(std::move(obj)); // obj's resources are transferred, should not be used afterwards
std::cout << "Size: " << vec[0].getSize() << std::endl;
return 0;
}
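Standard containers already implement the ownership transfer described in this section, which can be observed by checking that the heap buffer keeps its address across a move. The following is a small sketch; the BatchLog class is a hypothetical example type, not part of any library.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sink that takes ownership of batches handed to it.
class BatchLog {
public:
    // The rvalue reference parameter documents that the caller gives the batch away.
    void add(std::vector<int>&& batch) {
        assert(!batch.empty());
        batches_.push_back(std::move(batch));  // steals the heap buffer, no element copy
    }
    const std::vector<int>& last() const { return batches_.back(); }

private:
    std::vector<std::vector<int>> batches_;
};
```

A caller writes log.add(std::move(v)); afterwards v should only be destroyed or reassigned, while log.last().data() still points at the buffer that v originally allocated.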
3.3 Using Smart Pointers for Resource Management
Smart pointers (std::shared_ptr, std::unique_ptr, std::weak_ptr) avoid manual copying through automatic resource management, and their ownership semantics make zero-copy sharing and transfer possible:
- std::shared_ptr: Shares resource ownership through reference counting, allowing multiple smart pointers to point to the same resource without copying data;
- std::unique_ptr: Exclusively owns resources, supporting resource transfer through move semantics (no copying);
- std::weak_ptr: Assists std::shared_ptr, avoiding circular references without affecting resource lifecycle.
The core value of smart pointers lies in ensuring safe resource release while avoiding redundant copies through resource sharing/transferring, especially suitable for large objects or scarce resources (such as file handles, network connections).
Advantages and Disadvantages
- Advantages: Simplifies memory management, avoids memory leaks; achieves zero-copy through resource sharing/transferring; the reference count of std::shared_ptr is updated atomically, so copies of a shared_ptr can be created and destroyed in different threads (access to the pointed-to object itself still needs synchronization);
- Disadvantages: std::shared_ptr has slight reference counting overhead; std::unique_ptr cannot be copied, only moved.
C++ Example
#include <iostream>
#include <memory> // Smart pointer header
#include <utility> // std::move
#include <vector>
// Large object class
class BigData {
private:
int* data;
size_t size;
public:
explicit BigData(size_t s) : data(new int[s]), size(s) { // Initializers listed in member declaration order
std::cout << "BigData allocated: " << s << " ints\n";
}
~BigData() {
delete[] data;
std::cout << "BigData freed: " << size << " ints\n";
}
// Disable manual copying (enforce use of smart pointers for sharing/moving)
BigData(const BigData&) = delete;
BigData& operator=(const BigData&) = delete;
void printSize() const {
std::cout << "Size: " << size << " ints\n";
}
};
int main() {
// 1. std::shared_ptr: resource sharing (zero-copy)
std::shared_ptr<BigData> ptr1 = std::make_shared<BigData>(1000000);
std::shared_ptr<BigData> ptr2 = ptr1; // Shared resource, no copy
std::cout << "Use count: " << ptr1.use_count() << std::endl; // Outputs 2
// 2. std::unique_ptr: resource transfer (zero-copy)
std::unique_ptr<BigData> ptr3 = std::make_unique<BigData>(2000000);
std::unique_ptr<BigData> ptr4 = std::move(ptr3); // Transfer resource, no copy
// ptr3 has lost resource ownership, should not be used
// 3. Smart pointers in containers (zero-copy)
std::vector<std::shared_ptr<BigData>> vec;
vec.push_back(ptr1); // Shared resource, no copy
vec.push_back(std::move(ptr4)); // Transfer unique_ptr resource, no copy
// All smart pointers will automatically release resources when they go out of scope
return 0;
}
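The role of std::weak_ptr mentioned above can be illustrated with a parent/child link. This is a minimal sketch with hypothetical names (Node, link, parent_name): the parent owns the child through a shared_ptr, while the back link is a weak_ptr so the two objects do not keep each other alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Node {
    std::string name;
    std::shared_ptr<Node> child;   // owning link, parent -> child
    std::weak_ptr<Node> parent;    // non-owning back link, child -> parent
    explicit Node(std::string n) : name(std::move(n)) {}
};

// Connects the two nodes without creating an ownership cycle.
void link(const std::shared_ptr<Node>& parent, const std::shared_ptr<Node>& child) {
    assert(parent && child);
    parent->child = child;
    child->parent = parent;   // does not increase parent's use_count()
}

// Returns the parent's name, or an empty string if the parent is already gone.
std::string parent_name(const Node& n) {
    if (auto p = n.parent.lock()) {   // lock() yields a temporary shared_ptr
        return p->name;
    }
    return {};
}
```

If the back link were a shared_ptr as well, dropping the last external pointer to the parent would free nothing, because parent and child would still own each other.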
3.4 Memory Pool / Pre-allocated Buffer
A memory pool is a management mechanism for pre-allocated memory: it allocates a contiguous block of memory (buffer) in advance, and subsequent object creation and destruction take place inside this area, avoiding the memory fragmentation and allocator overhead caused by frequent calls to new/delete. In terms of user-space zero-copy, the benefits are:
- Buffer reuse: released blocks are handed out again from the same pre-allocated memory, without copying data;
- Reduced allocation overhead: pre-allocation avoids going through the general-purpose allocator (and occasionally the kernel) for every memory request/release;
- Contiguous memory access: improves CPU cache hit rate, indirectly optimizing performance.
Advantages and Disadvantages
- Advantages: Reduces memory fragmentation, improves memory allocation/release efficiency; avoids data copying, suitable for small objects created/destroyed frequently;
- Disadvantages: Requires manual management of memory pool size (too small leads to expansion, too large wastes memory); thread safety needs additional handling; not suitable for objects with dynamically changing sizes.
C++ Example
#include <cstddef> // size_t
#include <cstring>
#include <iostream>
#include <new>
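// Simple fixed-size memory pool template: PoolSize blocks, each large enough for one T
template <typename T, size_t PoolSize>
class MemoryPool {
static_assert(sizeof(T) >= sizeof(T*), "each free block must be able to hold a pointer");
private:
alignas(T) char buffer[PoolSize * sizeof(T)]; // Pre-allocated buffer, aligned for T
T* free_list; // Free object linked list (manages reusable memory blocks)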
public:
MemoryPool() {
// Initialize free list: divide buffer into multiple blocks of size T
free_list = reinterpret_cast<T*>(buffer);
T* current = free_list;
for (size_t i = 0; i < PoolSize - 1; ++i) {
// Store the address of the next block at the start of each free block
*reinterpret_cast<T**>(current) = current + 1;
current++;
}
*reinterpret_cast<T**>(current) = nullptr; // End of list
}
// Allocate memory (get from memory pool, no copy)
void* allocate() {
if (free_list == nullptr) {
throw std::bad_alloc(); // Memory pool exhausted
}
void* ptr = free_list;
free_list = *reinterpret_cast<T**>(ptr); // Move to the next free block
return ptr;
}
// Deallocate memory (return to memory pool, no copy)
void deallocate(void* ptr) {
// Insert the released block at the head of the free list
*reinterpret_cast<T**>(ptr) = free_list;
free_list = reinterpret_cast<T*>(ptr);
}
// Disable copy (memory pool is singleton semantics)
MemoryPool(const MemoryPool&) = delete;
MemoryPool& operator=(const MemoryPool&) = delete;
};
// Test object (using memory pool allocation)
class alignas(void*) SmallObject { // Pointer-aligned so a free block can store the pool's next pointer
private:
int id;
char data[64]; // Small object data
public:
explicit SmallObject(int id) : id(id) {
memset(data, 0, sizeof(data));
std::cout << "SmallObject " << id << " constructed\n";
}
~SmallObject() {
std::cout << "SmallObject " << id << " destroyed\n";
}
// Overload operator new/delete; both must go through the same pool instance
static MemoryPool<SmallObject, 100>& pool() {
static MemoryPool<SmallObject, 100> instance; // Pre-allocate memory pool for 100 objects
return instance;
}
static void* operator new(size_t size) {
return pool().allocate();
}
static void operator delete(void* ptr) {
if (ptr != nullptr) pool().deallocate(ptr);
}
};
int main() {
// Allocate objects from memory pool (no copy, reuse buffer)
SmallObject* obj1 = new SmallObject(1);
SmallObject* obj2 = new SmallObject(2);
// Release objects (return to memory pool, no copy)
delete obj1;
delete obj2;
// Reallocate, reusing previously released memory blocks
SmallObject* obj3 = new SmallObject(3);
delete obj3;
return 0;
}
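Since C++17 the standard library also ships ready-made pool-style allocators in <memory_resource>. The sketch below swaps in std::pmr::monotonic_buffer_resource as an alternative to the hand-written pool above; the helper names make_squares and inside are assumptions for illustration, and a monotonic resource only releases memory when it is destroyed, so it suits short-lived batches rather than long-running object churn.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory_resource>
#include <vector>

// Hypothetical helper: fills a pmr vector whose storage comes from `arena`.
std::pmr::vector<int> make_squares(int n, std::pmr::memory_resource* arena) {
    assert(n >= 0 && arena != nullptr);
    std::pmr::vector<int> out(arena);
    out.reserve(static_cast<std::size_t>(n));   // one bump allocation from the arena
    for (int i = 0; i < n; ++i) out.push_back(i * i);
    return out;
}

// Hypothetical helper: true if p points into [buf, buf + len).
bool inside(const void* p, const std::byte* buf, std::size_t len) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    const auto lo = reinterpret_cast<std::uintptr_t>(buf);
    return addr >= lo && addr < lo + len;
}
```

A typical caller places a std::byte array on the stack, constructs a std::pmr::monotonic_buffer_resource over it, and passes its address as arena; the vector’s elements then live inside that array instead of on the heap.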
3.5 Using Shared Memory
User-space shared memory is a technique for multiple processes/threads to share the same physical memory area, where data is written directly into this area without needing to be copied between processes/threads. Its core mechanism is:
- Process A creates a shared memory area and maps it to its own virtual address space;
- Process B maps this shared memory to its own virtual address space using the same identifier (e.g., name);
- All processes can directly read and write to the shared memory, with data modifications visible in real-time, without any copying overhead.
Unlike kernel-space mmap, user-space shared memory focuses more on “inter-process data sharing,” while mmap focuses on “file and memory mapping,” but both rely on virtual memory mechanisms to achieve zero-copy.
Advantages and Disadvantages
- Advantages: Inter-process data transfer without copying, extremely high performance; supports large data sharing;
- Disadvantages: Requires manual synchronization handling (e.g., using mutexes, semaphores) to avoid data races; the lifecycle of shared memory needs to be managed manually; cross-platform compatibility is poor (Linux offers System V shmget/shmat and POSIX shm_open/mmap, Windows uses CreateFileMapping).
C++ Example (Linux Platform)
#include <iostream>
#include <cstdio> // perror
#include <cstring>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h> // waitpid
#include <unistd.h>
const char* SHM_KEY = "/tmp"; // ftok() requires the path of an existing file or directory
const size_t SHM_SIZE = 4096; // Shared memory size
int main() {
// 1. Create shared memory key
key_t key = ftok(SHM_KEY, 1);
if (key == -1) {
perror("ftok failed");
return -1;
}
// 2. Create/get shared memory (permissions 644, create if not exists)
int shm_id = shmget(key, SHM_SIZE, 0644 | IPC_CREAT);
if (shm_id == -1) {
perror("shmget failed");
return -1;
}
// 3. Map shared memory to current process's virtual address space
void* shm_addr = shmat(shm_id, nullptr, 0);
if (shm_addr == reinterpret_cast<void*>(-1)) {
perror("shmat failed");
return -1;
}
// 4. Child process writes data (no copy)
pid_t pid = fork();
if (pid == 0) {
// Child process: write to shared memory
const char* msg = "Hello from child process (shared memory)";
strncpy(static_cast<char*>(shm_addr), msg, SHM_SIZE - 1);
std::cout << "Child wrote: " << static_cast<char*>(shm_addr) << std::endl;
shmdt(shm_addr); // Unmap
return 0;
} else if (pid > 0) {
// Parent process: read from shared memory
waitpid(pid, nullptr, 0); // Wait for child process to finish
char* msg = static_cast<char*>(shm_addr);
std::cout << "Parent read: " << msg << std::endl;
// 5. Unmap and delete shared memory
shmdt(shm_addr);
shmctl(shm_id, IPC_RMID, nullptr);
} else {
perror("fork failed");
shmdt(shm_addr);
return -1;
}
return 0;
}
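When the cooperating processes are a parent and its forked child, the same effect can be had without any key or identifier by using an anonymous shared mapping. This is a Linux/BSD-oriented sketch under stated assumptions: the function name sum_squares_via_child is hypothetical, and waitpid is used as the only synchronization, which is enough here because the parent reads only after the child has exited.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: the child fills `count` ints with i*i directly in shared
// memory, the parent sums them. Returns the sum, or -1 on failure.
long sum_squares_via_child(std::size_t count) {
    assert(count > 0);
    const std::size_t bytes = count * sizeof(int);
    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;
    int* shared = static_cast<int*>(mem);

    long sum = -1;
    pid_t pid = fork();
    if (pid == 0) {
        // Child: write results straight into the shared pages, no pipe or copy.
        for (std::size_t i = 0; i < count; ++i) shared[i] = static_cast<int>(i * i);
        _exit(0);   // skip atexit handlers and stdio flushing inherited from the parent
    } else if (pid > 0) {
        int status = 0;
        // waitpid() doubles as synchronization: the child has finished writing.
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0) {
            sum = 0;
            for (std::size_t i = 0; i < count; ++i) sum += shared[i];
        }
    }
    munmap(mem, bytes);
    return sum;
}
```

For unrelated processes a named object (System V key or POSIX shared memory name) is still required, as in the example above.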
3.6 Using String View std::string_view (C++17)
std::string_view is a non-owning string view introduced in C++17, which only stores a pointer to the original string and its length, without managing memory ownership. Its core value lies in:
- Avoiding string copies: accessing the string does not require copying data, directly referencing the original memory;
- Compatibility with various string types: can accept std::string, C-style strings (const char*), character arrays, etc., without type conversion copies;
- Efficient substring operations: extracting substrings only modifies the pointer and length, with no copying overhead (unlike std::string::substr(), which creates a copy).
The zero-copy essence of std::string_view is “view reuse”: it does not create new string objects and only provides a read-only access interface to existing strings. The characters cannot be modified through a string_view; any modification has to go through the original string, and views taken earlier may be invalidated by it.
Advantages and Disadvantages
- Advantages: Zero-copy access to strings, extremely high performance; low memory overhead (only stores pointer + length); supports efficient substring operations; compatible with various string sources;
- Disadvantages: Does not manage memory, so the original string must outlive the string_view (otherwise the view dangles); read-only; not guaranteed to be null-terminated, so data() must not be passed to APIs that expect a C string; requires C++17 or above.
Applicable Scenarios
- Function parameter passing (replacing const std::string&, avoiding temporary string copies);
- Frequent substring extraction scenarios (e.g., parsing logs, protocol data);
- Read-only access scenarios for various string types.
C++ Example
#include <iostream>
#include <string>
#include <string_view>
// Function parameter using string_view (zero-copy)
void processString(std::string_view sv) {
std::cout << "Processing string: " << sv << ", Length: " << sv.size() << std::endl;
// Efficient substring extraction (no copy)
std::string_view sub_sv = sv.substr(6, 5); // From index 6, length 5
std::cout << "Substring: " << sub_sv << std::endl;
}
int main() {
// 1. Accept C-style string (no copy)
const char* c_str = "Hello C++17";
processString(c_str);
// 2. Accept std::string (no copy, only reference)
std::string str = "Hello std::string";
processString(str);
// 3. Accept character array (no copy)
char arr[] = "Hello char array";
processString(arr);
// 4. Substring operation comparison (string_view vs string)
std::string long_str = "This is a very long string for testing";
// std::string::substr(): creates new string (copy)
std::string str_sub = long_str.substr(8, 4);
std::cout << "std::string::substr() copy overhead: " << str_sub << std::endl;
// std::string_view::substr(): zero-copy, only modifies view
std::string_view sv = long_str;
std::string_view sv_sub = sv.substr(8, 4);
std::cout << "std::string_view::substr() zero-copy: " << sv_sub << std::endl;
// Note: Avoid string_view pointing to temporary objects (lifecycle issue)
std::string_view bad_sv = std::string("Temporary string").substr(0, 5);
// std::cout << bad_sv << std::endl; // Undefined behavior: temporary string is destroyed, sv points to invalid memory
return 0;
}
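The log and protocol parsing scenario listed above can be sketched as a tokenizer whose results are views into the input. The function name split is an assumption for illustration; no characters are copied, so the returned tokens are only valid while the underlying buffer is alive.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `line` on `delim`. Every token is a view into
// `line`; the only allocation is the vector of (pointer, length) pairs.
std::vector<std::string_view> split(std::string_view line, char delim) {
    std::vector<std::string_view> tokens;
    std::size_t start = 0;
    while (true) {
        assert(start <= line.size());
        const std::size_t pos = line.find(delim, start);
        if (pos == std::string_view::npos) {
            tokens.push_back(line.substr(start));        // last token (may be empty)
            break;
        }
        tokens.push_back(line.substr(start, pos - start));
        start = pos + 1;                                  // skip the delimiter
    }
    return tokens;
}
```

Calling split on an HTTP request line with ' ' as the delimiter yields three views (method, path, version) that all point into the original buffer.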
3.7 Using Array View std::span (C++20)
std::span is a non-owning array/container view introduced in C++20. It is designed similarly to std::string_view but applies to a wider range: it can be used for any contiguous sequence of elements (arrays, std::vector, std::array, dynamically allocated arrays, etc.). Its core features are:
- Non-owning: only stores a pointer to the data and the number of elements; it does not manage memory;
- Zero-copy access: directly references the original contiguous memory, with no data copying;
- Flexible adaptation: supports both dynamic sizes (std::span<T>) and static sizes fixed at compile time (std::span<T, N>);
- Supports read and write: if the original data is writable, the span can directly modify the data (unlike string_view, which is read-only).
The zero-copy essence of std::span is “contiguous memory view reuse,” unifying the access interface for different contiguous containers while avoiding the overhead of container copying or type conversion.
Advantages and Disadvantages
- Advantages: Zero-copy access to contiguous memory, extremely high performance; compatible with various contiguous containers/arrays; supports read and write operations; low memory overhead (pointer + length); static size versions can be optimized at compile time;
- Disadvantages: Does not manage memory, must ensure the original data’s lifecycle is valid; only supports contiguous memory (does not support non-contiguous containers like linked lists); requires C++20 or above.
Applicable Scenarios
- Function parameter passing (replacing const std::vector<T>& or pointer-plus-length pairs such as (T*, size_t), avoiding container copies);
- Processing contiguous memory buffers (e.g., network data, file read/write buffers);
- Unifying access logic for different contiguous containers (e.g., supporting both arrays and vectors in function interfaces).
C++ Example
#include <iostream>
#include <span>
#include <vector>
#include <array>
// Function parameter using span (zero-copy, compatible with various contiguous containers)
template <typename T>
void processBuffer(std::span<T> buf) {
std::cout << "Buffer size: " << buf.size() << ", Elements: ";
for (T elem : buf) {
std::cout << elem << " ";
}
std::cout << std::endl;
// Directly modify original data (if data is writable)
if (!buf.empty()) {
buf[0] *= 2; // Zero-copy modification
}
}
int main() {
// 1. Process std::vector (zero-copy)
std::vector<int> vec = {1, 2, 3, 4, 5};
processBuffer<int>(vec); // T is given explicitly: std::span<T> cannot be deduced from a std::vector argument
std::cout << "First element: " << vec[0] << std::endl; // Outputs 2
// 2. Process std::array (zero-copy)
std::array<int, 3> arr = {6, 7, 8};
processBuffer<int>(arr); // Explicit T lets std::array convert to std::span<int>
std::cout << "First element of array: " << arr[0] << std::endl; // Outputs 12
// 3. Process C-style array (zero-copy)
int c_arr[] = {9, 10, 11};
processBuffer<int>(c_arr); // Explicit T lets the array convert to a dynamic-size std::span<int>
// 4. Process dynamically allocated array (zero-copy)
int* dyn_arr = new int[4]{12, 13, 14, 15};
processBuffer(std::span(dyn_arr, 4)); // Specify pointer and length
delete[] dyn_arr;
// 5. Static size span (compile-time optimization)
std::span<int, 3> static_span = arr;
std::cout << "Static span size (compile-time): " << static_span.size() << std::endl; // Outputs 3
return 0;
}
4. Conclusion
The core goal of zero-copy technology is to reduce or eliminate redundant data copying, thereby enhancing program performance. Kernel-space zero-copy focuses on “cross-state copy optimization between user space and kernel space,” while user-space zero-copy focuses on “data reuse optimization within user space.” The zero-copy techniques in C++ reviewed in this article can be summarized into the following two categories and applicable scenarios:
4.1 Technology Classification and Selection Recommendations
| Technology Type | Core Technology | Applicable Scenarios | Dependency Conditions |
|---|---|---|---|
| Kernel-Space Zero-Copy | mmap | Random read/write of large files, file and memory mapping | Operating system support, C language system calls |
| Kernel-Space Zero-Copy | sendfile | Unidirectional transfer from file to network (e.g., HTTP server) | Operating system support (mainly Linux) |
| Kernel-Space Zero-Copy | splice | Kernel-space transfer between any two file descriptors | Operating system support (mainly Linux) |
| User-Space Zero-Copy | Move semantics (C++11) | Transfer of large objects, moving container elements | C++11 and above |
| User-Space Zero-Copy | Smart Pointers | Resource sharing/transferring, avoiding memory leaks | C++11 and above |
| User-Space Zero-Copy | Memory Pool / Pre-allocated Buffer | Small objects created/destroyed frequently, fixed-size buffers | Custom implementation or third-party library |
| User-Space Zero-Copy | Shared Memory | Large data sharing between processes | Operating system support, synchronization mechanisms |
| User-Space Zero-Copy | std::string_view (C++17) | Read-only access to strings, substring extraction, function parameter passing | C++17 and above |
| User-Space Zero-Copy | std::span (C++20) | Read/write of contiguous memory, unified container interface, buffer processing | C++20 and above |
| User-Space Zero-Copy | COW (Copy-on-Write) | Read-only multithreaded scenarios, sharing read-only data | Custom implementation (standard library deprecated) |
4.2 Key Considerations
- Lifecycle Management: Non-owning views (string_view, span), shared memory, mmap, etc., must ensure the validity of the original data/memory’s lifecycle to avoid dangling pointers or invalid memory access;
- Thread Safety: Shared resources (shared memory, objects owned by shared_ptr, COW data) require explicit synchronization (mutexes, semaphores) to avoid data races; only the reference count of shared_ptr is atomic;
- Standard Compatibility: Move semantics and smart pointers from C++11+, string_view from C++17, and span from C++20 must be selected based on the project’s compilation standards;
- Performance Trade-offs: Some technologies have initialization overhead (e.g., memory pool, mmap), and small data scenarios may not be cost-effective, requiring testing in actual scenarios.
4.3 Technology Evolution Trends
From C++11’s move semantics and smart pointers to C++17’s string_view and C++20’s span, the core trend is to provide safer and more general non-owning views and resource management mechanisms, reducing the complexity of manual optimizations for developers. Meanwhile, kernel-space zero-copy technologies (mmap, sendfile, splice) remain the performance cornerstone for IO-intensive programs and should be used reasonably in conjunction with operating system features.