In-Depth Understanding of C++ Happens-Before: A Must for Advanced Concurrent Programmers

1. Introduction: Why is Happens-Before Necessary?

In multithreaded programs, “statement order” ≠ “execution order”. Modern CPUs and compilers may reorder instructions freely as long as the observable result within a single thread is unchanged. In concurrent code, however, this can cause serious problems:

bool ready = false;
int data = 0;

void writer() {
    data = 42;
    ready = true;
}

void reader() {
    while (!ready) ; // busy wait
    std::cout << data << std::endl;
}

You might think this code will definitely print 42, but it could actually output 0. The compiler might move the write ready = true ahead of data = 42, or the CPU might make the two writes visible to the other thread in a different order than they were issued. To define which writes a thread is guaranteed to see, and in what order, C++ introduces the happens-before relation.

2. Core Definition of Happens-Before

In the C++ memory model, there are three core relationships:

| Name | Scope | Meaning |
| --- | --- | --- |
| sequenced-before | Within the same thread | Logical order of program statements (the compiler can reorder, but the result is equivalent) |
| synchronizes-with | Across threads (synchronization events) | A synchronization relationship between two threads that makes the results of operations in one thread visible in the other and establishes a clear order between them |
| happens-before | Global (including across threads) | A happens-before B means that A is ordered before B and the result of A is visible to B |

Definition:

1. If A is sequenced-before B, then A happens-before B (within a thread).
2. If A synchronizes-with B, then A happens-before B (across threads).
3. If A happens-before B, and B happens-before C, then A happens-before C (transitive).
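The three rules can be combined across more than two threads. The following is a minimal sketch, not taken from the original article: the names payload, stage1, stage2 and run_chain are hypothetical. Thread a publishes to thread b, and thread b publishes to thread c, so by transitivity the write in a happens-before the read in c even though a and c never touch the same atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                    // plain, non-atomic data
std::atomic<bool> stage1{false};    // hand-off from thread a to thread b
std::atomic<bool> stage2{false};    // hand-off from thread b to thread c

int run_chain() {
    payload = 0;
    stage1.store(false);
    stage2.store(false);
    int seen = -1;

    std::thread a([] {
        payload = 42;                                        // (1)
        stage1.store(true, std::memory_order_release);       // (2) (1) is sequenced-before (2)
    });
    std::thread b([] {
        while (!stage1.load(std::memory_order_acquire)) {}   // (3) (2) synchronizes-with (3)
        stage2.store(true, std::memory_order_release);       // (4) (3) is sequenced-before (4)
    });
    std::thread c([&seen] {
        while (!stage2.load(std::memory_order_acquire)) {}   // (5) (4) synchronizes-with (5)
        seen = payload;                                      // (6) (1) happens-before (6)
        assert(seen == 42);
    });

    a.join();
    b.join();
    c.join();
    return seen;  // read after join, so the write in c is visible here
}
```

Calling run_chain() from main returns 42. Removing either release/acquire pair breaks the chain and turns the read at (6) into a data race.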

For the official wording, see Section 8, “Further Reading”.

3. Synchronization Relationship (Synchronizes-With)

The main low-level cross-thread synchronization mechanism provided by C++ is atomic operations (std::atomic). For example:

std::atomic<bool> ready{false};
int data = 0;

void writer() {
    data = 42;
    ready.store(true, std::memory_order_release);
}

void reader() {
    while (!ready.load(std::memory_order_acquire))
        ;
    std::cout << data << std::endl;
}

How it works:

1. In writer, store(…, memory_order_release) is a release operation.
2. In reader, load(…, memory_order_acquire) is an acquire operation.
3. If the load reads the value true written by that store, then ready.store(release) synchronizes-with ready.load(acquire). This establishes a happens-before relationship: the write to data in writer → the read of data in reader.

Result guarantee: once the loop exits, the reader is guaranteed to see data == 42. Illustrated as follows:

Thread 1 (writer)             Thread 2 (reader)
------------------            ------------------
data = 42;
ready.store(true, release);   while (!ready.load(acquire)) ;
                              // data == 42 guaranteed visible
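The same hand-off can also be expressed with standalone fences around relaxed atomic operations. This is a hedged sketch rather than the article's own code: fenced_data, fenced_ready and the three functions are hypothetical names. std::atomic_thread_fence is the standard library call; the release fence before the relaxed store and the acquire fence after the relaxed load together play the role of the release store and acquire load above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int fenced_data = 0;
std::atomic<bool> fenced_ready{false};

void fenced_writer() {
    fenced_data = 42;
    // Everything written before this fence is published by the relaxed store below.
    std::atomic_thread_fence(std::memory_order_release);
    fenced_ready.store(true, std::memory_order_relaxed);
}

int fenced_reader() {
    while (!fenced_ready.load(std::memory_order_relaxed)) {}
    // Pairs with the release fence in fenced_writer once the load has seen true.
    std::atomic_thread_fence(std::memory_order_acquire);
    assert(fenced_data == 42);
    return fenced_data;
}

int run_fenced() {
    int seen = -1;
    std::thread t1(fenced_writer);
    std::thread t2([&seen] { seen = fenced_reader(); });
    t1.join();
    t2.join();
    return seen;
}
```

In most code the release store and acquire load shown earlier are easier to read. Fences are mainly useful when one ordering point has to cover several relaxed operations.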

4. Overview of Memory Order Models

| Memory Order | Description | Typical Scenarios |
| --- | --- | --- |
| memory_order_relaxed | No ordering constraints; only atomicity is guaranteed | Counters, statistical variables |
| memory_order_acquire | On a load: later reads and writes in this thread cannot be reordered before it | Used with release for inter-thread data synchronization |
| memory_order_release | On a store: earlier reads and writes in this thread cannot be reordered after it | Used with acquire for inter-thread data synchronization |
| memory_order_acq_rel | Combines acquire and release effects | Atomic read-modify-write (RMW) operations |
| memory_order_seq_cst | Strongest ordering guarantee: a single global total order over all seq_cst operations | Default for std::atomic operations |
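As an illustration of the first row, here is a minimal sketch of a relaxed counter. The function name count_events and its parameters are hypothetical. The increments need atomicity but no ordering with respect to other data, so memory_order_relaxed is enough; the final total is read only after join(), which provides the happens-before edge for that read.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: several threads bump one shared counter.
long count_events(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic increment with no ordering constraints on other memory.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // join() orders each worker's increments before the load below
    }
    return counter.load(std::memory_order_relaxed);
}
```

count_events(4, 10000) returns 40000: no increment is lost, even though relaxed operations say nothing about the order in which other variables become visible.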

5. Example: Happens-Before Established by Release/Acquire

#include <atomic>
#include <thread>
#include <iostream>

std::atomic<int> flag{0};
int data = 0;

void writer() {
    data = 100;
    flag.store(1, std::memory_order_release);
}

void reader() {
    while (flag.load(std::memory_order_acquire) != 1)
        ;
    std::cout << "data = " << data << std::endl;
}

int main() {
    std::thread t1(writer);
    std::thread t2(reader);
    t1.join();
    t2.join();
}

Output:

data = 100

This guarantees that 0 is never printed, because flag.store(release) synchronizes-with the flag.load(acquire) that reads 1, so the write to data is visible to the reader.

6. Error Example: Data Race Lacking Happens-Before

#include <thread>
#include <iostream>

bool ready = false;
int data = 0;

void writer() {
    data = 42;
    ready = true; // ordinary variable, no release semantics
}

void reader() {
    while (!ready) ;
    std::cout << data << std::endl; // may output 0!
}

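There is no synchronizes-with relationship here, so the unsynchronized accesses to ready and data form a data race and the program has undefined behavior (UB). It may output 0, and the compiler is even allowed to hoist the read of ready out of the loop so that the reader spins forever.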

7. Engineer’s Perspective: Summary of Happens-Before Practical Experience

In multithreaded systems, happens-before is not an abstract theory but a practical guideline for deciding “whether data is visible”. Mastering it lets you spot race conditions in your code and avoid unnecessary locks and the performance they cost.

Summary of Multithreaded Happens-Before Practical Experience

| Means | Type | Memory Order / Characteristics | Performance / Engineering Advice |
| --- | --- | --- | --- |
| std::atomic (release → acquire) or atomic_thread_fence(release/acquire) | Lightweight | release / acquire | Usually cheaper than locks; commonly used in inter-thread communication or lock-free data structures. |
| std::atomic (memory_order_acq_rel, read-modify-write) | Lightweight | acq_rel | Usually cheaper than locks; a single RMW acquires on its read side and releases on its write side; suitable for lock-free algorithms. |
| std::atomic (memory_order_seq_cst, strict order) | Lightweight | seq_cst | Usually cheaper than locks, though the most expensive memory order; provides a global order guarantee; suitable for complex lock-free algorithms. |
| std::mutex: unlock → lock | Heavyweight | Implicit acquire-release | Higher cost, up to system call level overhead under contention on typical implementations; safe and reliable with clear semantics; suitable where strong consistency, safety and maintainability matter more than extreme performance. |
| std::condition_variable: notify → wait return | Heavyweight | Must be used with locks; implicit acquire-release through the associated mutex | High system call overhead; not suitable for low-latency critical paths; suitable when threads need to wait for specific conditions or events (e.g., producer-consumer queues). |
| std::thread: start → thread body execution | Heavyweight | Implicit release | Thread creation has high system call overhead; understanding the principle is sufficient. |
| std::thread::join(): thread end → join return | Heavyweight | Implicit acquire | Automatic synchronization after join with clear semantics, but thread termination and system calls are expensive; understanding the principle is sufficient. |
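The last two rows need no atomics at all. A minimal sketch, with hypothetical names (square_in_child, argument, result): the completion of the std::thread constructor synchronizes-with the start of the thread body, and the end of the thread body synchronizes-with the return of join(), so plain variables can be handed in and out safely.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: pass a value to a child thread and read back its result.
int square_in_child(int input) {
    int argument = input;  // written before the thread is constructed
    int result = 0;

    std::thread child([&argument, &result] {
        // Thread start: the write to argument above is visible here.
        result = argument * argument;
    });

    child.join();
    // join() return: the child's write to result is visible here.
    assert(result == argument * argument);
    return result;
}
```

square_in_child(7) returns 49. Reading result before join() would be a data race, because nothing would order the child's write before that read.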

💡 Engineering Experience Summary

1. Lightweight synchronization: For inter-thread data transfer or lock-free data structures, prefer atomic + release/acquire or acq_rel; use seq_cst when a global order guarantee is necessary.
2. Heavyweight synchronization: If a lock-free implementation is difficult, or the shared data structures are complex and performance requirements are modest, use std::mutex and std::condition_variable, which are simple, safe, and clear in semantics.
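For the heavyweight path, the following is a minimal sketch of a producer-consumer queue built on std::mutex and std::condition_variable. SimpleQueue and sum_through_queue are hypothetical names, not from the original article. Each unlock of the mutex synchronizes-with the next lock, so whatever push wrote under the lock is visible to pop. The condition variable only handles waiting and wake-up.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical minimal blocking queue, for illustration only.
class SimpleQueue {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }  // unlock: publishes the pushed item to the next thread that locks
        not_empty_.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate form re-checks after spurious wake-ups.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        int value = items_.front();
        items_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::queue<int> items_;
};

int sum_through_queue(int count) {
    SimpleQueue queue;
    int sum = 0;
    std::thread consumer([&queue, &sum, count] {
        for (int i = 0; i < count; ++i) sum += queue.pop();
    });
    std::thread producer([&queue, count] {
        for (int i = 1; i <= count; ++i) queue.push(i);
    });
    producer.join();
    consumer.join();
    return sum;  // safe to read after join()
}
```

sum_through_queue(100) returns 5050. The code contains no explicit memory orders, which is the maintainability advantage described in point 2.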

8. Further Reading

1. C++ Standard Draft §6.9.2 https://timsong-cpp.github.io/cppwp/n4861/intro.multithread
2. cppreference: Memory Order https://en.cppreference.com/w/cpp/atomic/memory_order.html

9. Conclusion

In single-threaded contexts, the execution order of code seems simple; in multithreaded contexts, however, compiler optimizations and CPU reordering can make the order unpredictable. Happens-before is the key relation that defines “which operation results must be visible to other threads”. Understanding happens-before allows you to truly master:

  • Why release-acquire guarantees visibility;

  • Why even a simple flag must use std::atomic;

  • How to write correct multithreaded programs.

For concurrent developers, this is not just a theoretical concept but the foundation for writing correct, high-performance multithreaded programs. Many can write code with locks, but only those who truly understand happens-before can write predictable, reliable, and high-performance concurrent systems. Happens-before is the physical law of the C++ concurrent world. Understanding it marks your true entry into the advanced realm of C++ engineering.
