Hello everyone, I am Xiao Feng Ge.
Anyone learning C eventually runs into a strange keyword, volatile. What is it for?
volatile and the Compiler
First, let’s look at this piece of code:
int busy = 1;
void wait() { while(busy) { ; }}
Compile it with O2 optimization enabled and take a closer look at the generated assembly:
wait:
        mov     eax, DWORD PTR busy[rip]
.L2:
        test    eax, eax
        jne     .L2
        ret
busy:
        .long   1
Here the .L2 part corresponds to the while loop. The compiler has optimized these instructions: busy is loaded into the register eax once, before the loop, and whether to exit the loop is decided by testing eax rather than the actual contents of busy in memory.
For this piece of code on its own, the optimization is correct. The problem arises when other code modifies busy: the loop never sees that modification, like this:
int busy = 1;
// This function runs in thread A
void wait() { while(busy) { ; }}
// This function runs in thread B
void signal() { busy = 0;}
If the machine instructions for the while loop in wait only read from the register, then even when signal in thread B modifies busy, wait cannot exit the loop.
If you declare busy as volatile, the generated instructions change to this:
wait:
.L2:
        mov     eax, DWORD PTR busy[rip]
        test    eax, eax
        jne     .L2
        ret
busy:
        .long   1
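The listing above corresponds to source along the following lines. This is a minimal sketch: the function names wait_until_clear and clear_busy are stand-ins chosen here (the article's wait and signal would collide with standard library names once the usual headers are included), and the exact assembly depends on the compiler and flags.

```c
/* volatile tells the compiler that busy may change behind its back,
   so every evaluation of the loop condition must reload it from memory. */
volatile int busy = 1;

/* Runs in thread A: spins until busy becomes 0. */
void wait_until_clear(void) {
    while (busy) { ; }
}

/* Runs in thread B (or a signal handler): clears the flag. */
void clear_busy(void) {
    busy = 0;
}
```

Calling clear_busy first and wait_until_clear afterwards returns immediately, because the loop condition reads the current value of busy from memory.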
Notice that now the .L2 part reads the memory where busy is located into eax on every iteration and only then tests it, ensuring that the latest value of busy is read each time.
In fact, you can think of the register eax as a cache for the memory where busy lives. As long as the cache (the register) and the data in memory are consistent, there is no problem. When they are inconsistent (that is, memory has been updated but the cache still holds the old data), the program often behaves unexpectedly.
Besides the multithreading example, there is another case: a signal handler or hardware modifies the variable, which is common when C code interacts with hardware. If the compiler generates instructions like those at the beginning of the article, the waiting code will not detect the modifications made by the signal handler or the hardware.
Therefore, we need to tell the compiler: “Do not be clever, do not just read data from the register; this variable may have been modified elsewhere, so fetch the latest data from memory every time you access it.”
To summarize briefly: volatile only prevents the compiler from optimizing away accesses to the variable.
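The signal handler case mentioned above can be sketched as follows. The names got_signal, on_sigint and wait_for_sigint are hypothetical, and the example sends SIGINT to itself with raise so that it finishes on its own; a real program would wait for a signal from outside.

```c
#include <assert.h>
#include <signal.h>

/* Flag shared between the handler and normal code. volatile forces the
   loop below to reload it from memory on every iteration; sig_atomic_t
   is the integer type the C standard allows a handler to write. */
static volatile sig_atomic_t got_signal = 0;

/* Hypothetical handler: it only records that the signal arrived. */
static void on_sigint(int signo) {
    (void)signo;
    got_signal = 1;
}

/* Installs the handler, delivers SIGINT to this process, then waits
   until the handler's write is observed. Returns 1 on success and
   -1 if the handler could not be installed. */
int wait_for_sigint(void) {
    got_signal = 0;
    if (signal(SIGINT, on_sigint) == SIG_ERR) {
        return -1;
    }
    raise(SIGINT);              /* the handler runs before raise() returns */
    while (!got_signal) { ; }   /* reads got_signal from memory each time */
    assert(got_signal == 1);
    return 1;
}
```

Without volatile, the compiler would be free to treat got_signal as unchanging inside the loop, which is exactly the register-caching behavior seen in the first assembly listing.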
volatile and Multithreading
It is important to note that volatile only addresses the visibility of the variable at the compiler level; it has nothing to do with atomic access to the variable. These are two completely different things.
Suppose there is a fairly complex structure, struct data, and a variable foo of that type:
struct data { int a; int b; int c; ...};
volatile struct data foo;
void thread1() { foo.a = 1; foo.b = 2; foo.c = 3; ...}
void thread2() { int a = foo.a; int b = foo.b; int c = foo.c; ...}
Declaring foo as volatile only ensures that the compiler really emits every read and write of foo, so that thread2 reads from memory rather than from a stale register copy. It does not solve the problem of atomic access to foo, which concurrent reads and writes require: thread2 may still observe a half-updated foo.
To ensure atomic access to a variable, locks are generally used. A lock already includes the capability that volatile provides, namely ensuring the visibility of the variable. Therefore, when using locks, there is no need for volatile.
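A minimal sketch of the lock-based approach, assuming POSIX threads are available. The mutex foo_lock and the helper names update_foo and snapshot_foo are introduced here for illustration, and struct data is reduced to the three fields shown in the article.

```c
#include <pthread.h>

struct data { int a; int b; int c; };

/* No volatile here: the mutex already provides the needed visibility. */
static struct data foo;
static pthread_mutex_t foo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer side (thread1 in the text): all fields change as one unit. */
void update_foo(int a, int b, int c) {
    pthread_mutex_lock(&foo_lock);
    foo.a = a;
    foo.b = b;
    foo.c = c;
    pthread_mutex_unlock(&foo_lock);
}

/* Reader side (thread2 in the text): returns a consistent copy,
   never a mix of old and new fields. */
struct data snapshot_foo(void) {
    pthread_mutex_lock(&foo_lock);
    struct data copy = foo;
    pthread_mutex_unlock(&foo_lock);
    return copy;
}
```

Because both sides take the same lock, a reader sees foo either entirely before or entirely after an update, which is something volatile alone cannot guarantee.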
volatile and Memory Order
Some may wonder: what if the volatile variable is not that complex, just an int, like this:
volatile int busy = 1;
Thread A reads busy and thread B updates it. When thread A detects that busy has changed, it performs a specific operation. Is this feasible? Since volatile ensures that busy is always read from memory, it seems this should work.
However, while the concepts behind computers may be relatively simple, the engineering practice is complex.
Because of the large speed gap between the CPU and memory, there is a layer of cache between them. The CPU does not read memory directly, and the existence of the cache complicates the picture. Given space limitations and the theme of this article, I will not elaborate further.
To optimize memory reads and writes, the CPU may reorder them. The consequence: suppose thread 1 executes lines N and N+1 of its code in order; to thread 2, line N+1 may appear to take effect first. Suppose X is initially 0, Y is initially 1, and busy is initially 1:
Thread 1          Thread 2
X = 10;           if (!busy)
busy = 0;             Y = X;
When thread 2 detects that busy is 0 and then reads X, the value it reads may still be 0.
volatile cannot solve this reordering problem. What we need is a memory barrier.
A memory barrier is a type of machine instruction that constrains the processor’s memory operations before and after it, ensuring that they are not reordered across the barrier.
A memory barrier also covers the functionality provided by volatile, so volatile is unnecessary here as well.
It can be seen that in a multithreaded environment, we almost never need the volatile keyword.
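One way to get this ordering in portable C is through C11 atomics, sketched below. It uses release and acquire ordering from stdatomic.h rather than a raw barrier instruction, and leaves it to the compiler to emit whatever barriers the target CPU needs. The function names thread1_publish and thread2_consume are hypothetical; X, Y and busy follow the table above.

```c
#include <stdatomic.h>

static int X = 0;
static int Y = 1;
static atomic_int busy = 1;   /* atomic instead of volatile */

/* Thread 1: write X, then clear busy. The release store keeps the
   write to X from being reordered after the store to busy. */
void thread1_publish(void) {
    X = 10;
    atomic_store_explicit(&busy, 0, memory_order_release);
}

/* Thread 2: if busy is seen as 0, the acquire load guarantees that
   the earlier write to X is visible too. Returns the current Y. */
int thread2_consume(void) {
    if (atomic_load_explicit(&busy, memory_order_acquire) == 0) {
        Y = X;
    }
    return Y;
}
```

With this pairing, a thread that observes busy as 0 also observes X as 10, so Y can never end up as 0.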
