The Linux kernel is a large C codebase made up of many subsystems. Each subsystem has its own purpose and operates largely independently, but sometimes one subsystem needs to know about events that occur in another. The kernel addresses this with a mechanism called the notification chain, whose primary purpose is to let subsystems subscribe to asynchronous events from other subsystems. Note that this mechanism is used only for communication inside the kernel.
Before discussing the notification chain API and its implementation, let us first understand the notification chain mechanism from a theoretical perspective. All content related to the notification chain mechanism is located in the include/linux/notifier.h header file and the kernel/notifier.c source code file.
Data Structures Related to Notification Chains
Let us start by considering the data structures related to the notification chain mechanism. As mentioned above, the main data structures are located in the include/linux/notifier.h header file, which provides a generic, architecture-independent API. Generally speaking, a notification chain is a linked list of callback functions (hence the name "chain") that are executed when an event occurs.
All these callback functions in the Linux kernel are represented as the type notifier_fn_t:
typedef int (*notifier_fn_t)(struct notifier_block *nb, unsigned long action, void *data);
We can see that it requires the following three parameters:
- nb: Pointer to the notification chain data structure corresponding to this callback function;
- action: Type of the event. A notification chain may support multiple events, so we need this parameter for differentiation;
- data: Allows additional data information about the event to be provided.
Additionally, we see that notifier_fn_t returns an integer value. This integer value can be one of the following:
- NOTIFY_DONE: The subscriber is not interested in the notification;
- NOTIFY_OK: The notification has been processed correctly;
- NOTIFY_BAD: There was a problem with the notification;
- NOTIFY_STOP: The notification has completed, and no further callbacks for this event will be called.
All of these are defined as macros in the include/linux/notifier.h header file:
#define NOTIFY_DONE 0x0000
#define NOTIFY_OK 0x0001
#define NOTIFY_BAD (NOTIFY_STOP_MASK|0x0002)
#define NOTIFY_STOP (NOTIFY_OK|NOTIFY_STOP_MASK)
#define NOTIFY_STOP_MASK 0x8000
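To see how these values combine, the short user-space sketch below copies the macro definitions quoted above and adds two helpers. The helper names `notify_stops_chain()` and `notify_check_values()` are assumptions made for illustration and do not exist in the kernel; the kernel tests the stop bit inline while walking the chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Values copied from include/linux/notifier.h as quoted above. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical helper: true when a return value ends the walk of the chain. */
static bool notify_stops_chain(int ret)
{
	return (ret & NOTIFY_STOP_MASK) != 0;
}

/* Hypothetical helper: checks the composed values against plain constants. */
static void notify_check_values(void)
{
	assert(NOTIFY_BAD == 0x8002);
	assert(NOTIFY_STOP == 0x8001);
}
```

Both NOTIFY_BAD and NOTIFY_STOP carry the stop bit, so either one prevents the remaining callbacks from running; NOTIFY_DONE and NOTIFY_OK let the walk continue.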
Subsystems that wish to receive notifications for specific events should provide their own notifier_fn_t callback function. The main role of the notification chain mechanism is to call the specified callback function when asynchronous events occur.
The main data structure of the notification chain mechanism is notifier_block:
struct notifier_block {
	notifier_fn_t notifier_call;
	struct notifier_block __rcu *next;
	int priority;
};
It is defined in the include/linux/notifier.h file. This structure contains a pointer to the callback function (notifier_call), a pointer to the next notifier_block in the chain (next), and the priority of the callback function; callbacks with a higher priority are executed first.
The Linux kernel provides the following four types of notification chains:
- Blocking notification chain;
- SRCU notification chain;
- Atomic notification chain;
- Raw notification chain.
For blocking notification chains, callbacks are executed in process context. This means that the callbacks in the chain are allowed to block (sleep).
Blocking notification chains use the rw_semaphore synchronization primitive to protect the chain. The SRCU notification chain is another form of blocking notification chain: its callbacks also run in process context, but the chain is protected by SRCU, a special form of the RCU mechanism that allows blocking in the read-side critical section.
The atomic notification chain runs in interrupt or atomic context and is protected by spinlock synchronization primitives. The raw notification chain provides a special type of notification chain that has no protection mechanism for callbacks. This means that the protection work is handled by the caller. The raw notification chain is very useful when we want to use a very special mechanism to protect our notification chain.
When we look at the implementation of the notifier_block structure, we find that it contains a pointer to the next element in the notification chain list but does not have a head pointer. In fact, the head of such a linked list is located in a separate structure, depending on the type of notification chain. For example, for a blocking notification chain:
struct blocking_notifier_head {
	struct rw_semaphore rwsem;
	struct notifier_block __rcu *head;
};
For the atomic notification chain:
struct atomic_notifier_head {
	spinlock_t lock;
	struct notifier_block __rcu *head;
};
Now that we have some understanding of the notification chain mechanism, let us learn about its API implementation.
Typically, a publish/subscribe mechanism involves two parties. One party wishes to receive notifications, while the other generates these notifications. We will explore the notification chain mechanism from both perspectives. We will only consider the blocking notification chain, as other types of notification chains are similar, with the main difference being the protection mechanism.
Before the producer can generate notifications, the head of the notification chain should first be initialized. For example, let us consider the notification chain related to loadable kernel modules. Looking at the kernel/module.c source code file, we will see the following definition:
static BLOCKING_NOTIFIER_HEAD(module_notify_list);
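This defines the head of the blocking notification chain for loadable modules. The BLOCKING_NOTIFIER_HEAD macro is defined in the include/linux/notifier.h header file and declares a statically initialized struct blocking_notifier_head. A head that is allocated at run time is initialized with the companion BLOCKING_INIT_NOTIFIER_HEAD macro instead, which is defined as follows:
#define BLOCKING_INIT_NOTIFIER_HEAD(name) do {	\
		init_rwsem(&(name)->rwsem);	\
		(name)->head = NULL;		\
	} while (0)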
It initializes the read/write semaphore and sets the head to NULL. In addition to the BLOCKING_INIT_NOTIFIER_HEAD macro, the Linux kernel also provides the ATOMIC_INIT_NOTIFIER_HEAD and RAW_INIT_NOTIFIER_HEAD macros and the srcu_init_notifier_head function to initialize the other types of notification chains.
After initializing the head of the notification chain, subsystems that wish to receive notifications from the given notification chain should register specific functions based on the type of notification chain. Looking at the include/linux/notifier.h header file, you will see the following four functions:
extern int atomic_notifier_chain_register(struct atomic_notifier_head *nh, struct notifier_block *nb);
extern int blocking_notifier_chain_register(struct blocking_notifier_head *nh, struct notifier_block *nb);
extern int raw_notifier_chain_register(struct raw_notifier_head *nh, struct notifier_block *nb);
extern int srcu_notifier_chain_register(struct srcu_notifier_head *nh, struct notifier_block *nb);
As mentioned above, we will only discuss the blocking notification chain in this section, so let us look at the implementation of the blocking_notifier_chain_register function. The implementation of this function is located in the kernel/notifier.c source code file, and we can see that the blocking_notifier_chain_register function takes two parameters:
- nh: Head of the notification chain;
- nb: Notification descriptor.
Now let us look at the implementation of the blocking_notifier_chain_register function:
int blocking_notifier_chain_register(struct blocking_notifier_head *nh, struct notifier_block *n)
{
	int ret;

	if (unlikely(system_state == SYSTEM_BOOTING))
		return notifier_chain_register(&nh->head, n);

	down_write(&nh->rwsem);
	ret = notifier_chain_register(&nh->head, n);
	up_write(&nh->rwsem);
	return ret;
}
We can see that the implementation of blocking_notifier_chain_register is quite simple. It first checks the current system state; if the system is in the booting state, we simply call notifier_chain_register. Otherwise, this call is protected by the read/write semaphore. Now let us look at the implementation of the notifier_chain_register function:
static int notifier_chain_register(struct notifier_block **nl, struct notifier_block *n)
{
	while ((*nl) != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &((*nl)->next);
	}
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	return 0;
}
This function simply inserts the new notifier_block (provided by the subsystem that wishes to receive notifications) into the notification chain list, keeping the list sorted by descending priority. In addition to subscribing to events, a subscriber can unsubscribe with one of the following functions:
extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh, struct notifier_block *nb);
extern int blocking_notifier_chain_unregister(struct blocking_notifier_head *nh, struct notifier_block *nb);
extern int raw_notifier_chain_unregister(struct raw_notifier_head *nh, struct notifier_block *nb);
extern int srcu_notifier_chain_unregister(struct srcu_notifier_head *nh, struct notifier_block *nb);
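The list manipulation behind registration and unregistration can be tried out in user space. The sketch below is a simplified model and not the kernel code: the structure keeps only the fields needed for ordering, the RCU annotations and locking are dropped, and the names `chain_register()` and `chain_unregister()` are assumptions chosen for this example. The removal logic, including the -ENOENT result for a block that is not on the list, is likewise an assumption about how such a walk can be written.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct notifier_block: no callback, no RCU. */
struct notifier_block {
	struct notifier_block *next;
	int priority;
};

/* Insert n in front of the first entry with a strictly lower priority. */
static int chain_register(struct notifier_block **nl, struct notifier_block *n)
{
	assert(nl != NULL && n != NULL);
	while (*nl != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &(*nl)->next;
	}
	n->next = *nl;
	*nl = n;	/* the kernel publishes this with rcu_assign_pointer() */
	return 0;
}

/* Unlink n if it is on the list; report -ENOENT if it is not. */
static int chain_unregister(struct notifier_block **nl, struct notifier_block *n)
{
	assert(nl != NULL && n != NULL);
	while (*nl != NULL) {
		if (*nl == n) {
			*nl = n->next;
			return 0;
		}
		nl = &(*nl)->next;
	}
	return -ENOENT;
}
```

Because the comparison is a strict "greater than", blocks with equal priority stay in the order in which they were registered. Walking with a pointer to a pointer lets the same code handle insertion or removal at the head and in the middle of the list without a special case.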
When the producer wants to notify subscribers of an event, it calls the appropriate *_notifier_call_chain function. As you may have guessed, each type of notification chain provides its own function to generate notifications:
extern int atomic_notifier_call_chain(struct atomic_notifier_head *nh, unsigned long val, void *v);
extern int blocking_notifier_call_chain(struct blocking_notifier_head *nh, unsigned long val, void *v);
extern int raw_notifier_call_chain(struct raw_notifier_head *nh, unsigned long val, void *v);
extern int srcu_notifier_call_chain(struct srcu_notifier_head *nh, unsigned long val, void *v);
Let us consider the implementation of the blocking_notifier_call_chain function. This function is defined in the kernel/notifier.c source file:
int blocking_notifier_call_chain(struct blocking_notifier_head *nh, unsigned long val, void *v)
{
	return __blocking_notifier_call_chain(nh, val, v, -1, NULL);
}
We see that it simply returns the result of the __blocking_notifier_call_chain function. The blocking_notifier_call_chain accepts three parameters:
- nh: Head of the notification chain list;
- val: Type of notification;
- v: Input parameters that the handler can use.
However, the __blocking_notifier_call_chain function requires five parameters:
int __blocking_notifier_call_chain(struct blocking_notifier_head *nh, unsigned long val, void *v, int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_DONE;

	if (rcu_access_pointer(nh->head)) {
		down_read(&nh->rwsem);
		ret = notifier_call_chain(&nh->head, val, v, nr_to_call, nr_calls);
		up_read(&nh->rwsem);
	}
	return ret;
}
Here nr_to_call is the maximum number of notification functions to call (-1 means no limit), and nr_calls, if it is not NULL, records the number of notifications actually sent. The implementation of __blocking_notifier_call_chain is quite simple: if the chain is not empty, it calls the notifier_call_chain function under the protection of the read/write semaphore and returns its result; otherwise it returns NOTIFY_DONE.
In this case, all the work is done by the notifier_call_chain function. The main purpose of this function is to notify registered notifiers about asynchronous events:
static int notifier_call_chain(struct notifier_block **nl, unsigned long val, void *v, int nr_to_call, int *nr_calls)
{
	...
	...
	...
	ret = nb->notifier_call(nb, val, v);
	...
	...
	...
	return ret;
}
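The general shape of such a walk can be modelled in user space. The following is a minimal sketch under stated assumptions, not the elided kernel body: RCU dereferencing is replaced by plain pointer reads, and `call_chain()`, `cb_ok()` and `cb_stop()` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct notifier_block;
typedef int (*notifier_fn_t)(struct notifier_block *nb, unsigned long action, void *data);

/* User-space copy of the structure, without the __rcu annotation. */
struct notifier_block {
	notifier_fn_t notifier_call;
	struct notifier_block *next;
	int priority;
};

/*
 * Call each callback in list order until one sets the stop bit or
 * nr_to_call callbacks have run (-1 means no limit).
 */
static int call_chain(struct notifier_block **nl, unsigned long val, void *v,
		      int nr_to_call, int *nr_calls)
{
	int ret = NOTIFY_DONE;
	struct notifier_block *nb;

	assert(nl != NULL);
	nb = *nl;
	while (nb != NULL && nr_to_call != 0) {
		/* Read the successor first, in case the callback alters nb. */
		struct notifier_block *next_nb = nb->next;

		ret = nb->notifier_call(nb, val, v);
		if (nr_calls != NULL)
			(*nr_calls)++;
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
		nr_to_call--;
	}
	return ret;
}

/* Hypothetical callbacks used only to exercise the walk. */
static int cb_ok(struct notifier_block *nb, unsigned long action, void *data)
{
	(void)nb; (void)action; (void)data;
	return NOTIFY_OK;
}

static int cb_stop(struct notifier_block *nb, unsigned long action, void *data)
{
	(void)nb; (void)action; (void)data;
	return NOTIFY_STOP;
}
```

In this model the value returned to the producer is the result of the last callback that ran, so a producer can tell whether some subscriber stopped the chain by testing NOTIFY_STOP_MASK on the result.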
Now let us consider a simple example related to loadable modules. The definition of module_notify_list in the aforementioned kernel/module.c source file defines the head of the blocking notification chain related to kernel modules, which carries at least the following three events:
- MODULE_STATE_LIVE
- MODULE_STATE_COMING
- MODULE_STATE_GOING
Some subsystems of the Linux kernel may be interested in these events, for example to track the state of kernel modules. Subscribers usually do not call functions such as atomic_notifier_chain_register or blocking_notifier_chain_register directly; instead, most notification chains come with a set of wrappers for registration. Registration for these module events is done through such a wrapper:
int register_module_notifier(struct notifier_block *nb)
{
	return blocking_notifier_chain_register(&module_notify_list, nb);
}
Looking at the kernel/tracepoint.c source file, we will see such registration during the initialization of tracepoints:
static __init int init_tracepoints(void)
{
	int ret;

	ret = register_module_notifier(&tracepoint_module_nb);
	if (ret)
		pr_warn("Failed to register tracepoint module enter notifier\n");

	return ret;
}
Where tracepoint_module_nb provides the callback function:
static struct notifier_block tracepoint_module_nb = {
	.notifier_call = tracepoint_module_notify,
	.priority = 0,
};
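A callback registered this way typically switches on the action argument and ignores the events it does not care about. The user-space sketch below illustrates that pattern only; it is not the body of tracepoint_module_notify. The `DEMO_MODULE_STATE_*` constants, `struct demo_counters` and `demo_module_notify()` are hypothetical stand-ins, and in the kernel the data argument of a module notification would point to the module rather than to a counter structure.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE 0x0000
#define NOTIFY_OK   0x0001

/* Stand-ins for the module states; the numeric values are assumptions. */
enum demo_module_state {
	DEMO_MODULE_STATE_LIVE,
	DEMO_MODULE_STATE_COMING,
	DEMO_MODULE_STATE_GOING,
};

struct notifier_block;	/* only used as an opaque pointer here */

/* Hypothetical payload: counts the events the callback has handled. */
struct demo_counters {
	int coming;
	int going;
};

/* Same signature as notifier_fn_t: dispatch on the event type. */
static int demo_module_notify(struct notifier_block *nb, unsigned long action, void *data)
{
	struct demo_counters *counters = data;

	(void)nb;
	assert(counters != NULL);

	switch (action) {
	case DEMO_MODULE_STATE_COMING:
		counters->coming++;
		return NOTIFY_OK;
	case DEMO_MODULE_STATE_GOING:
		counters->going++;
		return NOTIFY_OK;
	default:
		/* Not interested in any other event. */
		return NOTIFY_DONE;
	}
}
```

Returning NOTIFY_DONE for unhandled events keeps the callback neutral, so other subscribers on the same chain still receive the notification.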
The MODULE_STATE_LIVE and MODULE_STATE_COMING notifications are sent during the execution of the init_module system call, while MODULE_STATE_GOING is sent during the execution of the delete_module system call:
SYSCALL_DEFINE2(delete_module, const char __user *, name_user, unsigned int, flags)
{
	...
	...
	...
	blocking_notifier_call_chain(&module_notify_list, MODULE_STATE_GOING, mod);
	...
	...
	...
}
Thus, when one of these system calls is invoked from user space, the Linux kernel will send certain notifications based on the system call and will invoke the tracepoint_module_notify callback function.