Linux Power Management (14): Power Management from the Perspective of Device Drivers

Original: Wowotech Technology http://www.wowotech.net/pm_subsystem/pm_architecture.html

1. Introduction

Linux driver engineers who have been in the field for a while will know the following:

In the past, implementing power management functions for a device was a straightforward task. Most devices were abstracted as platform devices, and the driver only needed to provide callback functions for suspend/resume/shutdown and register them with the kernel. The kernel would call the driver-provided callback functions during the system’s power state transitions to switch the device’s power state.

However, in the new era, operations related to device power management are unified and encapsulated in struct dev_pm_ops. This structure contains over 20 callback functions and sits on top of a complex set of power management mechanisms (conventional suspend/resume, runtime PM, etc.). As a result, adding power management to a device driver is no longer so simple, and engineers often lack a clear picture of how the pieces fit together.

Therefore, this article aims to start from the power management of a single device, combining it with the kernel’s power management mechanism, to introduce how to add power management functions in device drivers and analyze the relationship between device power state transitions and system power state transitions.

Additionally, in our series of articles on power management, we have introduced many power management mechanisms, such as generic PM, wakeup event framework, wakelock, autosleep, runtime PM, PM domain, etc. This article also serves as a summary and consolidation of those mechanisms.

2. Function Description

The power state transitions of devices are generally consistent with the system’s power state transitions (except for runtime PM), with the following scenarios:

1) The system reboot process, including halt, power off, restart, etc. (refer to “Linux Power Management (3) – Reboot Process of Generic PM”), requires the device to enter the shutdown state to avoid unexpected occurrences.

2) The system suspend/resume process (refer to “Linux Power Management (6) – Suspend Function of Generic PM”) requires the device to suspend and resume in step with the system.

3) The system hibernation and restore process requires the device to perform power-off actions on top of suspend/resume.

4) The runtime PM process (refer to “Linux Power Management (11) – Function Description of Runtime PM”), requires the device to suspend or even power off when the reference count is 0, and power on and resume when the reference count is greater than 0.
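The reference-count rule in scenario 4 can be sketched as a small user-space model. This is only an illustration: struct toy_dev, toy_get and toy_put are hypothetical names invented here, not kernel APIs, and the real runtime PM core (with helpers such as pm_runtime_get_sync() and pm_runtime_put()) handles many details this sketch ignores, such as deferred or asynchronous suspend.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a device; NOT the kernel's struct device. */
struct toy_dev {
        int usage_count;   /* number of active users */
        int powered;       /* 1 = resumed, 0 = runtime-suspended */
        int suspend_calls; /* how many times the suspend hook ran */
        int resume_calls;  /* how many times the resume hook ran */
};

/* Hypothetical hooks, playing the role of .runtime_resume/.runtime_suspend. */
static void toy_runtime_resume(struct toy_dev *dev)
{
        dev->powered = 1;
        dev->resume_calls++;
}

static void toy_runtime_suspend(struct toy_dev *dev)
{
        dev->powered = 0;
        dev->suspend_calls++;
}

/* Take a reference; resume the device on the 0 -> 1 transition. */
static void toy_get(struct toy_dev *dev)
{
        assert(dev != NULL);
        if (dev->usage_count++ == 0)
                toy_runtime_resume(dev);
}

/* Drop a reference; suspend the device on the 1 -> 0 transition. */
static void toy_put(struct toy_dev *dev)
{
        assert(dev != NULL && dev->usage_count > 0);
        if (--dev->usage_count == 0)
                toy_runtime_suspend(dev);
}
```

With two users, only the first toy_get powers the device on and only the last toy_put powers it off; the calls in between just adjust the counter.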

In the old power management framework, the shutdown, suspend, and resume callback functions in structures such as bus, class, and device_driver could achieve all the above functions except for runtime PM. However, in the new framework, especially after the introduction of struct dev_pm_ops, the use of these legacy suspend/resume callbacks is no longer recommended.

However, for some devices, such as platform devices, if the power management requirements are not very complex, driver engineers can still use the old method to implement it, and the kernel will automatically convert it to the new method. But if there are more requirements, one must face struct dev_pm_ops. The following will explain this in detail.

3. Data Structure Review

Before we officially begin, let’s review the data structures related to device power management. Most of them have been introduced in previous articles, and this article serves as a summary.

3.1 Shutdown Callback Function and Usage

Since the reboot process is relatively independent and stable, and this process relies on the device’s .shutdown callback function, it is described separately here and will not be mentioned again later.

The shutdown callback function exists in two data structures: struct device_driver and struct bus_type, and is called during the system reboot process to turn off the device. Device drivers can implement one of them as needed. Let’s take a typical platform device as an example to illustrate this process.

1) Define a platform_driver and implement its .shutdown callback, then register it with the kernel using platform_driver_register.

    static void foo_shutdown(struct platform_device *pdev)
    {
            ...
    }

    static struct platform_driver foo_pdrv = {
            .shutdown = foo_shutdown,
            ...
    };

2) During platform_driver_register, the shutdown function of the struct device_driver variable will be replaced with the platform device-specific shutdown function (platform_drv_shutdown), and driver_register will be called to register the device_driver with the kernel.

    int __platform_driver_register(struct platform_driver *drv,
                                    struct module *owner)
    {
            ...

            if (drv->shutdown)
                    drv->driver.shutdown = platform_drv_shutdown;

            return driver_register(&drv->driver);
    }

3) During the system reboot process, the shutdown function of each device will be called. For foo_pdrv, platform_drv_shutdown will be called first, which will then call foo_shutdown.
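The three steps above can be modelled in user space as follows. This is a simplified sketch, not kernel source: all toy_ names and the shutdown_done field are hypothetical, and the wrapper takes a driver pointer directly where the kernel's platform_drv_shutdown receives a struct device. It only shows the idea of a generic wrapper that recovers the platform driver and forwards to its callback.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the field layout is NOT the kernel's. */
struct toy_device_driver {
        void (*shutdown)(struct toy_device_driver *drv);
};

struct toy_platform_driver {
        void (*shutdown)(struct toy_platform_driver *pdrv);
        struct toy_device_driver driver; /* embedded generic driver */
        int shutdown_done;               /* set by the toy callback */
};

/* Recover the enclosing structure from a pointer to its member. */
#define toy_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic wrapper, in the role of platform_drv_shutdown. */
static void toy_platform_drv_shutdown(struct toy_device_driver *drv)
{
        struct toy_platform_driver *pdrv =
                toy_container_of(drv, struct toy_platform_driver, driver);

        pdrv->shutdown(pdrv);
}

/* In the role of __platform_driver_register: install the wrapper
 * only when the platform driver supplied a shutdown callback. */
static void toy_platform_driver_register(struct toy_platform_driver *pdrv)
{
        assert(pdrv != NULL);
        if (pdrv->shutdown)
                pdrv->driver.shutdown = toy_platform_drv_shutdown;
}

/* In the role of the reboot path: call the generic hook if present. */
static void toy_device_shutdown(struct toy_device_driver *drv)
{
        if (drv->shutdown)
                drv->shutdown(drv);
}

/* Driver-specific callback, like foo_shutdown in the article. */
static void foo_shutdown(struct toy_platform_driver *pdrv)
{
        pdrv->shutdown_done = 1;
}
```

The reboot path only ever sees the generic hook; the platform-specific callback is reached through the wrapper, which is why a driver that leaves .shutdown empty gets no hook installed at all.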

3.2 Legacy .suspend/.resume (No Longer Recommended)

The old suspend/resume operations mainly relied on the suspend and resume callback functions in structures such as struct device_driver, struct class, and struct bus_type, and their usage is almost identical to the above .shutdown. For platform devices, it only requires defining two additional functions, as follows:

    static int foo_suspend(struct platform_device *pdev, pm_message_t state)
    {
            ...
    }

    static int foo_resume(struct platform_device *pdev)
    {
            ...
    }

    static void foo_shutdown(struct platform_device *pdev)
    {
            ...
    }

    static struct platform_driver foo_pdrv = {
            .suspend = foo_suspend,
            .resume = foo_resume,
            .shutdown = foo_shutdown,
            ...
    };

In newer kernels, the use of these callback functions is no longer recommended, but for platform devices, if the scenario is relatively simple, the above implementation method can still be used, and platform.c will automatically convert it to the struct dev_pm_ops callbacks, as described later.

3.3 struct dev_pm_ops Structure

struct dev_pm_ops is the core data structure for device power management, used to encapsulate all operations related to device power management.

    struct dev_pm_ops {
            int (*prepare)(struct device *dev);
            void (*complete)(struct device *dev);
            int (*suspend)(struct device *dev);
            int (*resume)(struct device *dev);
            int (*freeze)(struct device *dev);
            int (*thaw)(struct device *dev);
            int (*poweroff)(struct device *dev);
            int (*restore)(struct device *dev);
            int (*suspend_late)(struct device *dev);
            int (*resume_early)(struct device *dev);
            int (*freeze_late)(struct device *dev);
            int (*thaw_early)(struct device *dev);
            int (*poweroff_late)(struct device *dev);
            int (*restore_early)(struct device *dev);
            int (*suspend_noirq)(struct device *dev);
            int (*resume_noirq)(struct device *dev);
            int (*freeze_noirq)(struct device *dev);
            int (*thaw_noirq)(struct device *dev);
            int (*poweroff_noirq)(struct device *dev);
            int (*restore_noirq)(struct device *dev);
            int (*runtime_suspend)(struct device *dev);
            int (*runtime_resume)(struct device *dev);
            int (*runtime_idle)(struct device *dev);
    };

This structure is essentially a catch-all that holds every callback a device might need. The callbacks fall into a few categories:

Regular paths of traditional (system-wide) suspend: prepare/complete, suspend/resume, freeze/thaw, poweroff/restore;

Special phases of traditional suspend: the _late/_early and _noirq variants;

Runtime PM: runtime_suspend/runtime_resume/runtime_idle.

The tasks that various drivers need to perform are straightforward: implement these callback functions and store them in the appropriate locations. Let’s continue.

3.4 Location of struct dev_pm_ops

    struct device {
            ...
            struct dev_pm_domain    *pm_domain;
            const struct device_type *type;
            struct class            *class;
            struct bus_type *bus;
            struct device_driver *driver;
            ...
    };

    struct dev_pm_domain {
            struct dev_pm_ops       ops;
            ...
    };

    struct device_type {
            ...
            const struct dev_pm_ops *pm;
    };

    struct class {
            ...
            const struct dev_pm_ops *pm;
            ...
    };

    struct bus_type {
            ...
            const struct dev_pm_ops *pm;
            ...
    };

    struct device_driver {
            ...
            const struct dev_pm_ops *pm;
            ...
    };

A struct dev_pm_ops can be found in every entity of the device model that is related to a device: struct device (through its pm_domain pointer), struct device_type, struct class, struct bus_type, and struct device_driver.

As mentioned in previous articles, during the power management process, the kernel will call the callbacks in dev_pm_ops in the following priority order to command the device to implement the corresponding state transitions:

dev->pm_domain->ops, dev->type->pm, dev->class->pm, dev->bus->pm, dev->driver->pm.

Therefore, the tasks that device drivers need to perform are also straightforward: implement these callback functions and store them in the appropriate locations. But with so many locations, how should they be implemented? Let’s analyze further.

4. Implementation of struct dev_pm_ops

As previously described, when the system transitions power states, it will call the device’s pm ops in a certain priority order. The so-called priority order means that as long as there is a higher priority ops (such as dev->pm_domain->ops), that ops will be called; otherwise, it will continue to search for the next priority. Therefore, device drivers can implement dev pm ops at the specified level according to the actual situation of the device to achieve power management.
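The priority search described above can be sketched as a small user-space model. The toy_ types are hypothetical stand-ins, not the kernel's structures, and the final fallback to the driver when the selected level has no callback is an assumption based on my reading of the kernel's suspend path; treat it as an illustration of the idea rather than the exact kernel logic.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;
typedef int (*toy_pm_cb)(struct toy_dev *dev);

/* Stand-in for struct dev_pm_ops, reduced to a single callback. */
struct toy_pm_ops {
        toy_pm_cb suspend;
};

/* Stand-in for struct device: one ops pointer per level. */
struct toy_dev {
        const struct toy_pm_ops *pm_domain_ops; /* dev->pm_domain->ops */
        const struct toy_pm_ops *type_pm;       /* dev->type->pm */
        const struct toy_pm_ops *class_pm;      /* dev->class->pm */
        const struct toy_pm_ops *bus_pm;        /* dev->bus->pm */
        const struct toy_pm_ops *driver_pm;     /* dev->driver->pm */
};

/* Pick the suspend callback from the highest-priority level present. */
static toy_pm_cb toy_pick_suspend(const struct toy_dev *dev)
{
        toy_pm_cb cb = NULL;

        assert(dev != NULL);
        if (dev->pm_domain_ops)
                cb = dev->pm_domain_ops->suspend;
        else if (dev->type_pm)
                cb = dev->type_pm->suspend;
        else if (dev->class_pm)
                cb = dev->class_pm->suspend;
        else if (dev->bus_pm)
                cb = dev->bus_pm->suspend;

        /* Assumed fallback: use the driver when no callback was found. */
        if (!cb && dev->driver_pm)
                cb = dev->driver_pm->suspend;

        return cb;
}

/* Two distinguishable toy callbacks. */
static int toy_bus_suspend(struct toy_dev *dev) { (void)dev; return 1; }
static int toy_driver_suspend(struct toy_dev *dev) { (void)dev; return 2; }
```

In this model a device with both bus and driver ops gets the bus callback, and the driver callback is chosen directly only when nothing above it supplies one.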

Dev pm ops can exist in any of the pm domain, device type, class, bus, or device driver. This chapter will take pm domain, bus, and device driver as typical scenarios to illustrate the implementation ideas of device power management.

Note 1: For convenience, I will use the .suspend function in struct dev_pm_ops as an example; the others are similar.

4.1 PM Domain

When a device belongs to a certain pm domain (refer to “Linux PM Domain Framework (1) – Overview and Usage Process”), during the system suspend process, pm_domain->ops.suspend will be called directly. As seen from pm_genpd_init, pm_domain->ops.suspend is implemented by pm_genpd_suspend:

    genpd->domain.ops.suspend = pm_genpd_suspend;

The implementation of this interface is:

    static int pm_genpd_suspend(struct device *dev)
    {
            struct generic_pm_domain *genpd;

            dev_dbg(dev, "%s()\n", __func__);

            genpd = dev_to_genpd(dev);
            if (IS_ERR(genpd))
                    return -EINVAL;

            return genpd->suspend_power_off ? 0 : pm_generic_suspend(dev);
    }

Unless the domain is already marked as powered off (genpd->suspend_power_off), pm_generic_suspend is called, which, as described in “Linux Power Management (4) – Power Management Interface”, eventually calls the suspend interface of the device driver (if it exists), i.e., dev->driver->pm->suspend.

That is a bit of a letdown: I had hoped the pm domain would take over so the device driver could take a break, but the responsibility is passed straight back to the device driver! Let’s consider the reasons:

1) During suspend, what actions the device should take are best known to the device driver, so it is reasonable to leave it to the driver.

2) Then, why go through the pm domain layer? Why not just call the driver’s suspend directly? Because some processing needs to be done by the pm domain before suspend, such as checking whether the device has already lost power (if it has, it cannot be suspended again, or it may lead to unexpected results), etc.
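The two points above can be sketched in user space: the domain layer first checks whether power is already off, and only otherwise forwards to the driver. The toy_ types are hypothetical, and the plain -1 return stands in for the kernel's -EINVAL; this is a model of the control flow, not the genpd implementation.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

/* Stand-in for a driver with a single suspend callback. */
struct toy_driver {
        int (*suspend)(struct toy_dev *dev);
};

/* Stand-in for a generic PM domain. */
struct toy_domain {
        int suspend_power_off; /* nonzero: domain power is already off */
};

struct toy_dev {
        struct toy_domain *domain;
        struct toy_driver *driver;
        int suspended; /* counts how often the driver callback ran */
};

/* In the role of pm_generic_suspend: forward to the driver, if any. */
static int toy_generic_suspend(struct toy_dev *dev)
{
        if (dev->driver && dev->driver->suspend)
                return dev->driver->suspend(dev);
        return 0;
}

/* In the role of pm_genpd_suspend: domain-level check, then delegate. */
static int toy_domain_suspend(struct toy_dev *dev)
{
        assert(dev != NULL);
        if (!dev->domain)
                return -1; /* the kernel code returns -EINVAL here */

        return dev->domain->suspend_power_off ? 0 : toy_generic_suspend(dev);
}

/* Driver callback: only the driver knows what the device must do. */
static int foo_suspend(struct toy_dev *dev)
{
        dev->suspended++;
        return 0;
}
```

When suspend_power_off is set, the driver callback is skipped entirely, which matches the reason given for routing suspend through the domain layer first.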

4.2 dev->bus->pm

Let’s look at another example: what if the bus to which the device belongs provides dev_pm_ops? Before we start, let’s emphasize this fact again: during suspend, what actions the device should take are best known to the device driver, so it is reasonable to leave it to the driver. So, as you might guess, even if the bus has a suspend callback, it ultimately still needs to go through the device driver’s suspend interface.

We will take the platform bus as an example because it is simple, and most devices we typically deal with are platform devices.

In drivers/base/platform.c, the platform bus is defined as follows:

    struct bus_type platform_bus_type = {
            .name           = "platform",
            .dev_groups     = platform_dev_groups,
            .match          = platform_match,
            .uevent         = platform_uevent,
            .pm             = &platform_dev_pm_ops,
    };

Next, let’s look at platform_dev_pm_ops:

    static const struct dev_pm_ops platform_dev_pm_ops = {
            .runtime_suspend = pm_generic_runtime_suspend,
            .runtime_resume = pm_generic_runtime_resume,
            USE_PLATFORM_PM_SLEEP_OPS
    };

Oh, there are two callbacks related to runtime PM, and there is a macro definition: USE_PLATFORM_PM_SLEEP_OPS, which specifies the suspend callback of dev_pm_ops as platform_pm_suspend (and others similarly). The implementation of this interface is as follows:

    int platform_pm_suspend(struct device *dev)
    {
            struct device_driver *drv = dev->driver;
            int ret = 0;

            if (!drv)
                    return 0;

            if (drv->pm) {
                    if (drv->pm->suspend)
                            ret = drv->pm->suspend(dev);
            } else {
                    ret = platform_legacy_suspend(dev, PMSG_SUSPEND);
            }

            return ret;
    }

I see, if the device’s driver provides a dev_pm_ops pointer, the corresponding suspend interface will be called. Otherwise, the legacy interface (i.e., pdrv->suspend) will be called. Comparing with the descriptions in sections 3.1 and 3.2, it becomes clear!
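The choice between the new and the legacy path can be tried out with a small user-space model. The toy_ types, the legacy_suspend field and the path marker are all hypothetical; in the kernel the legacy callback lives in struct platform_driver and is reached via platform_legacy_suspend, which this sketch collapses into a single function pointer.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

/* Stand-in for struct dev_pm_ops, reduced to suspend. */
struct toy_pm_ops {
        int (*suspend)(struct toy_dev *dev);
};

struct toy_driver {
        const struct toy_pm_ops *pm;                /* new-style callbacks */
        int (*legacy_suspend)(struct toy_dev *dev); /* old-style callback */
};

struct toy_dev {
        struct toy_driver *driver;
        int path; /* records which callback ran: 1 = pm ops, 2 = legacy */
};

/* In the role of platform_pm_suspend. */
static int toy_platform_pm_suspend(struct toy_dev *dev)
{
        struct toy_driver *drv;
        int ret = 0;

        assert(dev != NULL);
        drv = dev->driver;
        if (!drv)
                return 0;

        if (drv->pm) {
                /* A pm table is present: the legacy callback is ignored. */
                if (drv->pm->suspend)
                        ret = drv->pm->suspend(dev);
        } else if (drv->legacy_suspend) {
                ret = drv->legacy_suspend(dev);
        }

        return ret;
}

static int foo_pm_suspend(struct toy_dev *dev) { dev->path = 1; return 0; }
static int foo_legacy_suspend(struct toy_dev *dev) { dev->path = 2; return 0; }
```

Note that as soon as a driver provides a pm table, the legacy callback is never consulted, even if the table leaves suspend empty.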

Additionally, since the platform bus is a virtual bus, no other actions are needed. For some physical buses, bus-related suspend operations can be implemented in the bus’s suspend interface. This is the charm of the device model.

4.3 dev->driver->pm

Regardless, if a device needs to perform some actions during suspend, it must implement suspend in the device driver. How to implement it? Define a struct dev_pm_ops variable and implement the required callback functions for the device, storing them in the driver->pm pointer before registering the driver.

So what has changed? Most devices are platform devices, and we can still use the old method (sections 3.1 and 3.2) to implement suspend/resume for the platform driver. However, in the new era, this is not recommended. Note the word "legacy" in platform_legacy_suspend; it exists only for compatibility. If we are writing a new driver, we should use the new method.
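The new method, defining an ops table and attaching it to the driver before registration, can be sketched as follows. This is a user-space model with hypothetical toy_ types and a hypothetical toy_driver_register; a real driver would use struct dev_pm_ops, struct device_driver (or struct platform_driver) and the kernel's registration functions instead.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a device with a simple power state. */
struct toy_dev {
        int state; /* 0 = active, 1 = suspended */
};

/* Stand-in for struct dev_pm_ops with two of its callbacks. */
struct toy_pm_ops {
        int (*suspend)(struct toy_dev *dev);
        int (*resume)(struct toy_dev *dev);
};

/* Stand-in for struct device_driver. */
struct toy_driver {
        const char *name;
        const struct toy_pm_ops *pm;
};

static int foo_suspend(struct toy_dev *dev) { dev->state = 1; return 0; }
static int foo_resume(struct toy_dev *dev) { dev->state = 0; return 0; }

/* Only the callbacks the device needs are filled in. */
static const struct toy_pm_ops foo_pm_ops = {
        .suspend = foo_suspend,
        .resume  = foo_resume,
};

/* The ops table is attached to the driver before it is registered. */
static struct toy_driver foo_driver = {
        .name = "foo",
        .pm   = &foo_pm_ops,
};

/* Hypothetical registry standing in for the kernel's driver core. */
static struct toy_driver *toy_registered;

static int toy_driver_register(struct toy_driver *drv)
{
        assert(drv != NULL);
        toy_registered = drv;
        return 0;
}
```

Once registered, the core (here just the toy_registered pointer) can reach the driver's suspend and resume through the pm pointer without knowing anything driver-specific.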

5. Device Power State Transition Process

I originally wanted to sort out how the driver handles the system power transition process. However, after the above analysis, the traditional suspend/resume flow is already quite clear: the kernel calls the corresponding callback functions in the order pm_domain -> device driver, class -> device driver, or bus -> device driver. As for runtime PM, it is better covered in the runtime PM analysis article. Therefore, this article will conclude here.
