Originally from Yuedu Code Field

Let heaven return to heaven, let dust return to dust
—— Discussing Linux’s bus, device, and driver model
Author: Song Baohua
On May 15, 1951, during a congressional hearing, U.S. Army five-star General MacArthur suggested expanding the Korean War into China. Bradley then remarked, “If we expand the war into Communist China, we will be caught in a war at the wrong time, in the wrong place, against the wrong enemy.”
Writing code follows the same principle: put the right code in the right place, not the wrong one. The same code could live in several places, but where it should live is a matter of software architecture design; in short, it comes down to high cohesion and low coupling.
In a dilemma
Now let’s imagine a simple network card named ABC, which needs to connect to a CPU (let’s assume CPU X) on its memory bus, requiring address, data, and control buses (as well as interrupt pins, etc.).
In the ABC network card driver, we need to define the base address, interrupt number, and other information for ABC. Let’s assume on CPU X’s circuit board, ABC’s address is 0x100000, and the interrupt number is 10. Suppose we define the macros like this:
#define ABC_BASE 0x100000
#define ABC_IRQ 10
And we write code to send packets and request the interrupt:
#define ABC_BASE 0x100000
#define ABC_IRQ 10
int abc_send(…)
{
    /* writel() takes the value first, then the address */
    writel(1, ABC_BASE + REG_X);
    writel(0x3, ABC_BASE + REG_Y);
    …
}
int abc_init(…)
{
    request_irq(ABC_IRQ, …);
}
The problem with this code is that once the board changes, ABC_BASE and ABC_IRQ will no longer be the same, and the code needs to change accordingly.
Some programmers say I can do this:
#ifdef BOARD_A
#define ABC_BASE 0x100000
#define ABC_IRQ 10
#elif defined(BOARD_B)
#define ABC_BASE 0x110000
#define ABC_IRQ 20
#elif defined(BOARD_C)
#define ABC_BASE 0x120000
#define ABC_IRQ 10
…
#endif
While this is possible, with 10,000 different boards you would need 10,000 #ifdef branches. Writing code this way feels like 「laying bricks」: repetitive and mechanical, which is dangerous because it introduces bad ‘smells’ into the code. And given how many products around the world run Linux, nobody can really say how many boards use ABC.
So, does writing #ifdef 10,000 times really solve the problem?
It really doesn’t. Suppose a circuit board carries two ABC network cards; now we are completely stuck. How would you define the macros?
#ifdef BOARD_A
#define ABC1_BASE 0x100000
#define ABC1_IRQ 10
#define ABC2_BASE 0x101000
#define ABC2_IRQ 11
#elif defined(BOARD_B)
#define ABC1_BASE 0x110000
#define ABC1_IRQ 20
…
#endif
If you do this, how would abc_send() and abc_init() change? Would it be like this:
int abc1_send(…)
{
    /* writel() takes the value first, then the address */
    writel(1, ABC1_BASE + REG_X);
    writel(0x3, ABC1_BASE + REG_Y);
    …
}
int abc1_init(…)
{
    request_irq(ABC1_IRQ, …);
}
int abc2_send(…)
{
    writel(1, ABC2_BASE + REG_X);
    writel(0x3, ABC2_BASE + REG_Y);
    …
}
int abc2_init(…)
{
    request_irq(ABC2_IRQ, …);
}
…
Or like this?
int abc_send(int id, …)
{
    /* writel() takes the value first, then the address */
    if (id == 0) {
        writel(1, ABC1_BASE + REG_X);
        writel(0x3, ABC1_BASE + REG_Y);
    } else if (id == 1) {
        writel(1, ABC2_BASE + REG_X);
        writel(0x3, ABC2_BASE + REG_Y);
    }
    …
}
Regardless of how you change it, this code is simply unbearable to look at. Why do we fall into such a predicament? Because we made the fatal mistake of 「placing the correct code in the wrong position」, which introduced significant coupling.
Reflection on the detour
The fatal error we made was coupling board-level interconnection information into the driver code, so the driver could no longer be cross-platform.
If we think about it, the real responsibility of the ABC driver is to handle sending and receiving on the ABC network card. Does this process really have anything to do with which CPU it connects to (TI, Samsung, Broadcom, Allwinner, etc.) or which board it sits on?
The answer is no! The ABC network card will not change regardless of whether you are using TI’s ARM, Loongson, or Blackfin. No matter how chaotic the external boards are, the ABC itself remains unchanged.
Since the two have nothing to do with each other, why should board-level interconnection information be placed in the driver code? ABC does not change because of external factors, so its code should be inherently cross-platform. Therefore, code such as 「#define ABC_BASE 0x100000, #define ABC_IRQ 10」 appearing in the driver is 「fighting the wrong war, in the wrong place, against the wrong enemy」. It has not been put in the correct position, and when we write code, we must 「let heaven return to heaven, let dust return to dust」. What we really want is probably this: the board-level information lives outside the driver, and the driver stays the same on every board.
Software engineering emphasizes high cohesion and low coupling. The tighter the connections between elements within a module, the higher its cohesion; the less tightly connected modules are, the lower their coupling. Thus, high cohesion and low coupling emphasize that internal elements should tightly group together while external elements should stay away. For drivers, board-level interconnection information clearly belongs to the latter category.
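To make the goal concrete, here is a minimal sketch of an ABC driver that carries no board constants at all. It is only an illustration under stated assumptions: `struct abc_device`, the register offsets, and the use of a plain array in place of memory-mapped registers are all hypothetical, chosen so the code can run in user space. It is not how any real kernel driver is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offsets (in 32-bit words) of the imaginary ABC card. */
enum { REG_X = 0, REG_Y = 1, ABC_NUM_REGS = 4 };

/* Everything the board decides lives here, outside the driver logic. */
struct abc_device {
    volatile uint32_t *base; /* where this board maps the card's registers */
    int irq;                 /* interrupt line; abc_init() would hand it to request_irq() */
};

/* Driver logic: identical for every board and for every card instance. */
int abc_send(struct abc_device *dev)
{
    assert(dev != NULL && dev->base != NULL);
    dev->base[REG_X] = 1;
    dev->base[REG_Y] = 0x3;
    return 0;
}
```

With this shape, a board with two cards simply creates two `struct abc_device` values. There is no abc1_send()/abc2_send() pair and no `if (id == 0)` ladder, because the driver never learns which board it is running on.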
Once, I asked engineers at a German company, 「What is the relationship between high cohesion and low coupling?」 One engineer eagerly replied, 「High cohesion and low coupling are a pair of contradictions.」 I thought he was confused. If one must describe the relationship, I believe it follows Marxism-Leninism and Mao Zedong Thought: 「high cohesion and low coupling are interdependent, indispensable to each other, and mutually reinforcing」. They are two aspects of the same thing; in short, reciting the political textbook will do. When you write serial port code, everything in it relates to the serial port and stays tightly grouped, so it naturally won’t wander off and couple with SPI. For SPI to be loosely coupled with the serial port, the UART code must keep all serial port elements together and not roam around: without an SPI residence permit, don’t even think about getting a household registration there.
Bright flowers on the willow bank
Now that board-level interconnection information has been separated from the driver, they still have some connection because the driver ultimately needs to retrieve base addresses, interrupt numbers, and other board-level information. How to retrieve this is a big problem.
One method is for the ABC driver to ask around everywhere, 「What is your base address and interrupt number?」 This still leads to serious coupling, because the driver still needs to know whether ABC is on the board, which board has it, and how it is connected. It is still directly coupled with the board.
Could there be another way? Suppose we keep a common, database-like structure that records which network cards are on each board, along with their base addresses and interrupt numbers. The driver could then obtain this information from one unified place through a standardized API.
Based on this idea, Linux divides device drivers into three entities: bus, device, and driver. The bus serves as the unified link, while the device contains the board-level interconnection information. The responsibilities of these three entities are as follows:
Entity | Function | Code
Device | Describes base address, interrupt number, clock, DMA, reset, etc. | arch/arm, arch/blackfin, arch/xxx, etc. directories
Driver | Implements the functions of peripherals, such as sending and receiving packets for network cards, recording and playing for sound cards, reading and writing for SD cards… | drivers/net, sound, drivers/mmc, etc. directories
Bus | Associates devices and drivers | drivers/base/platform.c, drivers/pci/pci-driver.c, …
We put all the board interconnection information on the device side, then let each device register with the bus to announce its existence. The bus thereby knows about these devices and, indirectly, about their board-level connection information. For example, the board file arch/blackfin/mach-bf533/boards/ip0x.c has two DM9000 network cards, which register as follows:
static struct resource dm9000_resource1[] = {
    {
        .start = 0x20100000,
        .end = 0x20100000 + 1,
        .flags = IORESOURCE_MEM
    }, {
        .start = 0x20100000 + 2,
        .end = 0x20100000 + 3,
        .flags = IORESOURCE_MEM
    }, {
        .start = IRQ_PF15,
        .end = IRQ_PF15,
        .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHEDGE
    }
};
static struct resource dm9000_resource2[] = {
    {
        .start = 0x20200000,
        .end = 0x20200000 + 1,
        .flags = IORESOURCE_MEM
    }…
};
…
static struct platform_device dm9000_device1 = {
    .name = "dm9000",
    .id = 0,
    .num_resources = ARRAY_SIZE(dm9000_resource1),
    .resource = dm9000_resource1,
};
…
static struct platform_device dm9000_device2 = {
    .name = "dm9000",
    .id = 1,
    .num_resources = ARRAY_SIZE(dm9000_resource2),
    .resource = dm9000_resource2,
};
static struct platform_device *ip0x_devices[] __initdata = {
    &dm9000_device1,
    &dm9000_device2,
    …
};
static int __init ip0x_init(void)
{
    platform_add_devices(ip0x_devices, ARRAY_SIZE(ip0x_devices));
    …
}
Thus, the platform bus naturally knows there are two DM9000 network cards on the board. Once the DM9000 driver is registered as well, since the platform bus has already associated the devices, the driver can naturally obtain the memory base address and interrupt information based on the existing DM9000 device information:
static struct resource dm9000_resource1[] = {
    {
        .start = 0x20100000,
        .end = 0x20100000 + 1,
        .flags = IORESOURCE_MEM
    }, {
        .start = 0x20100000 + 2,
        .end = 0x20100000 + 3,
        .flags = IORESOURCE_MEM
    }, {
        .start = IRQ_PF15,
        .end = IRQ_PF15,
        .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHEDGE
    }
};
The purpose of the bus is to match these drivers and devices one by one. As shown in the figure, a certain circuit board has two ABC devices, one DEF device, and one HIJ device, along with respective ABC, DEF, and HIJ drivers. The bus thus matches the two ABC devices with one ABC driver, one-to-one matching for DEF devices and drivers, and one-to-one matching for HIJ devices and drivers.
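The matching described here can be modelled in a few lines of user-space C. This is only a toy sketch, not kernel code: every `toy_*` name is hypothetical, matching is by name only, and it assumes devices are registered before the driver. Binding in the opposite order is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEVICES 8

struct toy_driver;

struct toy_device {
    const char *name;                /* used for matching */
    int id;                          /* instance number on the board */
    const struct toy_driver *driver; /* set by the bus once bound */
};

struct toy_driver {
    const char *name;
    int (*probe)(struct toy_device *dev); /* called once per matched device */
};

struct toy_bus {
    struct toy_device *devices[MAX_DEVICES];
    size_t num_devices;
};

/* Device side: announce existence to the bus. */
int toy_device_register(struct toy_bus *bus, struct toy_device *dev)
{
    assert(bus != NULL && dev != NULL);
    if (bus->num_devices >= MAX_DEVICES)
        return -1;
    dev->driver = NULL;
    bus->devices[bus->num_devices++] = dev;
    return 0;
}

/* Bus side: the simplest possible match, the same name on both sides. */
static int toy_match(const struct toy_device *dev, const struct toy_driver *drv)
{
    return strcmp(dev->name, drv->name) == 0;
}

/* Driver side: register, and let the bus bind every matching unbound device.
 * Returns the number of devices bound to this driver. */
int toy_driver_register(struct toy_bus *bus, const struct toy_driver *drv)
{
    int bound = 0;
    for (size_t i = 0; i < bus->num_devices; i++) {
        struct toy_device *dev = bus->devices[i];
        if (dev->driver == NULL && toy_match(dev, drv) && drv->probe(dev) == 0) {
            dev->driver = drv;
            bound++;
        }
    }
    return bound;
}

/* Example ABC probe: records which instance ids the bus handed to it. */
int abc_probed_ids[MAX_DEVICES];
int abc_probed_count;

int abc_probe(struct toy_device *dev)
{
    if (abc_probed_count >= MAX_DEVICES)
        return -1;
    abc_probed_ids[abc_probed_count++] = dev->id;
    return 0;
}
```

Registering two devices named "abc" and one named "def", then a single driver named "abc", makes the bus call abc_probe() twice, once per ABC device. The "def" device stays unbound until a DEF driver shows up.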
The driver itself can use the simplest API to obtain the interconnection information filled in by the device side, as seen in the dm9000_probe() code of drivers/net/ethernet/davicom/dm9000.c:
static int dm9000_probe(struct platform_device *pdev)
{
    …
    db->addr_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
    db->data_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
    db->irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
    …
}
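The three calls above ask for 「the 0th memory resource」, 「the 1st memory resource」, and 「the 0th interrupt resource」. A simplified user-space model of that lookup might look like the sketch below. The `toy_*` types and flag values are stand-ins invented for illustration, and the IRQ number 42 is made up because the value of IRQ_PF15 is not given here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's IORESOURCE_* flags (values are illustrative). */
enum { TOY_RES_MEM = 1 << 0, TOY_RES_IRQ = 1 << 1 };

struct toy_resource {
    unsigned long start;
    unsigned long end;
    unsigned int flags;
};

struct toy_platform_device {
    const char *name;
    size_t num_resources;
    const struct toy_resource *resource;
};

/* Return the num-th resource of the requested type, or NULL if there is none. */
const struct toy_resource *toy_get_resource(const struct toy_platform_device *pdev,
                                            unsigned int type, unsigned int num)
{
    assert(pdev != NULL);
    for (size_t i = 0; i < pdev->num_resources; i++) {
        const struct toy_resource *r = &pdev->resource[i];
        /* num is only decremented for entries of the requested type */
        if ((r->flags & type) && num-- == 0)
            return r;
    }
    return NULL;
}

/* Board side: values mirror dm9000_resource1 above; the IRQ number 42 is made up. */
const struct toy_resource board_res[] = {
    { 0x20100000, 0x20100000 + 1, TOY_RES_MEM },
    { 0x20100000 + 2, 0x20100000 + 3, TOY_RES_MEM },
    { 42, 42, TOY_RES_IRQ },
};

const struct toy_platform_device board_dm9000 = {
    "dm9000", sizeof(board_res) / sizeof(board_res[0]), board_res
};
```

The driver side only ever names a resource type and an index; the addresses themselves appear nowhere except in the board table.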
Thus, board-level interconnection information no longer intrudes into the driver, and the driver has no direct coupling with the board, since it calls only a standard bus-level API: platform_get_resource(). The bus also contains a match() function that decides which driver serves which device. For the platform bus, which serves devices attached to the memory bus, matching works as follows (the simplest matching method is that the device and driver name fields are identical):
static int platform_match(struct device *dev, struct device_driver *drv)
{
    struct platform_device *pdev = to_platform_device(dev);
    struct platform_driver *pdrv = to_platform_driver(drv);

    /* When driver_override is set, only bind to the matching driver */
    if (pdev->driver_override)
        return !strcmp(pdev->driver_override, drv->name);

    /* Attempt an OF style match first */
    if (of_driver_match_device(dev, drv))
        return 1;

    /* Then try ACPI style match */
    if (acpi_driver_match_device(dev, drv))
        return 1;

    /* Then try to match against the id table */
    if (pdrv->id_table)
        return platform_match_id(pdrv->id_table, pdev) != NULL;

    /* fall-back to driver name match */
    return (strcmp(pdev->name, drv->name) == 0);
}
VxBus is Wind River’s newer device driver architecture. It was added to VxWorks in version 6.2, and by version 6.9 VxWorks was largely VxBus-compliant. VxBus is very similar to Linux’s bus, device, and driver model. But why is it called VxBus? Is it very Vx?
Thus, the code we see will be like this: regardless of which board’s ABC device, it uniformly uses an unchanging drivers/net/ethernet/abc.c driver, while arch/arm/mach-yyy/board-a.c type of code has many copies.
Reaching a higher level
We still see a large amount of arch/arm/mach-yyy/board-a.c type of code, rushing to describe the details of board-level information, despite having been decoupled from the driver. The existence of this code is simply a pollution of the Linux kernel and a ruthless contempt for Linus Torvalds, as it lacks technical content.
We have good reason to describe this device-side information in a non-C script language instead. That script file is what is known as the Device Tree.
The Device Tree is a dts file that describes all devices on each board and their connection information in the simplest syntax. For example, the DM9000 under arch/arm/boot/dts/imx1-apf9328.dts is such a script, where the base address and interrupt number become attributes of the DM9000 device node:
eth: eth@4,c00000 {
    compatible = "davicom,dm9000";
    reg = <
        4 0x00c00000 0x2
        4 0x00c00002 0x2
    >;
    interrupt-parent = <&gpio2>;
    interrupts = <14 IRQ_TYPE_LEVEL_LOW>;
    …
};
After that, the board-level C code is removed, and files like arch/arm/mach-xxx/board-a.c are consigned to history for good. The code then takes on this architecture: to change the board, just change the Device Tree. 「Let heaven return to heaven, let dust return to dust」: let the driver’s code return to C, and let the device’s information return to the device tree script.
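How does a driver find its node once the information lives in the Device Tree? The 「OF style match」 in platform_match() relies on the compatible property. The idea can be sketched in user space as below. This is a deliberately reduced model, not the kernel’s implementation: the `toy_*` types are hypothetical, and a real match also considers other properties and the order of the compatible strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A device node reduced to what matching needs: a NULL-terminated
 * list of compatible strings taken from the device tree. */
struct toy_node {
    const char *const *compatible;
};

struct toy_of_driver {
    const char *name;
    const char *const *match_table; /* NULL-terminated list the driver supports */
};

/* Return 1 if any compatible string of the node appears in the driver's table. */
int toy_of_match(const struct toy_node *node, const struct toy_of_driver *drv)
{
    assert(node != NULL && drv != NULL);
    for (const char *const *c = node->compatible; *c != NULL; c++)
        for (const char *const *m = drv->match_table; *m != NULL; m++)
            if (strcmp(*c, *m) == 0)
                return 1;
    return 0;
}

/* The node from the dts fragment above, and a driver that claims it. */
const char *const eth_compatible[] = { "davicom,dm9000", NULL };
const char *const dm9000_table[] = { "davicom,dm9000", NULL };

const struct toy_node eth_node = { eth_compatible };
const struct toy_of_driver dm9000_driver = { "dm9000", dm9000_table };
```

The board never appears in the driver: moving the card to another board changes only the node’s reg and interrupts properties, while the compatible string, and therefore the match, stays the same.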
We are both pleased and saddened to see that VxWorks 7 has also adopted the Device Tree in its new version. We are pleased that it has finally arrived; we are saddened that it has come too late. The wheels of Linux roll forward relentlessly, crushing everything in its path. The trajectory of humanity over the millennia, the changes of the sea and the mulberry fields, the rotation of the stars, is a repetitive process of history returning to history, and the future returning to history. This is the tragedy of reality and the grandeur of history.
As the Art of War by Sun Tzu states: “Water flows according to the terrain, and military victory is determined by the enemy. Hence, there is no constant form in military strategy, just as water has no constant shape; those who can adapt to changes and achieve victory are called divine.” Everything is simply about following the trend and placing the correct code in the correct position.
