https://www.zhihu.com/question/290767285/answer/1200063036
Author: Longquan Temple Sweeping Monk (Chief Browser Architecture Expert, Author of the World’s Smallest Chromium Kernel – Miniblink), authorized for reprint
Let’s take a look at the open-source Chromium; it is indeed extremely complex.
The source code alone is over ten gigabytes.
We can’t help but wonder, what exactly is in Chromium? How can something that seems to only display a webpage and a few lines of HTML require so much code?
At first glance, from the directory structure, Chromium contains the following:
base, general-purpose code, foundational components, containing a collection of utility classes like strings, files, threads, message queues, etc.
cc, short for Chromium compositor, responsible for rendering composition.
chrome, the implementation of the Chromium browser shell.
content, the core code of the multi-process sandbox browser, managing process and thread architecture.
gpu, OpenGL wrapper code, including CommandBuffer and OpenGL compatibility support, etc.
net, implementation of the network stack.
ipc, implementation of inter-process message communication.
media, multimedia wrapper code, containing components for media content capture and playback.
mojo, similar to Android’s AIDL, provides a cross-language (C++ / Java / JavaScript) and cross-platform inter-process object communication mechanism.
skia, the graphics library. This directory holds Chromium’s configuration and extension code for Skia, while the native Skia code lives in the third_party/skia directory.
third_party, third-party libraries, including the web layout engine.
ui, UI framework.
v8, V8 JavaScript engine library.
It seems manageable, right? But in reality, each of these could fill a thick technical manual.
For example, the net module appears to be just a network library, yet it contains host resolution, cookies, network change detection, SSL, resource caching, FTP, HTTP, OCSP implementation, proxy (SOCKS and HTTP) configuration, parsing, script fetching (including implementations across different systems), QUIC, socket pools, SPDY, WebSockets… Each item could be a book on its own.
The V8 layer seems to have the simple job of just implementing JS, but it includes a bytecode interpreter, JIT compilers, generational GC, an inspector (debugging support), memory and CPU profilers (performance statistics), WebAssembly support, two types of post-mortem diagnostics, startup snapshots, code caching, and code hotspot analysis… Each item could also be a book, often delving deep into compiler theory and optimization.
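To give a feel for the smallest of those pieces, here is a minimal sketch of a stack-based bytecode interpreter. The opcode names and the `run` function are invented for illustration. V8's real interpreter is register-based and vastly more involved, so treat this only as the general idea.

```python
# Toy stack-based bytecode interpreter (illustrative only; opcodes are hypothetical).
PUSH, ADD, MUL, HALT = range(4)


def run(code):
    """Execute a list of (opcode, operand) pairs and return the top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(code):
        op, arg = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(arg)
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack[-1] if stack else None


# Bytecode for the expression (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]
result = run(program)
```

Everything else on the list (JIT tiers, GC, profilers) exists to make this basic loop fast and debuggable for real-world JavaScript.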
Skia seems to be just a graphics library for drawing various shapes. However, it includes dozens of vector drawing methods, text rendering, GPU acceleration, recording and playback of vector drawing instructions (which must also be thread-safe), codecs for various image formats, and PDF generation (a deeply hidden but interesting feature; Skia supports rendering vector graphics to PDF).
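The "recording and playback" idea can be sketched in a few lines. The class and method names below (`Recorder`, `LogCanvas`, `playback`) are hypothetical and are not Skia's API. The point is only that drawing calls are stored as data and replayed later, possibly several times and on different targets.

```python
class Recorder:
    """Records drawing calls as data instead of executing them."""

    def __init__(self):
        self._commands = []

    def draw_rect(self, x, y, w, h):
        self._commands.append(("draw_rect", (x, y, w, h)))

    def draw_text(self, text, x, y):
        self._commands.append(("draw_text", (text, x, y)))

    def finish(self):
        # Return an immutable snapshot; later recording cannot change it.
        return tuple(self._commands)


class LogCanvas:
    """Stand-in drawing target that just logs what it is asked to draw."""

    def __init__(self):
        self.log = []

    def draw_rect(self, x, y, w, h):
        self.log.append(f"rect {w}x{h} at ({x},{y})")

    def draw_text(self, text, x, y):
        self.log.append(f"text {text!r} at ({x},{y})")


def playback(picture, canvas):
    """Replay every recorded command on the given canvas."""
    for name, args in picture:
        getattr(canvas, name)(*args)


recorder = Recorder()
recorder.draw_rect(0, 0, 100, 50)
recorder.draw_text("hi", 10, 20)
picture = recorder.finish()

first, second = LogCanvas(), LogCanvas()
playback(picture, first)
playback(picture, second)
```

Because the recorded picture is plain immutable data here, it can be handed to another thread and replayed there, which hints at why thread safety becomes part of the design.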
Additionally, it’s worth mentioning that Skia was acquired by Google. It’s unclear whether Google felt they couldn’t handle it or if it was too labor-intensive. In any case, Google chose to buy someone else’s code to accomplish these functions.
The UI framework appears to be just a UI library. However, Chromium needs a UI library that adapts to all platforms and supports GPU acceleration. Unfortunately, it lacks the implementation of rich edit. The design of the UI library is so in-depth that it could be considered another browser.
Wait a moment, all of the above seems to be just the outer layer of the browser. What about the web layout? Isn’t that the core of the browser?
Indeed, intriguingly, Chromium places its layout engine Blink under third_party and treats its architecture as if it were a third-party library.
According to Google employees, this is due to historical reasons… okay, let’s believe that for now. However, this third-party library has become undoubtedly the most complex and functionally critical third-party library.
The work of Blink includes:
- Implementing web platform specifications (e.g., HTML standards), including DOM, CSS, and Web IDL.
- Cooperating with V8 to run JavaScript.
- Requesting resources from the underlying network stack.
- Building the DOM tree.
- Calculating styles and layouts.
- Requesting the Chrome compositor (the cc layer mentioned above) to render graphics.
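As a rough sketch of the "build the DOM tree, calculate styles, calculate layout" steps above, the toy pipeline below runs a style pass and then a layout pass over a tiny tree. The `Node` class, the `RULES` table, and the single `line_height` property are all invented for illustration. Real style resolution and layout involve selectors, the cascade, and many layout modes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    style: dict = field(default_factory=dict)
    y: int = 0
    height: int = 0


# Hypothetical "stylesheet": tag name -> declared properties.
RULES = {"h1": {"line_height": 40}, "p": {"line_height": 20}}


def compute_style(node, inherited=None):
    """Style pass: start from the parent's values, then apply matching rules."""
    node.style = {**(inherited or {"line_height": 16}), **RULES.get(node.tag, {})}
    for child in node.children:
        compute_style(child, node.style)


def layout(node, y=0):
    """Layout pass: stack children vertically; a leaf is one line tall."""
    node.y = y
    if not node.children:
        node.height = node.style["line_height"]
    else:
        cursor = y
        for child in node.children:
            layout(child, cursor)
            cursor += child.height
        node.height = cursor - y
    return node


dom = Node("body", [Node("h1"), Node("p"), Node("span")])
compute_style(dom)
layout(dom)
```

Here the `span` has no rule of its own, so it inherits the default line height from `body`, and the height of `body` is the sum of its children's heights.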
It sounds simple, but look at the current HTML and CSS specifications: the various details add up to nearly ten thousand pages.
Aside from the Chromium layout group and Firefox developers, I doubt many people would read and implement these specifications in detail. Just looking at the directory and textual descriptions is overwhelming, let alone implementing them completely.
Often, a simple display: grid or display: flex hides a vast and complex calculation, and performance must be considered throughout, particularly how to paint quickly during scrolling…
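Even the most basic part of flex layout, sharing out leftover space according to flex-grow, is already a small algorithm. The sketch below covers only that one step under simplifying assumptions: a single line, no shrinking, and no min/max constraints. The function name and the tuple format are made up for illustration, and the real specification's algorithm handles many more cases.

```python
def distribute_flex(container_width, items):
    """Share free space among items in proportion to their flex-grow factor.

    items is a list of (base_size, flex_grow) pairs; returns the final sizes.
    """
    free = container_width - sum(base for base, _ in items)
    total_grow = sum(grow for _, grow in items)
    if free <= 0 or total_grow == 0:
        # Nothing to distribute (shrinking is deliberately not modeled here).
        return [base for base, _ in items]
    return [base + free * grow / total_grow for base, grow in items]


# Three 100px items in a 600px container with flex-grow 1, 2 and 0.
sizes = distribute_flex(600, [(100, 1), (100, 2), (100, 0)])
```

The 300px of free space is split 1:2 between the first two items, while the third keeps its base size.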
Additionally, layout needs to support various strange characters from around the world. For example, right-to-left writing, and the incredibly complex rules of Arabic. In contrast, Chinese characters, being square, are relatively simple to lay out. There are also various strange Unicode characters.
How can one handle these characters and languages while correctly displaying them according to thousands of pages of HTML and CSS layout rules… It is an extremely brain-burning task.
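A small taste of the problem: every Unicode character carries a bidirectional class that a layout engine has to consult before it can even decide which way the text runs. The sketch below uses Python's standard `unicodedata` module to look those classes up. The helper names are invented, and this is only the first input to the full bidirectional algorithm, not the algorithm itself.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character."""
    return [unicodedata.bidirectional(ch) for ch in text]


def has_rtl(text):
    """True if the text contains right-to-left characters (Hebrew, Arabic, ...)."""
    return any(cls in ("R", "AL") for cls in bidi_classes(text))


# Latin 'a', Arabic alef (U+0627), Hebrew alef (U+05D0)
classes = bidi_classes("a\u0627\u05d0")
```

Text that mixes these classes on one line has to be reordered for display, and that is before shaping, line breaking, or any CSS rule comes into play.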
Now, let’s jump out of the layout quagmire and look at what lies outside it. At this point, you will find… the quagmire outside seems even larger.
Just to name a few:
Multi-process framework. Yes, you need more processes to render more web pages, so that if one crashes, it doesn’t affect others.
Note that Chromium places rendering and layout in the renderer process, but drawing to the window happens in the main (browser) process. This requires a great deal of inter-process communication and synchronization, and writing and debugging such code is a serious test of programming skill.
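The crash-isolation idea can be illustrated with a minimal sketch: each "page" is handled in its own child process, so a failing child only produces a non-zero exit status for the parent. The function name and the fake "renderer" script are hypothetical, and real browsers use long-lived processes and message passing rather than one-shot children.

```python
import subprocess
import sys

# Code run by the child process: a stand-in "renderer" that fails on some input.
CHILD_CODE = (
    "import sys\n"
    "page = sys.argv[1]\n"
    "if 'crash' in page:\n"
    "    raise SystemExit(1)  # simulate a renderer crash\n"
    "print(len(page))\n"
)


def render_in_child(page_source):
    """Process one page in a separate process and report (exit code, output)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, page_source],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()


ok = render_in_child("<p>hi</p>")
bad = render_in_child("crash me")
still_ok = render_in_child("<p>hi</p>")
```

The parent survives the failing child and can keep serving other pages, which is the property the multi-process design is after.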
WebRTC. Related to network video. Another acquired library. Regarding WebRTC, you need to know it can implement multi-person real-time voice, noise reduction, network video transmission, camera capture, audio algorithms (like FFT), video algorithms (like H264 protocol format), and basic libraries (like sockets, threads, locks). It’s another vast component.
Password management, download management, extension management.
A complete scheduling system for the multi-process framework and the core layer of Blink. In Chromium, this is referred to as the content layer, responsible for handling all the tedious details, such as dispatching mouse and keyboard messages across various systems and platforms, history stack (forward and backward), page caching.
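Even the history stack, one of the simpler items in that list, has rules that are easy to get wrong. Below is a minimal sketch of back/forward navigation; the `History` class is invented for illustration and ignores things like iframes, redirects, and page caching.

```python
class History:
    """Minimal back/forward list for a single tab."""

    def __init__(self):
        self._entries = []
        self._index = -1

    @property
    def current(self):
        return self._entries[self._index] if self._entries else None

    def navigate(self, url):
        # A new navigation discards any "forward" entries.
        del self._entries[self._index + 1:]
        self._entries.append(url)
        self._index += 1

    def back(self):
        if self._index > 0:
            self._index -= 1
        return self.current

    def forward(self):
        if self._index < len(self._entries) - 1:
            self._index += 1
        return self.current


history = History()
for url in ("a", "b", "c"):
    history.navigate(url)
```

Going back to "b" and then navigating to a new page "d" drops "c" from the list, which matches what users expect from a browser's forward button.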
Sandbox mechanism. Responsible for isolating and reducing the permissions of subprocesses. The implementation of the sandbox involves numerous hook operations across different systems.
The Chrome shell and related applications. For example, the familiar title bar, URL bar, and web UI such as the settings page and history page. Yes, this is the original meaning of the word “chrome”.
cloud_print, the Google Cloud Print integration, which provides the printer list for Chrome’s print preview page.
Courgette, Google’s core binary diff algorithm, used to compute the binary differences between versions. Google built a set of update strategies and algorithms around it.
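To show what "binary differences between versions" means in practice, here is a naive byte-level delta built on Python's standard `difflib`. This is not Courgette's algorithm, which is more sophisticated and works on a disassembled form of the executable. The function names and the delta format are assumptions made for this sketch.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes):
    """Describe `new` as copies from `old` plus literally inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes the client already has
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # ship only the new bytes
        # "delete" needs no operation: those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops):
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


old_version = b"chromium 80 installer payload"
new_version = b"chromium 81 installer payload!"
delta = make_delta(old_version, new_version)
rebuilt = apply_delta(old_version, delta)
```

Only the "insert" operations carry new data, so a small change between versions yields a small update.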
The magical Syzygy optimization. Yes, Google also finds Chrome too large and slow to load, so they developed a toolchain that reorganizes compiled PE binaries to optimize the program. After applying Syzygy, Chrome’s cold-start paging traffic improved by 80%, and the working set of its loaded binary images by 40%. In simple terms, Google post-processes the compiled exe and dll files to optimize startup performance.
Media, Chrome’s multimedia module, supporting audio playback and recording functions. This uses FFmpeg. But outside of FFmpeg, to work with Blink, it is wrapped in a thick layer to handle the rendering pipeline. Additionally, the MSE API took a lot of effort.
SwiftShader. An interesting module that fully implements the OpenGL interface in pure software, so OpenGL can run on machines without hardware acceleration. It’s also a massive library, and it was acquired as well. It seems Google is not adept at graphics engineering? Or perhaps they prefer to delegate it to more specialized teams.
GN, GYP, Ninja. To make the build easier to manage, Chromium created three tools in the spirit of Makefile and CMake: GYP and its successor GN describe the build and generate build files, and Ninja at the bottom level runs the actual compilation with the VS or Clang toolchains. There are many more details to add later.
In summary, any one of the points mentioned above requires the workload of a team to implement correctly, and each could be written as a book. Yet Chromium has implemented all of them and continues to add new features.
At this point, everyone should understand why even a company as strong as Microsoft has given up maintaining its own browser kernel. The manpower and financial investment required is just too terrifying. The Chromium team alone has thousands of developers.
If each person has an annual salary of 1 million RMB, sustained investment over ten years would amount to tens of billions, not counting testing, product, and UI.
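The estimate can be checked with a few lines of arithmetic. The headcounts below are assumptions chosen to stand for "thousands"; only the salary of 1 million RMB and the ten-year span come from the text.

```python
def ten_year_cost_rmb(developers, annual_salary_rmb=1_000_000, years=10):
    """Total salary cost in RMB for a team over the given number of years."""
    return developers * annual_salary_rmb * years


# Assumed headcounts, for illustration only.
for headcount in (1_000, 3_000):
    cost = ten_year_cost_rmb(headcount)
    print(f"{headcount} developers: {cost / 1e9:.0f} billion RMB")
```

With 1,000 developers the total is 10 billion RMB, and with 3,000 it is 30 billion, so "tens of billions" follows once the team is a few thousand strong.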
The most crucial point is, even if Microsoft is willing to invest a billion, can they guarantee achieving the same functionality as Chromium? Even if they could achieve the same functionality, wouldn’t it be another version of Chromium? Could they create other advantages?
Ultimately, Microsoft also gave up and decided to directly modify the open-source Chromium, integrating the features they needed.
Thus, the hegemony of Chromium came to be. It appears open-source and free, yet it tightly binds all developers and competitors around itself.
Now, let the complaints begin.
Since Chromium’s dominance in the browser world, it seems open-source, and anyone can modify it. However, compared to its predecessors, I feel that Chromium is much