Understanding the Design and Manufacturing Process of HiSilicon CPUs: A Look at Huawei's Challenges

★

This article was originally published in 2012 by a Huawei employee (Baidu Tieba: Free Galloping Horse), more than 3 years ago. The latest HiSilicon CPU model is now the 950, and the technology has developed significantly since then. However, the description of how HiSilicon CPUs are designed and made still holds.

The following content is directly adapted from an article organized and edited by 21IC, and I would like to thank them here!

★

Seeing many experts and novices arguing heatedly over HiSilicon’s quad-core chips, some praising it to the skies while others claiming it lacks any technical content, I felt heartbroken. I believe many of the points raised are incorrect. Therefore, I decided to open this thread to objectively introduce the chip design and manufacturing process. As for whether HiSilicon’s quad-core chip has technical content, I will leave it to the readers to evaluate.

Before showing off, let me introduce myself and clarify that I am a new employee at HiSilicon, but I do not engage in chip design. I have recently attended a training on chips and have been curious about how chips are realized, so I have gathered some unofficial and unscientific materials to share my humble opinions.

1. A smaller process is not always better.

Okay, without further ado, let’s talk about some aspects of chips that I am interested in, which may not involve HiSilicon much. People often argue about 40nm, 28nm, and 14nm technologies, but what do these nm values refer to?

They nominally refer to the feature size of the MOSFETs on the silicon wafer. MOSFETs are transistors, the smallest unit that makes up a chip. A two-input NAND gate requires four MOSFETs, and an ARM quad-core chip generally has about 500 million MOSFETs. The first computer in the world used vacuum tubes, which serve the same switching function as MOSFETs, but a vacuum tube is as large as two thumbs, while the most advanced MOSFETs etched today are only 7nm in size.

At this point, you must be as curious as I am about how to manufacture 500 million MOSFETs, each only 40nm in size, on a 15mm x 15mm square silicon die. If we were to use mechanical methods to achieve this, it would be incredibly difficult to find precision instruments capable of sculpting nanometer-scale MOSFETs. Even if such instruments existed, the cost and time required to engrave 500 million of them would be unimaginable.
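
As a rough back-of-envelope check on the figures above, the sketch below divides the die area by the transistor count. It assumes the transistors are spread evenly over the whole die, which is a simplification made here and not a claim from the original post.

```python
import math

# Figures quoted in the text
transistor_count = 500_000_000   # about 500 million MOSFETs
die_side_mm = 15.0               # 15 mm x 15 mm die

die_area_um2 = (die_side_mm * 1000.0) ** 2               # mm -> um, then square
area_per_transistor_um2 = die_area_um2 / transistor_count
tile_side_nm = math.sqrt(area_per_transistor_um2) * 1000.0  # side of an equal square tile

print(f"Average area per transistor: {area_per_transistor_um2:.2f} um^2")
print(f"Equivalent square tile side: {tile_side_nm:.0f} nm")
```

The average tile comes out far larger than 40nm, which is plausible because the process figure describes the smallest feature, while each transistor also needs spacing and wiring around it.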

The answer is light. Using light (photolithography), we can etch patterns onto the silicon wafer, and masks control which parts of the wafer are etched: the areas covered by the mask are not exposed to light and thus are not etched. After the wafer is etched, an oxide layer and a metal layer are applied, followed by additional etching. This process is repeated multiple times until the chip is complete. Generally, making a chip requires a dozen or more etching steps, each with different techniques and masks. The etched positions may deviate from one step to the next, and if the deviation is too large, the resulting chip will be unusable. Deviations must be kept within a few nm to ensure an acceptable yield, which is why chip fabrication is one of the most precise technologies invented by humankind.

Chips can be mass-produced through mask etching, but the masks must be processed and made with even higher precision machines, which are extremely costly. A single mask can cost around $100,000. Manufacturing a chip requires dozens of different masks, so the initial investment for chip manufacturing is very high, often running into millions of dollars. The trial production process of chips is called tape-out, which also requires masks and involves significant investment. Before tape-out, no one knows if the chip design will be successful, and it may take multiple attempts before achieving success. Thus, there are very few companies in China capable of producing high-end chips; the mask costs alone are prohibitive for most companies.

Once a chip is in mass production, the costs become relatively low. A wafer can be very large, with a diameter of 30 cm, allowing hundreds of chips to be produced at once. If the shipment volume is high, profits can still be very high. For instance, Intel’s chips sell for over $1,000 each, but the average manufacturing cost may be less than $100. However, if the shipment volume is low, the average manufacturing cost per chip can be shockingly high, and it is not uncommon for several million dollars to go down the drain.
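
The effect of shipment volume on average cost can be sketched as a simple amortization. The $100,000 per mask comes from the text; the mask count of 30 is an assumed stand-in for "dozens", and the $100 per-unit cost borrows the Intel figure above purely for illustration. Design, software, and repeated tape-outs are ignored.

```python
def cost_per_chip(units, mask_cost=100_000, mask_count=30, unit_cost=100.0):
    """Average cost per chip: one-time mask set spread over the run, plus per-unit cost.

    mask_cost  : about $100,000 per mask (figure from the text)
    mask_count : 30 is an assumed value standing in for "dozens of masks"
    unit_cost  : assumed $100 manufacturing cost per chip
    """
    if units <= 0:
        raise ValueError("units must be positive")
    mask_set_cost = mask_cost * mask_count
    return mask_set_cost / units + unit_cost

for units in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"{units:>10,} chips -> ${cost_per_chip(units):,.2f} each")
```

Under these assumptions the mask set dominates the cost at ten thousand units and becomes almost invisible at ten million.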

Whether HiSilicon’s chip prices are competitive depends on the shipment volume of Huawei’s phones. I have seen some people asking whether 20nm is better than 40nm. From the size perspective alone, 20nm is clearly better. At 20nm, a MOSFET takes up only about 1/4 of the area it does at 40nm. The operation of a MOSFET is a process of charging and discharging, and the smaller the MOSFET, the less charge it requires to operate, resulting in lower power consumption. Moreover, as MOSFET size decreases, the density of gate circuits increases, allowing more MOSFETs to fit into a chip of the same size and thus increasing performance potential. The gate density at 40nm is 2.35 times that at 65nm. However, all of this is theoretical data that does not consider leakage and secondary (second-order) effects.
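
The 1/4 figure follows from area scaling with the square of the linear dimension. A minimal sketch, assuming every dimension shrinks by the same ratio (an idealization introduced here):

```python
def ideal_density_gain(old_nm, new_nm):
    """Ideal transistor-density gain if every dimension shrinks by new_nm / old_nm."""
    return (old_nm / new_nm) ** 2

gain_40_to_20 = ideal_density_gain(40, 20)
gain_65_to_40 = ideal_density_gain(65, 40)

print(f"40nm -> 20nm: {gain_40_to_20:.2f}x density, "
      f"{1 / gain_40_to_20:.2f} of the area per transistor")
print(f"65nm -> 40nm: {gain_65_to_40:.2f}x ideal density (the text quotes 2.35x)")
```

The quoted 2.35x sits below the ideal value, which would be consistent with not every dimension shrinking in proportion.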

Of course, shrinking transistors has its physical limitations. When we shrink transistors to around 20nm, we encounter issues from quantum physics, resulting in leakage currents that negate the benefits gained from reducing the gate length L. One way to mitigate this is to introduce the FinFET (Tri-Gate) concept, as illustrated in the following image. From Intel’s previous explanations, we can see that adopting this technology reduces the leakage caused by these physical phenomena.

Figure 1: FinFET (Tri-Gate) transistor structure

Why do some say that major manufacturers face significant challenges when entering the 10nm process? The main reason is that the size of one atom is about 0.1nm, so a line 10nm wide is only about 100 atoms across, making production very difficult. Moreover, any defect at the atomic level, such as an atom falling out or an impurity introduced during production, can lead to unpredictable behavior that affects product yield.

If you cannot imagine this difficulty, you can conduct a small experiment. Arrange 100 small beads into a 10×10 square on the table, then cut a piece of paper to cover the beads, and finally use a small brush to remove the beads on the side, leaving a 10×5 rectangle. This will help you understand the difficulties faced by major manufacturers and how challenging it is to achieve this goal.

Next, let’s talk about secondary effects. Anyone who has studied basic physics knows that the simplest circuit consists of a power source, wires, and a resistor. When the power is connected, current flows instantly through the resistor. If we replace the resistor with an inductor, the inductor will have a gradual charging process, and in this case, the current does not flow through the inductor instantaneously.
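
The gradual build-up of current through an inductor can be sketched with the standard formula for a series R-L circuit connected to a DC source, i(t) = (V/R)(1 - e^(-tR/L)). The series resistor and all component values below are illustrative assumptions, not figures from the original post.

```python
import math

def rl_current(t, voltage, resistance, inductance):
    """Current at time t (seconds) after a DC source is switched onto a series R-L circuit."""
    tau = inductance / resistance          # time constant in seconds
    return (voltage / resistance) * (1.0 - math.exp(-t / tau))

# Illustrative values only: 1 V source, 10 ohm resistor, 1 mH inductor
V, R, L = 1.0, 10.0, 1e-3
tau = L / R
for multiple in (0, 1, 3, 5):
    i = rl_current(multiple * tau, V, R, L)
    print(f"t = {multiple} tau: {i * 1000:.1f} mA of a final {V / R * 1000:.0f} mA")
```

The current starts at zero and approaches V/R only after several time constants, which is the "gradual charging" described above.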

In fact, resistors also have inductive reactance, but it is very small and can be ignored. However, if the voltage across the resistor is very small and the current is also very small, then the inductive reactance cannot be ignored. Secondary effects are very pronounced in chip processes that are very small (below 28nm). With low voltage and small current, the charging of MOSFETs is more affected by inductive reactance than at 40nm, leading to slower charging speeds. To achieve high frequencies, MOSFETs need to operate at higher voltages, which increases power consumption. Leakage is also a side effect of low process sizes and must be accounted for in chip power consumption. Therefore, the power consumption advantage brought by smaller processes is significantly offset by leakage and secondary effects.
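
Why a higher voltage costs so much power can be illustrated with the commonly used CMOS estimate, switching power = activity x C x V^2 x f, plus a leakage term V x I_leak. The capacitance, leakage current, and operating points below are made-up illustrative values, and the leakage current is held constant for simplicity even though in practice it also depends on voltage.

```python
def dynamic_power(c_switched, voltage, frequency, activity=1.0):
    """Common CMOS switching-power estimate: activity * C * V^2 * f, in watts."""
    return activity * c_switched * voltage ** 2 * frequency

def total_power(c_switched, voltage, frequency, leakage_current, activity=1.0):
    """Switching power plus static leakage power (V * I_leak)."""
    return dynamic_power(c_switched, voltage, frequency, activity) + voltage * leakage_current

# Illustrative numbers only (not from the text)
base = total_power(c_switched=1e-9, voltage=1.0, frequency=1.2e9, leakage_current=0.2)
boosted = total_power(c_switched=1e-9, voltage=1.2, frequency=1.5e9, leakage_current=0.2)
print(f"1.0 V @ 1.2 GHz: {base:.2f} W")
print(f"1.2 V @ 1.5 GHz: {boosted:.2f} W ({boosted / base:.2f}x)")
```

Because voltage enters squared, a modest voltage increase to reach a higher frequency raises power considerably more than the frequency gain alone would suggest.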

Of course, new and good processes can partially address the above two issues, as different processes use different physical and chemical materials, and the process flows are different. Qualcomm’s quad-core uses an older variant of the 28nm process, which currently shows little advantage over the 40nm process.

As for process technology, the most advanced process I have heard of is 7nm, but this process is only available in laboratories and has not yet reached the scale needed for mass production. The difficulties associated with low process sizes are challenging to overcome. Anyone who has studied physics knows about diffraction of light; low process sizes mean that the mask apertures will be very small, leading to severe diffraction, making it impossible to etch silicon wafers. Perhaps this issue can be solved by using electron beams or other particle beams to etch silicon wafers, but that is a problem for those experts to solve.
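
The diffraction limit mentioned above is often estimated with the Rayleigh criterion, smallest printable feature = k1 x wavelength / NA. The sketch below uses commonly cited tool parameters (193 nm immersion light with NA 1.35, 13.5 nm EUV light with NA 0.33, and k1 = 0.25 as the single-exposure limit); these values are assumptions added here and do not appear in the original post.

```python
def min_feature_nm(wavelength_nm, numerical_aperture, k1=0.25):
    """Rayleigh resolution estimate: smallest printable half-pitch = k1 * lambda / NA."""
    return k1 * wavelength_nm / numerical_aperture

# Assumed, commonly cited tool parameters (not from the text)
print(f"193 nm immersion, NA 1.35: {min_feature_nm(193, 1.35):.1f} nm")
print(f"13.5 nm EUV, NA 0.33:      {min_feature_nm(13.5, 0.33):.1f} nm")
```

The estimate shows why shorter wavelengths, or a different kind of beam altogether, become attractive as features shrink.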

2. Chip design tests a company’s technical level.

Now let’s talk about design. Chip design is divided into front-end design and back-end design. Front-end design is similar to creating architectural blueprints, where the logic, modules, and gate relationships of the chip are completed. Back-end design involves layout and routing; once the chip is made, it becomes a physical object, and where each MOSFET is placed and how each line is connected is determined by back-end design. There is not much to say about front-end design, although it is highly technical.

I will focus on back-end design, which is more interesting. The layout and routing of 500 million MOSFETs, although many use IP hard cores that other manufacturers have already completed, is definitely not an easy task. For instance, two wires on a silicon plane cannot cross each other; they are not like the wires in our homes that are insulated with plastic. If we were to place the wires of 500 million MOSFETs on a single plane, ensuring some connections while preventing others and avoiding crossings would be absolutely impossible.

In fact, a chip’s wiring may have a dozen layers from top to bottom. Each layer is like a spider web of wiring, and if we were to shrink ourselves to 1nm and walk around the chip world, we would find it to be a magnificent and incredible place. Back-end design must ensure correct connections while minimizing module area and power consumption, avoiding secondary effects, which sets a very high standard. Graduates from prestigious universities who engage in back-end design may only just be getting started after two years of work.

Next, let’s talk about simulation. Before tape-out, no one knows what the chip will look like, let alone speculate whether its design is successful or reasonable. The cost of tape-out is high, so it is not feasible to tape out just to verify design success. This is where simulation comes into play, using computers to simulate the circuit’s operation. Simulation runs through the entire chip design process, including front-end simulation, back-end simulation, analog simulation, and digital simulation… Simulation relies heavily on computer simulation software, such as Synopsys and Cadence, which are giants in the chip design and verification software field. I do not know the exact fees HiSilicon pays them each year, but it is certainly in the tens of millions.

Simulation is a task that requires ultra-high-performance computers. HiSilicon has a large number of high-performance computers in its IT center as part of its cloud computing resources, but when faced with large simulations, it still struggles. Running for several hours can only simulate a few seconds of chip operation. These computers are running simulations 24 hours a day. Just to give you an idea, our department has a Linux server with an Intel 4-core 4G CPU and 16G memory.

This is just a server for miscellaneous tasks, hosting a database and compiling some software. HiSilicon’s small network Solaris access server has over a hundred people working on it. This shows that the investment in chip design is indeed very significant; just in software and hardware costs, each person can cost the company hundreds of thousands a year.

Regarding HiSilicon’s current level, I do not want to boast, but there is indeed a significant gap compared to American companies. After all, in the 1980s, when chip design and manufacturing were already mature in the US, we had just gotten our first computer. For example, the K3V2 has many modules that are sourced from others, and the company has spent a lot of money to purchase the rights, known as IP cores.

IP cores can be soft cores or hard cores; there are also soft-hard hybrid cores… What are they? For example, ARM instruction licensing is a soft core; it only specifies the CPU instruction set. It’s like building a bridge; it only tells you how long and wide the bridge should be and what it should roughly look like, but it does not provide details on how the circuits should be arranged on the chip or how they should be connected. The advantage of soft cores is that they provide a lot of room for creativity, making imitation and copying easier, and they can serve as references for future similar products. Hard cores, on the other hand, provide a specific layout of the circuit on the chip; you just need to place it and use it. The advantage of hard cores is that they are generally validated by other chips, making it easy to understand their specific performance. However, you can hardly modify them, and it is difficult to understand their implementation details, as there are millions of MOSFETs involved, making analysis challenging.

HiSilicon has few proprietary IP cores, mainly in the baseband and digital TV set-top box areas, which are relatively strong. HiSilicon’s set-top box chips account for over 90% of the world’s market share (according to the boss). The K3V2 is largely a matter of stacking blocks, piecing together a USB core, an audio decoding core… But objectively speaking, chip design is becoming increasingly specialized, and each company only completes a small part of the work. Even Qualcomm uses many IP cores from other companies.

It is absolutely impossible for a company to do everything by itself; even if it could, its chips would not be competitive. However, playing with building blocks also requires a significant level of technical skill, and HiSilicon is definitely the best in the country at this. Currently, the company’s goal is to develop more and more modules in-house, but this takes time.

Starting from the most basic chips, I mentioned MOSFETs at the beginning, and now let’s talk about NAND gates. As mentioned earlier, MOSFETs are the smallest units from the perspective of chip manufacturing. However, in chip design, the smallest unit used in digital circuits is the gate circuit, and NAND gates are one of the most commonly used types. A NAND gate typically requires four MOSFETs. We are all familiar with NAND gates. As shown in the figure:

Figure 2: A gate with two inputs (Switch 1 and Switch 2) and one output (Switch 3)

Everyone knows that a switch at home has two states: on and off. In the figure, when only one of the two switches (Switch 1 and Switch 2) is on, Switch 3 turns on. If both Switch 1 and Switch 2 are off or both are on, Switch 3 turns off. Strictly speaking, this is the behavior of an XOR (exclusive-OR) gate; a NAND gate’s output turns off only when both inputs are on. Gates like this can be found everywhere in life. For example, if a lamp at home has one switch at the door for convenience when entering and leaving, and another by the bed for turning off the light at night, the two switches act as an XOR gate controlling the same lamp: if exactly one switch is on, the light is on; if both switches are on or both are off, the light is off.
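
To make the switch behavior concrete, here is a minimal sketch that prints the truth tables of a NAND gate and an XOR gate side by side, with 1 meaning on and 0 meaning off. The function names are chosen for illustration.

```python
def nand(a, b):
    """NAND: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

def xor(a, b):
    """XOR: output is 1 when exactly one input is 1."""
    return 1 if a != b else 0

print("S1 S2 | NAND XOR")
for s1 in (0, 1):
    for s2 in (0, 1):
        print(f" {s1}  {s2} |   {nand(s1, s2)}    {xor(s1, s2)}")
```

The lamp with two switches follows the XOR column: it is on only when exactly one switch is on.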

In this way, an XOR gate and an AND gate together form the simplest adder (a half adder), which can only calculate 1+1 at most. There are billions of such gates in computers, and they work together to perform very complex calculations. Most CPUs today are 64-bit, and such CPUs certainly have 64-bit adders or even 128-bit adders. A 64-bit adder, for example, can calculate 18446744073709551615 + 18446744073709551615, where 18446744073709551615 (2^64 - 1) is the largest value that fits in 64 bits.
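
A half adder is conventionally built from an XOR gate (the sum bit) and an AND gate (the carry bit). The sketch below chains such gates into a simple ripple-carry adder to show how 1-bit building blocks scale to 64 bits. It is an illustration only; real CPUs generally use faster carry schemes.

```python
def half_adder(a, b):
    """Add two bits: XOR gives the sum bit, AND gives the carry bit."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Add two bits plus a carry, built from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def ripple_add(x, y, width=64):
    """Add two unsigned integers bit by bit; returns (sum modulo 2**width, carry_out)."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

print(half_adder(1, 1))               # sum bit and carry bit for 1 + 1
largest = 2**64 - 1                   # largest 64-bit unsigned operand
print(ripple_add(largest, largest))   # low 64 bits of the sum, plus the carry-out bit
```

Adding the two largest 64-bit values produces a result that needs a 65th bit, which the adder reports as the carry-out.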

At this point, we must mention chip frequency. The K3V2 was initially touted as a 1.5G quad-core, but when it was released, it changed to 1.2G, and then to 1.4G with the D1 quad-core… this has been quite disappointing and sparked a lot of debate. However, most people, like I was before, only know to argue about how many Gs it has without understanding what chip frequency means. First, let’s clarify what 1G (1 GHz) is: it represents 1 billion (1,000,000,000) cycles per second. Why is this important? As I mentioned earlier, the state of Switch 3 changes with the states of Switch 1 and Switch 2. To humans, the change in Switch 3 looks instantaneous, but it does take some time. Switch 3 may be the input of another gate circuit; if that gate reads Switch 3 while it is still halfway through changing, it may receive a wrong input and cause serious errors.

Generally speaking, a layer of gate circuits must wait until the previous layer has completely changed and stabilized its output before it can accept the input from the previous layer and start changing. At this point, a conductor is needed to direct when these gate circuits should start changing; this conductor is the chip’s clock, and its tempo is the chip frequency. The conductor issues pulses at regular intervals; 1G means one billion pulses per second, that is, one pulse every nanosecond. When the gates receive the pulse, they start to change.
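
The relationship between frequency, pulse interval, and gate settling time can be sketched as follows. The 20 gate levels and 40 ps per gate are hypothetical numbers chosen for illustration, and the model ignores real-world margins such as register setup time and clock skew.

```python
def clock_period_ns(frequency_ghz):
    """Time between clock pulses, in nanoseconds."""
    return 1.0 / frequency_ghz

def max_frequency_ghz(gate_levels, gate_delay_ps):
    """Highest clock rate at which gate_levels gates in series settle within one period."""
    critical_path_ps = gate_levels * gate_delay_ps
    return 1000.0 / critical_path_ps

for f in (1.0, 1.2, 1.5):
    print(f"{f} GHz -> one pulse every {clock_period_ns(f):.3f} ns")

# Hypothetical numbers: 20 gate levels between pulses, 40 ps per gate
print(f"Max clock: {max_frequency_ghz(20, 40):.2f} GHz")
```

The longest chain of gates that must settle between two pulses sets the ceiling on the clock frequency.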

From the above, it is clear that the faster the conductor conducts, the faster the chip can perform computations. However, it should be noted that doubling the frequency does not necessarily mean doubling the performance. This is because the CPU frequency does not synchronize with the memory and peripherals; the greater the frequency difference between them, the more idle cycles the CPU has. Additionally, it is important to note that the process of changing gate circuits is essentially the charging and discharging process of MOSFETs. The faster the MOSFETs charge and discharge, the higher the chip frequency can be achieved, while secondary effects slow down the charging and discharging of MOSFETs. To enable MOSFETs to charge and discharge more quickly, the voltage must be increased, which also raises the power consumption of the chip.
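
Why doubling the frequency does not double the performance can be shown with a simple model in which compute time scales with the clock while time spent waiting on memory does not. The workload numbers are hypothetical.

```python
def run_time_s(cpu_cycles, frequency_hz, memory_wait_s):
    """Simple model: compute time scales with the clock, memory wait time does not."""
    return cpu_cycles / frequency_hz + memory_wait_s

# Hypothetical workload: 1.2 billion CPU cycles plus 0.5 s spent waiting on memory
slow = run_time_s(1.2e9, 1.2e9, 0.5)   # at 1.2 GHz
fast = run_time_s(1.2e9, 2.4e9, 0.5)   # clock doubled to 2.4 GHz
print(f"1.2 GHz: {slow:.2f} s, 2.4 GHz: {fast:.2f} s, speedup {slow / fast:.2f}x")
```

In this model the larger the share of time spent waiting on memory and peripherals, the smaller the gain from a faster clock.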

Many people may have several questions about HiSilicon:

1. Since HiSilicon uses ARM’s IP cores, can they produce the K3V2 (HiSilicon’s quad-core A9 architecture processor) with their eyes closed?

2. What exactly is the ARM core?

3. What is the strength of the team that developed the K3V2, and what is HiSilicon’s standing?

4. Does HiSilicon have any competitiveness? Where are its core technologies, and how does it compare to foreign companies?

Let’s first talk about ARM’s IP cores. ARM licensing covers instruction sets and CPU core architectures. From what I understand, chip manufacturers other than Qualcomm use ARM’s own CPU core architectures, commonly referred to as A9 and A15. Qualcomm is more high-end and develops its own CPU core architecture; if it produces something better than A9 or A15, it can indeed enhance CPU performance. However, ARM charges a high fee for the right to modify the core architecture, so such companies must pay ARM more. The instruction set is the interface between the CPU and the compiler, operating system, and applications above it. Using ARM’s instruction set means that the CPU you design is compatible with the Android system, can install applications, and can use existing C compilers.

If a company creates an entirely new instruction set, the CPU it produces would be useless, as there would be no operating system or applications available for it. Previously, Lenovo produced the K800, which used the Intel Atom CPU. This CPU was quite special as it utilized the X86 instruction set, resulting in a disaster; many games were incompatible. However, Intel still had to thank Google; otherwise, this CPU would not even be compatible with Android. Currently, it is quite difficult to develop a CPU without using ARM’s instruction set, and as more applications only support ARM, ARM’s position will become increasingly solidified, similar to computer CPUs that require the X86 instruction set to install Windows; it is a monopolistic empire.

Next, let’s discuss CPU core architecture. Before that, we must mention PDK. PDK stands for Process Design Kit, which is closely related to the manufacturing process of wafers. What is PDK? It describes the electrical characteristics of basic components in a specific process. For example, the electrical characteristics of MOSFETs produced using TSMC’s 28nm process will differ from those produced using the 40nm process. The rated current range and voltage range of MOSFETs produced by 28nm and 40nm processes will differ, and under the same external input, the output curves will certainly also differ. Chip companies without PDK would have no idea how their designed circuits would perform, nor could they run simulations. To put it simply, if you design a circuit using a 40nm PDK and produce it using a 28nm process, the resulting chip would be utterly useless. Therefore, chip design is quite arduous; while programmers can reuse code, chip designers must start from scratch if they change the manufacturing process.

ARM provides Huawei with only the source code for the CPU core architecture (RTL, the same kind of hardware description code used to program FPGAs); it is not tied to any manufacturing process. This significantly reduces the digital front-end design work, but a lot of back-end design work remains. Moreover, ARM provides only a computing core, with no peripherals included. What are the peripherals? They include USB IP cores, without which phones would lack USB functionality; GPUs, which need no further explanation; audio IP cores, which provide Dolby sound effects; video decoding IP cores, which allow video playback without software decoding; and CPU power control IP cores, which manage power consumption (the K3V2’s low power consumption shows that HiSilicon has done well in this area). Many of these peripheral IP cores are purchased by HiSilicon, but some have been developed independently. Therefore, when evaluating a CPU, it is essential to look beyond just the frequency; the quality of peripheral IPs varies, and some high-end IP cores have very high licensing fees. Even if a company purchases many IP cores, that does not mean the chip can be produced easily.

By the way, many of Qualcomm’s peripheral IP cores are also purchased externally.

Now, let’s discuss the development team for K3V2 at HiSilicon. This team was previously part of the digital development department of HiSilicon’s platform, whose specific name I forget. Before developing K3V2, they were not particularly well-known. The technical strength of this team is comparable to that of other development departments at HiSilicon, as they did not recruit top talents from other departments when developing K3V2. Furthermore, it cannot be said that K3V2 is HiSilicon’s most technically advanced product. HiSilicon has been established for about seven or eight years, and before K3V2, its core technologies were primarily in router chips and security chips.

People can look up Huawei’s latest high-performance routers, which have throughput several times that of Cisco’s high-performance routers, leading Cisco by at least a year. How is this achieved? Those routers use chips specifically customized by HiSilicon, which are also based on the ARM architecture but carry peripheral IP cores tailored for processing network data. These IP cores are HiSilicon’s own intellectual property. Moving functions that used to run as software into the chip hardware is currently a trend. A typical example: playing RMVB files used to require software decoding, which occupied the CPU heavily and led to stuttering during playback, whereas nowadays CPUs or GPUs generally have hardware decoding for RMVB, so playback runs much faster. This is how Huawei’s routers can outperform Cisco’s.

Therefore, it is clear that HiSilicon is not a newcomer to ARM technology, and the ability to produce the quad-core K3V2 has its reasons. Additionally, the eight-core and sixteen-core versions are currently under development. When HiSilicon is working on mobile chips, it has almost no advantages compared to foreign manufacturers, as it has not previously engaged in mobile chip production, and the degree of autonomy in IP cores is still relatively low; advantages will gradually accumulate over time. Moreover, HiSilicon does possess its core technologies, and other manufacturers may not necessarily outperform HiSilicon in router chips.

PS: Recently, I have been working late. I return home at around 9:30 PM, take a shower, mop the floor, do the laundry, and by the time I finish, it’s almost 10:30 PM. I feel a bit tired now. Let me say a few casual words about work, as I believe many people are curious about this too. Rumors that Huawei’s work environment is unbearable, with employee exploitation and crazy overtime, have long been popular online, and I was somewhat apprehensive before joining. Now that I have been at the company for almost three months, I feel that the work pressure is indeed considerable, but it is not as terrifying as described online. Generally, I clock in at 8 AM and leave around 8 PM, working about nine to ten hours a day, excluding lunch and dinner. During working hours I feel quite tense, and it is indeed somewhat more tiring than at many other companies. But how should we view this? I believe the main difference between someone earning 20,000 yuan a month and someone earning 10,000 yuan is that the former creates at least twice the value of the latter. Some people earn more, but they must also put in more effort. The person we fear dealing with the most is a top expert in the US. He makes a lot of demands and often overwhelms our department’s staff. His annual salary exceeds $500,000, which is enviable, right? However, I find he is often still at work during our afternoon, which is the early hours of the morning in US time. I’ve also heard he plans to buy a villa in Silicon Valley for $5 million.

What do you think about this question? Would you prefer a comfortable but lower salary or a challenging job with a higher income?
