This is the transcript of my video talk on March 2, covering what assembly language essentially is, why it is worth learning, and how to learn it.
Keywords: Knowledge System, Assembly Language, Software, System Software, Chips, Middleware, Kernel, System Calls, Programming Models, Assembly, Application Layer, Disassembly, Compiler, Logical Operations, Operating System, Software-Hardware Interface, Architecture
Below is the transcript.
0. Author Introduction
Before diving into the main content, let me briefly introduce myself, since my background is related to what follows. I majored in Microelectronics in college, which is essentially chip technology. After starting my master’s degree in 2005, I worked on microcontroller development and later on Linux system porting. Since then I have focused mainly on software while keeping an eye on chips, although I have not worked on them directly. My last hands-on chip project was some lab work during my master’s. So I never received a comprehensive training in computer science, and much of my knowledge was acquired as needed when I ran into problems. In my learning I emphasize two points: building a systematic body of knowledge, and going back to original documents.
1. The Essence of Assembly Language
Today’s talk also touches on systematic knowledge and original documents. First, why discuss whether to learn assembly language at all? When learning anything, we need two perspectives. One clarifies what the knowledge itself is about. The other steps outside the knowledge or skill to see what it is from an external viewpoint. So today we will first look at assembly language from the outside in.
So what is assembly language? In my understanding, it is essentially the interface between computer software and hardware. What does that imply? Looking downward from this interface, we can understand computer architecture; looking upward, we can understand operating system software.
2. The Significance of Learning Assembly Language
I have always worked in system software development, and much of my experience relates to operating systems. Several projects I worked on used assembly language skills, directly or indirectly. Let me give a few examples.
The first example is from about ten years ago, when I worked at Zhongxing Micro. I was responsible for the CPU and the software environment of the chip platform. One task was machine testing, which helps the production line separate good chips from bad ones. This required writing test cases, and part of that code had to be in assembly. It was my first time writing assembly by hand. Besides writing it, I also had to understand how function calls work. I knew the concept of the ABI (Application Binary Interface), but I did not clearly understand the rules that function calls must follow. During that machine testing work I read the ARM AAPCS manual, the Procedure Call Standard for the ARM Architecture, which is part of the ARM ABI. After working through it, I gained a systematic understanding of how caller and callee cooperate in a function call. This matches the two points I emphasized earlier: systematic knowledge and original documents.
The second example is from my time at SUSE, where I worked on virtualization. I was mainly responsible for middleware, specifically libvirt and QEMU, with a bit of work on Xen. This work was relatively high-level. libvirt and QEMU are written mainly in C, so it was essentially C application development and hardly involved assembly. However, we discovered a system startup problem in a certain scenario. Those familiar with virtualization know that the startup process under QEMU can be similar to, or different from, that of a physical machine.
The issue puzzled me because my background was ARM and I was unfamiliar with the x86 platform. I had two choices. One was to read x86 materials to understand the BIOS and the boot flow. However, the information I found on Google was generally abstract: it described the overall process without concrete details, which did not help me solve the problem. So I read the BIOS code at the assembly level. I discovered that it jumps to a certain address to hand control from the BIOS to the next-stage bootloader.
The third example is from my work at Huawei on an ABI-related project: AArch64 ILP32. This project supports running 32-bit ELF programs (with 32-bit int, long, and pointers) on a 64-bit AArch64 system. The compiler was mostly ready; it had some bugs but generally worked well. My main task was improving glibc and the kernel. That work hardly involved assembly, except for ld.so and system calls, where I had to consider how parameters are passed from the application to the kernel. Even so, understanding assembly was very helpful in this project, because debugging certain bugs without it would have been quite inconvenient. We were working with a big-endian system, and two issues came up repeatedly. The first was byte order: the code had to be correct on both big-endian and little-endian systems. The second was conversion between 32-bit and 64-bit, since user space was 32-bit ELF while the kernel was 64-bit. These two areas were frequent sources of problems.
In such cases, besides analyzing the source code, another approach was to read the disassembly and see how values were handled across the 32-bit/64-bit boundary.
These three examples show that assembly language is needed for low-level development, such as operating system development or kernel and driver debugging. It is also a great entry point for understanding the CPU’s programming model and the system as a whole. So, back to the original question: should we learn assembly language? My experience is that if your work involves system software or middleware close to the CPU, learning assembly language is essential. If not, for example if you work with Java or similar languages, you may need to know the corresponding bytecode instead.
3. How to Learn Assembly Language
So far we have discussed whether to learn assembly language; this section focuses on how to learn it. In our earlier kernel knowledge sharing, we talked about software layering, a concept everyone should be familiar with. We can apply the same idea here and ask at which layer we should start learning assembly. Clearly, application-level assembly is the easiest place to begin.
Application-level assembly covers the basic structures of a programming language, such as sequential execution, conditionals, and loops. It also includes arithmetic and logical operations and the basic branch and jump instructions. With these, we can explore how assembly interacts with C/C++ function calls, for example ABI-related matters. When a crash occurs, whether a core dump or a kernel crash, assembly lets us pinpoint the exact faulting instruction. This matters because the source line numbers mapped from the disassembly may not be accurate, especially in optimized code.
Beyond basic arithmetic and logical operations, how do we go deeper? One direct approach is to find study materials, which is certainly fine. Another is to read the architecture manuals for x86, ARM, and RISC-V and see how they are written. I consider this the end goal. If we can solve assembly-related problems using these manuals, our learning is thorough, and we can keep improving on our own. The challenge is getting there, because reading architecture manuals directly can be quite daunting. So what approach should we take?
One method is to write some code, compile it without optimization (for example, without -O3), and examine the disassembly. This gives a basic understanding of assembly language. In real work we might enable -O3 or -Os, which is fine: we can then compare the optimized and unoptimized code and note how the instructions differ. This gives us firsthand material for understanding assembly. At this level we may also encounter inline assembly, such as GCC inline assembly. Following our learning method, we should go back to the original documents.
For GCC inline assembly, that means the GCC manual, which explains inline assembly in detail, and we should learn to write it from those explanations. This also involves some ABI knowledge, such as what extra flags must be declared in certain situations (see reference material 1). Here we can see that assembly really is a software-hardware interface. While learning it, we also deepen our understanding of system software and CPU architecture.
After learning basic assembly concepts such as arithmetic, logical operations, conditionals, and jumps, what comes next? We will find that the assembly used in operating systems differs from that used in applications. The operating system kernel runs at a higher privilege level, which comes with privileged instructions. Examples include instructions for entering and leaving a privilege level and for manipulating CPU control registers. These control registers govern capabilities the operating system depends on, such as interrupt enabling, system call entry, and interrupt routing. Here we can observe that assembly language is closely tied to CPU architecture.
This means that when learning a specific CPU architecture, we can look at how operating systems such as Linux and RT-Thread (RTT) write their assembly. Drawing on experience debugging operating systems like Linux, we can also revisit assembly we have not seen before and work out which parts of it depend on the CPU architecture.
Besides privilege levels, what else belongs to this system layer? The memory model, for one. ARM and RISC-V use a single address space in which devices are memory-mapped. x86 also supports memory-mapped I/O, but it additionally has a separate I/O address space accessed with dedicated instructions such as in and out. These differences show up in assembly as well. The memory model also includes page table handling: where the page table base address is held and how many page table base registers there are. Examples are CR3 on x86, TTBR0_EL1 and TTBR1_EL1 on AArch64, and satp on RISC-V. All of these require assembly to access special registers.
Through this process we can build a comprehensive understanding of assembly language, enough to debug both application crashes and operating system kernel crashes. Are we done? Not yet, because there is another important concept: pseudo assembly, that is, pseudo-instructions. We use them constantly. Have you ever seen an instruction in code and failed to find it in the manual? Yet the assembler clearly accepts it, and it appears in commercial and open-source software. That is a pseudo-instruction. It does not correspond to one distinct machine instruction. Instead, it is either an alias for a particular form of a real instruction or a shorthand for a sequence of several instructions. For example, on RISC-V, mv rd, rs is encoded as addi rd, rs, 0.
Pseudo-instructions are architecture-dependent, so they vary across CPU architectures. RISC-V, for example, documents its pseudo-instructions in the RISC-V specification.
Understanding pseudo-instructions is crucial, because it separates practical knowledge of assembly from purely theoretical knowledge. Without knowing about pseudo-instructions, I may struggle to read code written by others in real engineering work.
OK, we have covered basic arithmetic and logical operations, conditionals, privileged instructions, memory models, and pseudo assembly. What else is there? There are multimedia instructions such as x86’s MMX and 3DNow!, as well as ARM’s NEON SIMD and VFP floating-point instructions. These are part of assembly language too. The approach is similar to what we discussed earlier. We look at how ordinary applications use these instructions, and we also consider how integers and floating-point numbers interact. I won’t elaborate further on this.
4. Recommended Books and Documents
Throughout the learning process, I believe it is crucial to keep returning to original documents. What are original documents? They include CPU architecture manuals, operating system source code, and important books on operating systems. I recommend keeping at least two books on hand. One should cover CPUs and computer organization, such as “Computer Organization and Design: The Hardware/Software Interface.” The other should cover operating systems, such as Professor Chen Haibo’s recent “Modern Operating Systems: Principles and Implementation.” Classic books are also fine; there is no need to agonize over which one to read. My suggestion is to read the CPU book and the operating system book side by side. Some topics are covered in more depth in the CPU book, while others get more attention in the operating system book.
Besides books on computer architecture and operating systems, we may also need the ABI manuals. Take this opportunity to look at the CPUs you use most, such as x86, ARM, and RISC-V, and see which materials are available for them. Start by filling the gaps in your existing materials, then consult the relevant documents as problems come up. The ultimate goal is to solve problems by consulting the original documents.
5. Conclusion
This concludes my talk today. To summarize, I started with a self-introduction. Since I came to this work from another field, I place great importance on systematic knowledge and original documents. We then discussed assembly language as the software-hardware interface. I shared three of my project experiences. Some clearly required writing assembly, while others seemed unrelated but led me to assembly to solve problems. I also highlighted how understanding assembly helps when debugging system software, including middleware. In the second part, I shared how to learn assembly language. We started from the application layer and then moved on to privileged assembly, which involves privileged instructions and memory models. Finally, we discussed pseudo assembly and multimedia and floating-point instructions.
Thanks to Cheng Lei for helping organize the transcript.
I am Zhang Jian, a former architect at Huawei, a former technical partner at a startup, and currently a freelancer. I am also the father of two children, with a lot of firsthand experience balancing career development, work, and life. Everyone is welcome to discuss this with me.
Reference Materials
1. GCC manual, Extended Asm: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
Zhang Jian’s technical course, ARM Architecture and Debugging, connects ARM architecture with operating systems. If you want to deepen your understanding of assembly language, exception handling, and memory management, it may be of interest.