C Programming on Linux

Introduction

First of all, I believe that foundational knowledge in computer science is timeless, while trendy “practical skills” may quickly become obsolete. This is why, during the major selection process in my sophomore year, I chose the Computer Science and Technology major instead of other flashy fields.

In my junior and senior years, while taking specialized courses, I realized how useful the mathematics from my earlier years was, even though I had forgotten it completely. As the saying goes, I had handed it back to the teachers right after the exams. When I revisited it, I found that I had never truly understood many of the concepts; only now do I genuinely grasp them. So were the first two years wasted? Another drawback of the university curriculum is its inflexibility: each course must occupy a semester and be taught by a single instructor, with no communication between the instructors of different courses. In reality these courses are interdependent, and forcibly separating them does not match how people actually learn.

Looking back now, C is actually a very difficult programming language. Without some understanding of compiler principles, operating systems, and computer architecture, it is impossible to learn it well, so the half semester spent on it is largely wasted. Most students come away believing they know C when in reality they have only a superficial grasp. When it comes to writing actual code, they struggle with bugs and never get the chance to study C systematically again, because from the school's perspective the C course was already "completed" in the freshman year. It is like a meal that has already been served: whether or not you are full, you will not be allowed to eat it again.

I do not discuss C in isolation but together with compiler principles, operating systems, and computer architecture. In other words, this book uses C as a medium for discussing how computers and programs work.

Why Learn C Language

People often say that Python is already very user-friendly, so why bother learning tedious C? The image below illustrates the point: the significance of C is akin to humans being able to walk upright. The emergence of Java and Python is comparable to the advent of cars and mobile phones; they have made life more convenient, but I feel that my own abilities have diminished. A direct example: before university, teachers did not allow mobile phones or the internet, so I memorized everything in my head. Once I started university and began relying on my phone's memo app, I found my memory declining.

Linux_C? Not Windows_C

Why learn C language on the Linux platform? Isn’t it fine to learn C language on Windows?

It is indeed difficult to learn C on Windows. Before reading the other articles, you should therefore have a basic understanding of Linux (at least at the RHCSA level); otherwise you may not understand what I am writing at all.

C is a low-level programming language. To write good C programs, one must have a clear understanding of how operating systems work, since operating systems are themselves largely written in C, and the application programs we write in C directly use the interfaces the operating system provides.

Linux is an open-source operating system, and you can find answers to your questions in the source code and documentation. Even if you cannot understand the source code or find the documentation, it is easy to find an expert to teach you; there are always helpful experts on mailing lists, newsgroups, and forums. In contrast, Windows is a closed operating system. Outside Microsoft, almost no one can see its source code, so one can only guess at how it works from the documentation. Worse still, Microsoft has a reputation for secrecy, keeping useful features for itself and leaving them undocumented.

Development tools on the Windows platform are often tied to Integrated Development Environments (IDEs) such as Visual Studio and Eclipse. Using an IDE is indeed convenient, but it is not beneficial for beginners. Microsoft promotes the idea of foolproof programming, telling you that you can build a program by dragging a few controls with the mouse and clicking a button. But which truly useful programs are created this way? Many people who started programming on Windows have been coding for years yet still only know how to click a button to run a program or drag a few source files into a project to compile them. Faced with more complex requirements, they are at a loss, because their understanding is limited to buttons and menus. They have no concept of compilers, linkers, or Makefiles, and may never have used the command line. Yet these are fundamental concepts that should be established when one starts learning to program.

Moreover, the syntax of C is closely tied to how compilers and linkers work. Without understanding how they work, one cannot truly master C syntax. IDEs therefore do not help you learn; they hinder you. To learn C programming well, one originally only needed to master the syntax and the compilation commands. With an IDE, you also need to understand how the compilation commands are integrated into the IDE before you can claim to have learned it, which makes an already complex task even more complicated. Linux users have always worked primarily with commands, with the mouse as a supplement. From the first day of learning to program, you should compile programs by typing commands. Once these basic concepts are clear, you can decide which IDE you find useful, but by then you may prefer vi or emacs over an IDE.

Overview

1. Writing a program can be described as a process of breaking a complex task down into sub-tasks, then breaking those sub-tasks down into simpler ones, layer by layer, until each can be completed using the above instructions.

2. Assembly language uses mnemonics to stand for the groups of numbers that make up machine-language instructions. Programmers write assembly programs directly with these mnemonics, and the assembler then replaces each mnemonic with its number by looking it up in a table, thus translating assembly language into machine language. From the above example, it can be seen that assembly instructions and machine instructions correspond one-to-one: if there are three assembly instructions, there are also three machine instructions, and the assembler simply performs a straightforward substitution.

3. Statements in C do not have a simple one-to-one correspondence with low-level instructions. A statement like a = b + 1; typically needs to be translated into three assembly or machine instructions. This process is called compilation and is carried out by the compiler. Clearly, a compiler's job is much more complex than an assembler's. Programs written in C must be compiled into machine instructions before the computer can execute them, and compilation takes some time, which is a drawback of programming in high-level languages.

4. The C language compilation process

1) Preprocessing: macro expansion, header file expansion, conditional compilation, comment removal (syntax errors are not checked)

$> gcc -E hello.c -o hello.i

2) Compilation: checking for syntax errors and generating an assembly file from the preprocessed file.

$> gcc -S hello.i -o hello.s

3) Assembly: assembling the assembly file into an object file (binary)

$> gcc -c hello.s -o hello.o

4) Linking: programs written in C depend on various libraries, so after compilation the object files are linked with those libraries to produce the executable program (the libraries differ between operating systems).

$> gcc hello.o -o hello_elf

To view the dynamic libraries that the program depends on:

$> ldd hello_elf

5. The execution process of the program

1) The program resides in external storage. When it is executed, it is loaded into memory, where it is divided into data, code, heap, and stack segments. In memory these segments are not contiguous but scattered (think of AT&T assembly code).

2) Memory has no processing capability of its own, so it works together with the CPU, which does the processing. In essence, the contents of the executable are passed back and forth between memory and the CPU.

Environment

I have tried to avoid IDEs (which bundle editors, compilers, debuggers, and graphical tools) as much as possible. On one hand, IDEs are relatively large, and I prefer lightweight tools. On the other hand, IDEs hide many details from us (such as makefiles), and those details are very important.

This article is a summary of my experiences during self-study. If you have any suggestions or opinions, please leave a comment at the end. Thank you.