This question comes from my knowledge community: “What language is C written in?”
Put another way: C must be compiled before it can run, so where does the C compiler come from, and what language is it written in? If the compiler is itself written in C, which came first, the chicken or the egg?
1
Let’s assume there are no compilers in the world, and start with machine language to see what can be done.
Machine language can be executed directly by the CPU without a compiler.
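Then comes assembly language. Although assembly is just a set of mnemonics for machine instructions, it still has to be translated into machine code before it can run. The program that does this translation is an assembler, so the very first assembler must have been written directly in machine language. Once an assembler exists, later versions can be written in assembly itself, and nobody needs to hand-write machine code again.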
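To make the “mnemonic” idea concrete, here is a minimal sketch of an assembler for a toy machine. The instruction set (LOAD, ADD, HALT) and its opcode numbers are invented for this example and do not correspond to any real CPU; the point is only that each mnemonic maps mechanically to a number the machine can execute.

```python
# Toy assembler: translates mnemonics into numeric machine code.
# The instruction set below is hypothetical, invented for this example.
OPCODES = {
    "LOAD": 0x01,  # LOAD n -> put the constant n in the accumulator
    "ADD": 0x02,   # ADD n  -> add the constant n to the accumulator
    "HALT": 0xFF,  # HALT   -> stop and report the accumulator
}

def assemble(source):
    """Turn assembly text into a list of numbers (opcode, optional operand)."""
    code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        code.append(OPCODES[parts[0]])
        if len(parts) > 1:
            code.append(int(parts[1]))
    return code

def run(code):
    """A tiny stand-in for the CPU: executes the numbers directly."""
    acc, pc = 0, 0
    while True:
        op = code[pc]
        if op == 0x01:
            acc = code[pc + 1]
            pc += 2
        elif op == 0x02:
            acc += code[pc + 1]
            pc += 2
        elif op == 0xFF:
            return acc
        else:
            raise ValueError(f"unknown opcode {op}")

program = """
LOAD 2    ; acc = 2
ADD 3     ; acc = 5
HALT
"""
machine_code = assemble(program)
```

Here `machine_code` is the list of numbers a programmer would otherwise have had to write by hand, and `run(machine_code)` plays the role of the CPU executing it.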
With assembly available, we can take a big step forward and write a C compiler in assembly language. Call this the ancestor of all C compilers.
The ancestor can compile any C program, so why not write a compiler in C itself? We only need to compile it with the ancestor.
After all these layers, we finally have a C compiler written in C. It was a lot of trouble to get here.
At this point, the original C compiler written in assembly can be discarded.
Of course, if other high-level languages, such as Pascal, existed before C, then we could use Pascal to write a C language compiler.
The first Pascal compiler is said to have been written in Fortran. As the first high-level language, Fortran’s compiler should have been written in assembly language.
2
There’s an interesting legend about compilers:
It is said that Ken Thompson, one of the inventors of Unix, could swagger up to any Unix machine at Bell Labs, enter his username and password, and log in as root!
Bell Labs had no shortage of talented people, and some vowed to find the hole. They read through the C source code of Unix and eventually found the login backdoor. They removed it and recompiled Unix, yet Thompson could still log in.
Some suspected the compiler, so they wrote a new compiler in C and recompiled Unix with it, thinking everything was finally safe.
But it still didn’t work; Thompson could still log in as root, which was incredibly frustrating!
Later, Thompson himself revealed the secret: the first C compiler was rigged. Whenever it compiled the Unix source code, it automatically inserted the backdoor. Worse, any new compiler you wrote in C also had to be compiled into a binary, and the only compiler available for that job was Thompson’s original. So your freshly written compiler came out tainted too, and it would insert the same backdoor whenever it compiled Unix 🙂
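The mechanism in this story can be sketched as a toy model. This is not Thompson’s code: here “source” is a Python string, a “binary” is a Python function, and the trojan recognizes its targets with crude, made-up string checks. The names `honest_compile`, `trojaned_compile`, and the magic password `"ken"` are all assumptions for illustration.

```python
def honest_compile(source):
    """A clean 'compiler binary': turns source text into a callable program."""
    namespace = {}
    exec(source, namespace)
    return namespace["main"]

def trojaned_compile(source):
    """The rigged 'compiler binary'. Its source code is nowhere to be found."""
    if "password" in source:
        # Target 1: looks like the login program, so wrap it with a backdoor.
        program = honest_compile(source)
        def backdoored(user, password):
            return password == "ken" or program(user, password)
        return backdoored
    if "exec(source" in source:
        # Target 2: looks like a compiler, so emit a copy of the trojan itself.
        return trojaned_compile
    return honest_compile(source)

# Both sources below are completely clean; anyone can audit them.
LOGIN_SOURCE = '''
def main(user, password):
    return (user, password) == ("root", "s3cret")
'''

COMPILER_SOURCE = '''
def main(source):
    namespace = {}
    exec(source, namespace)
    return namespace["main"]
'''

# Rewrite the compiler from clean source, but build it with the rigged binary.
new_compiler = trojaned_compile(COMPILER_SOURCE)
login = new_compiler(LOGIN_SOURCE)

# For comparison: the same clean sources built with an honest binary.
clean_login = honest_compile(COMPILER_SOURCE)(LOGIN_SOURCE)
```

In this sketch `login` accepts the magic password even though neither source file contains it, while `clean_login` does not. Auditing source code alone cannot reveal the problem, because the backdoor lives only in the compiler binary.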
This reminds me of the XcodeGhost incident a few years ago. Put simply, a trojan was planted in copies of Xcode distributed through unofficial channels, so every iOS app built with those copies was tainted and could be exploited by attackers.
Although XcodeGhost pales in comparison to Thompson’s backdoor, it is a reminder to download software only from legitimate channels: use the official website, make sure the connection uses HTTPS, and ideally verify the file’s checksum as well.
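Verifying a checksum takes only a few lines. The following is a minimal sketch using SHA-256 from Python’s standard library; it assumes the publisher lists a SHA-256 digest in hexadecimal, and the helper names `sha256_of` and `matches_published` are made up for this example.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare a file's digest with the value listed on the official site."""
    return sha256_of(path) == published_hex.strip().lower()
```

You would call `matches_published` with the path of the downloaded installer and the digest copied from the official download page, and refuse to install if it returns `False`. The check is only as trustworthy as the page the digest came from, which is why the official site and HTTPS matter too.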
3
Some may ask: I struggle to write even a Hello World in assembly, so how could anyone write something as complex as a compiler with it? Is that really possible?
Of course it is possible. When the first generation of Unix was developed, C did not exist yet, and Ken Thompson and Dennis Ritchie wrote Unix line by line in assembly. The first version of WPS was written in assembly by Qiu Bojun, and the Turbo Pascal compiler was written in assembly by Anders Hejlsberg. The abilities of these great minds are beyond ordinary imagination.
Compilers can also be developed with a “snowball” method:
Take C as an example. The first version targets only a subset of the language, say basic data types, flow control statements, and function calls. Call this subset C0.
We then write a compiler in assembly language that handles only C0, which is far easier than handling the whole language.
Once C0 works, we extend it with structs, pointers, and so on, and call the new language C1.
So what is the C1 compiler written in? Naturally, C0.
Once C1 works, we extend the language again to get C2, and write the C2 compiler in C1.
Then come C3, C4, and so on, until we finally have the complete C language.
This process is called bootstrapping.
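The snowball can be modeled in a few lines. This is only a sketch of the dependency chain, not a real compiler: each language level is a set of feature names, and a new compiler can be built only if the existing compiler accepts every feature its source code uses. The features listed for C2 are invented for illustration, as are the names `Compiler`, `build`, `cc0`, `cc1`, and `cc2`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Compiler:
    name: str
    accepts: frozenset  # language features this compiler can translate

def build(existing, new_name, source_uses, new_accepts):
    """Compile the source of a new compiler with an existing compiler."""
    missing = set(source_uses) - existing.accepts
    if missing:
        raise ValueError(
            f"{existing.name} cannot compile features: {sorted(missing)}"
        )
    return Compiler(new_name, frozenset(new_accepts))

C0 = {"basic types", "flow control", "function calls"}
C1 = C0 | {"structs", "pointers"}
C2 = C1 | {"unions", "typedef"}  # hypothetical next step

# Stage 0: written by hand in assembly, so no C compiler is needed to build it.
cc0 = Compiler("cc0 (assembly)", frozenset(C0))

# Stage 1: the C1 compiler, whose own source uses only C0 features.
cc1 = build(cc0, "cc1", source_uses=C0, new_accepts=C1)

# Stage 2: the C2 compiler, whose own source uses C1 features.
cc2 = build(cc1, "cc2", source_uses=C1, new_accepts=C2)

# Self-hosting: the latest compiler can now rebuild itself from its own language.
cc2_rebuilt = build(cc2, "cc2 (self-hosted)", source_uses=C2, new_accepts=C2)
```

Skipping a step fails in this model: `build(cc0, "cc2", source_uses=C1, new_accepts=C2)` raises `ValueError`, because the assembly-written compiler does not understand structs or pointers. Each stage has to be written in the language the previous stage already accepts.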
END
I am Liu Xin, the author of the bestselling book “The Programmer’s Turnaround,” with over 15 years of development experience, a former IBM architect, having led multiple enterprise application architecture design and development projects; I have insights into the essence of technology and excel at explaining complex technologies through stories.