How Programming Languages Are Implemented

Do you know how the programming languages you often use are implemented?
Today, let’s talk about this question.
Smart humans discovered that by combining simple switches, they could express complex boolean logic. Based on this, they built the CPU, which can only understand switches in a simple way, represented numerically as 0 and 1.
How Programming Languages Are Implemented

The Genesis: Smart Idiots

The CPU is quite primitive, like a single-celled organism; it can only move data from one place to another and perform simple additions without any complex actions. Although these operations seem simple and clumsy, the CPU has an unparalleled advantage: speed. This is something humans cannot compete with. After the CPU appeared, humanity began to have a second brain.
This primitive species began to dominate another species called programmers.
Starting to Speak the CPU’s Language
Generally, for two different species to communicate, such as humans and birds, there are two methods: either the bird speaks human language for the human to understand, or the human speaks bird language for the bird to understand; it depends on who is more capable.
Initially, the CPU won, and programmers began to speak bird language to acknowledge the CPU’s dominance, allowing the CPU to work. Let’s take a look at how the earliest programmers spoke bird language:
How Programming Languages Are Implemented
Programmers wrote instructions directly in 0s and 1s according to the CPU’s will. You read it correctly; this is code, this is how primitive it is, and then input it onto punched tape for the CPU to begin working. At this time, the program was truly tangible, albeit a bit wasteful of paper.
At this time, programmers had to write code from the CPU’s perspective, and it looked like this:
1101101010011010100100110010100111001000110111101011101101010010
At first glance, do you know what this means? You don’t, and you think, “What is this junk?” But the CPU knows and thinks, “This is simply the most beautiful language in the world.”
A Great Responsibility Bestowed
Eventually, one day, programmers got tired of speaking bird language; after all, they are primates, and chattering in bird language is quite embarrassing. You were given a great responsibility: to make programmers speak human language.
You didn’t suffer hardship but carefully studied the CPU and found that the instruction set executed by the CPU is just a few instructions, such as addition and jump instructions. Thus, you made a simple mapping of machine instructions to human-readable words, mapping machine instructions to words that humans can understand. This way, the string of 01 became:
sub $8, %rsp
mov $.LC0, %edi
call puts
mov $0, %eax
In this way, programmers no longer had to rigidly remember 1011…, but could remember words like ADD, SUB, MUL, DIV that humans can recognize.
How Programming Languages Are Implemented
Thus, assembly language was born, and for the first time, something recognizable by humans appeared in programming languages.
At this point, programmers no longer needed to “chirp” but upgraded to “aba aba.” Although humans recognize these few words, there is still a significant difference in form from human language.
Details vs. Abstraction
Although assembly language has recognizable words, both assembly language and machine language are considered low-level languages.
Low-level languages require you to care about all the details.
What details do you care about? As mentioned, the CPU is a very primitive thing that only knows how to move data from one place to another and perform simple operations before moving data from one place to another again.
Therefore, if you want to program using low-level languages, you need to use multiple simple instructions like “move data from one place to another and perform simple operations” to solve complex problems like sorting.
Some students may not feel this deeply. It’s like wanting to express “go get me a cup of water”:
How Programming Languages Are Implemented
If you use assembly language to implement this, it would look like this:
How Programming Languages Are Implemented
I think you already get it.
Bridging the Gap
The CPU is just too simple, so simple that it cannot understand any slightly abstract concepts like “bring me a cup of water.” But humans are naturally accustomed to abstract expressions. Is there a way to bridge the gap between humans and machines?
In other words, is there a way to automatically convert human abstract expressions into concrete implementations that the CPU can understand? This could greatly enhance programmer productivity, and now this problem needs you to solve.
How Programming Languages Are Implemented
Patterns, All Patterns
After thinking for a long time, you still don’t know how to automatically convert human abstractions into concrete implementations that the CPU can understand. Just as you are about to give up, you glance at a pile of details that the CPU can understand:
How Programming Languages Are Implemented
In a flash of inspiration, you discover a lot of patterns.
In most cases, the instructions executed by the CPU are straightforward, like this:
How Programming Languages Are Implemented
These tell the CPU to perform a specific action. You named these straightforward instructions as statements.
Besides, you also discovered a pattern that requires deciding which instruction to follow based on a certain condition, which appears to humans as “if… then… else…”:
if ***  blablabla
else ***  blablabla
In some cases, you also need to repeatedly execute certain instructions, which looks like spinning in place:
while ***  blablabla
Finally, there are many similar instructions, like here:
How Programming Languages Are Implemented
These instructions are repetitive, with only minor differences. By extracting these differences, you can package the remaining instructions together and specify them with a code name; let’s call it a function:
func abc:  blablabla
Now you have discovered all the patterns:
// Conditional transfer
if ***  blablabla
else ***  blablabla
// Loop
while ***  blablabla
// Function
func abc:  blablabla
Compared to assembly language, this represents a qualitative leap, as it is now very close to human language.
Next, you find yourself facing two problems:
  1. What should blablabla be?
  2. How to convert the above human-readable strings into machine instructions that the CPU can recognize?
Inception
You remember that most code is straightforward statements. Is blablabla just a bunch of statements?
Obviously not. Blablabla can be statements, but it can also be conditional transfers (if-else), loops (while), or function calls. This is reasonable.
Although this is reasonable, you quickly discover another serious problem:
Blablabla can contain if-else statements, and if-else statements can contain blablabla. Blablabla can also contain if-else statements, and so on…
How Programming Languages Are Implemented
Just like Inception, there is a dream within a dream, a dream within a dream within a dream… layers of nesting, infinitely recursive…
How Programming Languages Are Implemented
At this point, you can clearly feel that your brain cells are insufficient. This is too complicated. Despair begins to consume you. Oh God, who will save me!
At this moment, your high school teacher comes over, pats your shoulder, and hands you a high school math textbook. You feel embarrassed and angry, thinking that my current problem is so profound that a mere high school math book cannot solve it. You grab it and throw it to the ground.
At this time, a gust of wind blows, and the textbook opens to a page with a number expression:
f(x) = f(x-1) + f(x-2)
What does this recursive formula express? The value of f(x) depends on f(x-1), the value of f(x-1) depends on f(x-2), and so on…
How Programming Languages Are Implemented
One layer nests another, dreams within dreams, if nested within statements, statements can also nest if…
Wait a minute, isn’t this recursion? The seemingly endless nesting above can also be expressed with recursion!
Your math teacher laughs loudly, too young, too simple, leaving you embarrassed and shocked, realizing that such seemingly high-tech things can be solved with high school math, leaving you stunned and ashamed.
With the concept of recursion, your intelligence begins to regain its ground.
Recursion: The Essence of Code
Isn’t it just nesting? One layer wraps around another, and recursion is born to express this (Note: The expression here is not complete; real programming languages are not this simple):
if : if bool statement else statement
for: while bool statement
statement: if | for | statement
The layers of nesting in the inception can be expressed so simply in a few lines. You named this high-end expression syntax.
Mathematics makes everything elegant.
All code in the world, no matter how complex, can ultimately be reduced to syntax, and the reason is simple: all code is written in the form of syntax.
At this point, you have invented a truly human-readable programming language.
However, just having a language is not enough.
Making Computers Understand Recursion
Now, there’s still one problem left: how to convert this language into machine instructions that the CPU can recognize?
Humans can write code according to syntax, and this code is actually a string of characters. How can we make computers recognize a string expressed in recursive syntax?
This is a matter of great importance to humanity, and you can’t help but feel the weight of responsibility. But this last step seems full of difficulties, and you can’t help but sigh, computers are so hard.
At this moment, your middle school teacher comes over, pats your shoulder, and hands you a middle school botany textbook. You feel embarrassed and angry, thinking my current problem is so profound that a mere middle school textbook cannot solve it. You grab it and throw it to the ground.
At this time, another gust of wind blows, and the book opens to a chapter about trees. You stare blankly at this page:
How Programming Languages Are Implemented
The trunk has branches, and branches can also have branches. This is recursion! We can represent the code written according to recursive syntax with trees!
How Programming Languages Are Implemented
Your middle school teacher laughs loudly, saying that the seemingly high-tech thing can actually be solved with middle school knowledge.
Excellent Translators
When computers process programming languages, they can organize code according to the recursive definition in the form of trees. Since this tree is generated according to syntax, let’s call it a syntax tree.
Now that the code is represented in tree form, you observe closely and find that the expression of the leaf nodes is very simple and can easily be translated into corresponding machine instructions. As long as the leaf nodes are translated into machine instructions, you can apply this result to the parent nodes of the leaf nodes, and the parent nodes can reference the translation results to their parent nodes, passing it up layer by layer until the entire tree can be translated into specific machine instructions.
How Programming Languages Are Implemented
The program that completes this task also needs a name. Based on the “principle of not understanding,” you give this translation-like program a rather unremarkable name: compiler.
Now do you still think that data structures like binary trees are useless?
At this point, you have accomplished a remarkable invention. Programmers can write code using human-readable elements, and your program, called a compiler, is responsible for translating it into machine instructions that the CPU can recognize.
Later generations built C/C++ and subsequent languages like Java and Python based on your ideas, and these languages are still in use today.

Source: The Survival of a Programmer on a Deserted Island

Editor: virens

The reproduced content only represents the author’s views

It does not represent the position of the Institute of Physics, Chinese Academy of Sciences

If you need to reprint, please contact the original public account

Recent Popular Articles Top 10

↓ Click the title to view ↓

1.What is the principle of the machine used to detect dangerous liquids during subway security checks?| No.412

2.Why do we forget everything learned in three years immediately after the college entrance examination?

3.He threw raisins into soda, and this video quickly went viral, along with a paper.

4.Do you have this muscle on your wrist? If not, have I evolved?

5.A breakthrough in curvature engines brings us one step closer to interstellar travel.
6.An unidentified creature appears on the beach, can humans relieve pain by resting on it?!
7.Why haven’t humans evolved the ability to regenerate lost limbs?
8.Case solved! Scientists have cracked the mystery of the construction of the pyramids, and the huge stones were transported this way.
9.What is the probability of being hit by

Leave a Comment