Understanding Android Lag: Causes and Solutions

Author: Cat Eating Fish

Link: https://juejin.im/post/5d4bdb23e51d453c2577b747

Recently, Huawei’s Ark Compiler was open-sourced, so I went through the PPT from its launch conference. I found that, as an Android developer, I could not fully understand the points it covered, so I studied the background material and organized it into this article.
This article explains, in plain language, the historical reasons for Android lag and Google’s long struggle against it.

After reading this article, you will:

  • Understand how computers interpret the programs we write and execute corresponding functions
  • Learn about the evolution of the Android virtual machine
  • Understand the three main reasons for Android lagging from the bottom up

Basic Concepts

First, we need to review some basic concepts to understand how computers interpret the programs we write and execute corresponding functions.

1. Compilation & Interpretation

Some programming languages (like Java) are understood by computers through a compile-then-interpret process.

Let’s take a piece of Java code:

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}


This is the first lesson for every programmer: write this code, execute it, and the computer or phone prints Hello World.
So the question arises: English is a language of the human world, so how does the computer (CPU) understand it?
As we all know, 0 and 1 are the language of the computer world; computers only recognize 0s and 1s. Therefore, we only need to express the English code above in 0s and 1s for the computer to understand and execute it.
[Figure: Java source code → bytecode → machine code]
As the image above shows, Java source code is first compiled into bytecode, and the bytecode is then interpreted into machine code according to the rules in a template.

2. Machine Code & Bytecode

Machine Code

Machine code is the language that the CPU can directly interpret and execute.
However, if the machine code produced in the image above is run on a different computer, it will most likely fail.
This is because different computers may understand different machine code. In simple terms, machine code that runs on computer A may not run on computer B.
For example, Chinese person A understands Chinese and English; Russian person B understands Russian and English. If both are taking a Chinese exam at the same time, B will probably not even find where to write his name.
So at this time, we need bytecode.

Bytecode

Chinese person A cannot understand the Russian exam paper, Russian person B cannot understand the Chinese exam paper, but everyone can understand the English exam paper.

Bytecode is an intermediate code; Java can compile to bytecode, and the same bytecode can be interpreted into the specified machine code according to the specified template rules.
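As a concrete illustration (produced on a desktop JVM; instruction indices and constant-pool numbers may differ on your machine), the Hello World method above compiles into just a handful of bytecode instructions that any JVM can interpret:

javac HelloWorld.java        # produces HelloWorld.class (bytecode)
javap -c HelloWorld          # disassembles the bytecode; roughly:

public static void main(java.lang.String[]);
  Code:
     0: getstatic     #2    // Field java/lang/System.out:Ljava/io/PrintStream;
     3: ldc           #3    // String Hello World
     5: invokevirtual #4    // Method java/io/PrintStream.println:(Ljava/lang/String;)V
     8: return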

Benefits of Bytecode:

1. It achieves cross-platform compatibility: one piece of source code only needs to be compiled into bytecode once, and that bytecode can then be interpreted into the machine code the current computer understands according to different templates. This is Java’s famous “write once, run anywhere.”

2. The bytecode compiled from a given piece of source code is much smaller than the corresponding machine code.


3. Compiled Languages & Interpreted Languages

Compiled Languages

The well-known C/C++ languages are compiled languages: the source code is compiled directly into machine code, which the CPU can then interpret and execute directly.


Some may ask: since bytecode has so many benefits, why don’t these languages use bytecode too?
This is because each programming language is designed with a different goal in mind; some aim for cross-platform compatibility, like Java, while others are designed for a specific machine or a specific family of machines.
For example, Objective-C (OC) and Swift, developed by Apple, are designed only for Apple’s own products; they do not need to care about anyone else’s hardware. One of the design goals of OC and Swift is therefore speed: they are compiled directly into machine code that the iPhone or iPad can interpret and execute. This is also one of the main reasons why applications on Apple phones are larger than those on Android phones.

This is also one of the reasons why Apple phones are smoother! (No middlemen making profits)

Compiled-Interpreted Languages

Taking Java, the language used for Android development, as an example: Java is a compile-then-interpret language. Compilation does not produce machine code directly; it produces bytecode (.class files in Java programs, .dex files in Android programs). That bytecode then still has to be interpreted into machine code before the CPU can understand it.

This second step, interpreting bytecode into machine code, is carried out by the Java virtual machine after the program is installed or while it runs.

The Three Major Factors Causing Lag
The latest Android version this year is already Android 10; in fact, complaints about Android phones lagging have gradually faded over the past two years, replaced by claims that they are as smooth as iOS.
Claims of actually surpassing iOS, however, are still rare. Historically there are three reasons why Android lagged: it simply started from a lower baseline than iOS.

1. Virtual Machine – Slow Interpretation Process

From the description above, we know why iOS does not lag: its code goes straight to the hardware, skipping the intermediate interpretation step. Android, lacking this direct path, has to interpret bytecode into machine code at run time every time it executes, so its performance is significantly lower than iOS.
We have clearly identified bytecode (the middleman) as one of the main culprits of lag. Can we throw bytecode away like iOS does and target the hardware directly?
Clearly not, because iOS only has to support a handful of device models.
Android, in contrast, runs on countless phone models and countless CPU architectures, not to mention tablets, cars, and other devices. So many kinds of hardware mean a wide variety of architectures, each with its own machine-code rules; taking the direct route like iOS is obviously unrealistic.
So what can we do? Since we cannot throw bytecode away, we can only squeeze it and make the whole interpretation process faster and faster. The “factory” where this interpretation happens is the virtual machine.
Next is the great evolution of the Android virtual machine!

① Android 1.0 Dalvik (DVM) + Interpreter

DVM (Dalvik Virtual Machine) is the virtual machine Google developed for the Android platform, and it reads .dex bytecode. The process of interpreting bytecode into machine code happens inside the Java virtual machine; on the Android platform, that virtual machine is DVM.
In Android 1.0, the interpreter inside DVM interpreted the bytecode while the program was running. You can imagine how inefficient that was. In a word: lag.

② Android 2.2 DVM + JIT

In fact, the direction of the fix for the DVM problem is clear: interpret a given piece of the program before it actually needs to run.
In Android 2.2, the clever Google introduced the JIT (Just-In-Time) mechanism, which literally means just-in-time compilation.
For example, I often eat at the same restaurant, and the owner already knows which dish I want, so he prepares it before I arrive, saving me the wait.
JIT is like that clever owner: it remembers the functions a user runs most often in an app (the hot code). When the user opens the app, it compiles those parts right away, so by the time the user reaches them, JIT has already prepared the “dish.” This improves overall efficiency.
Although JIT is quite clever and the overall idea is sound, in reality things still lagged noticeably.
Problems:
  • Opening an app becomes slower, because compilation now happens at launch
  • The work is repeated every time the app is opened; it is not done once and for all
  • If I suddenly order a dish I have never ordered before, I still have to wait for it to be cooked. Likewise, if the user opens something JIT has not prepared, they can only wait for the interpreter in DVM to interpret and execute it
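
To make the idea of “hot code gets faster once it has run a few times” concrete, here is a minimal sketch in plain Java (a rough illustration, not a rigorous benchmark; exact behavior and timings depend on the virtual machine, and ART on Android behaves differently from a desktop JVM):

public class JitWarmup {
    // A small, frequently-called method: a candidate for hot code.
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        // Early rounds tend to run in the interpreter; once the runtime
        // marks sum() as hot and JIT-compiles it, later rounds usually get faster.
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            sum(1_000_000);
            long elapsed = System.nanoTime() - start;
            System.out.println("round " + round + ": " + elapsed + " ns");
        }
    }
}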

③ Android 5.0 ART + AOT

The clever Google then thought of another approach: since we can compile bytecode into machine code when an app is opened, why not compile it when the app is installed? Then we would not have to repeat the work on every launch; it would be done once and for all.
This is indeed a good idea, so Google launched ART to replace DVM. ART stands for Android Runtime; it is an optimization over DVM and compiles an application into machine code at install time, a process called AOT (Ahead-Of-Time) compilation.
However, a new problem appeared: opening apps no longer lagged, but installing them became extremely slow. Some may say an app is not installed very often, so this time can be sacrificed. Unfortunately, every time the Android system is updated or the phone is flashed, all apps have to be recompiled on the next boot. Frustrating, isn’t it? Remember the years when we were haunted by Android version updates!

④ Android 7.0 Hybrid Compilation

Google finally brought out its ultimate weapon: DVM + JIT is not good enough, and ART + AOT is not good enough either, so let’s mix them together; that should work!
So in Android 7.0, Google released hybrid compilation. At install time the app is not compiled into machine code right away; instead, when the phone is idle, AOT quietly compiles the parts of the code that can be compiled into machine code (what exactly gets compiled depends on the compilation template for bytecode, detailed below). Essentially, the work that used to happen at install time is now done quietly while the phone is idle.
If there is not enough idle time to finish compiling, the JIT and the interpreter are called upon to compile or interpret in real time.
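
For the curious, this hybrid pipeline can be poked at over adb on a connected device (a rough sketch; com.example.app is a placeholder package name, and the available compilation modes vary by Android version):

# Force the idle-time background dexopt job to run now
adb shell cmd package bg-dexopt-job

# Fully AOT-compile one app, ignoring its recorded profile
adb shell cmd package compile -m speed -f com.example.app

# Compile only the hot code recorded in the app's profile
adb shell cmd package compile -m speed-profile -f com.example.app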
One cannot help but admire Google’s crude way of solving problems; as a result, Android phones have gradually emerged from the quagmire of lag.

⑤ Android 8.0 Improved Interpreter

In Android 8.0, Google turned its attention to the interpreter. Looking at the problems above, the root cause is that the interpreter interprets too slowly! (Never mind JIT and AOT; for interpretation there is only one word: fast.) So why not make the interpreter itself faster? Google improved the interpreter, greatly enhancing the execution efficiency of interpreted mode.

⑥ Android 9.0 Improved Compilation Template

This point is detailed below in the section on the compilation template for bytecode.
In simple terms, Android 9.0 provides a way to ship hot-code information with an app at install time, so the system knows which commonly used code to compile ahead of time.

2. JNI – Slow Invocation Between Java and C

JNI, short for Java Native Interface, is the bridge used for interacting with C/C++ code.
Those who do not do Android development may not know that an Android project usually contains not only Java but, very likely, some C/C++ code as well.
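As a minimal sketch of what this looks like on the Java side (the library name "native-lib" and the add method are hypothetical examples, not taken from the article):

public class NativeBridge {
    static {
        // Loads libnative-lib.so, the compiled C/C++ code.
        System.loadLibrary("native-lib");
    }

    // Declared in Java, implemented in C/C++; every call from Java into this
    // method crosses the JNI boundary between the ART world and the native world.
    public static native int add(int a, int b);
}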
At this point a serious problem arises. First, look at the image (taken from the Ark Compiler principle PPT):
[Figure from the Ark Compiler launch PPT: the Java and C compilation paths in an Android app]
During the development phase, Java source code is packaged into .dex files, and C language is directly compiled into .so libraries because C is a compiled language.
On the user’s phone, the .dex file (bytecode) in the APK is compiled into an .oat file (machine code) that runs inside the ART virtual machine, while the .so library is binary code (machine code) that the CPU can execute directly. These two kinds of machine code have to call each other, and that inevitably incurs overhead.
Next, let’s explain why the two machine codes are different.
We need to look more closely at the compilation from bytecode to machine code. Although both paths end in machine code that the hardware can execute directly, the two kinds of machine code differ significantly in performance, efficiency, and implementation, for the following two reasons:
Different Programming Languages Lead to Different Bytecode and Machine Code
For example, take the simple static-language operation of adding two ints, a + b. C can load the values directly from memory and compute them in registers, because C is a static language and a and b are known, fixed int values.
In Java, although we also declare the type when defining a variable, such as int a = 0, Java has dynamic features such as reflection and proxies; nobody can guarantee at a given call site exactly what is being operated on, so Java compilation has to take context into account, i.e., specific situations must be compiled specifically.
Thus, since the bytecode is already different, the machine code compiled from it must differ as well.
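A small illustration of this kind of runtime “context” (my own example, not from the article): with reflection, which method runs, and on which class, is only resolved while the program is running, so the compiler cannot bake everything into fixed machine code ahead of time.

import java.lang.reflect.Method;

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        Object box = Integer.valueOf(40);
        // The concrete class and method are looked up at run time, not compile time.
        Method m = box.getClass().getMethod("intValue");
        int result = (Integer) m.invoke(box) + 2;
        System.out.println(result); // prints 42
    }
}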
Different Runtime Environments Lead to Different Machine Code
As the image shows, the machine code compiled from Java runs wrapped inside ART (Android Runtime), which acts much like a virtual machine; the runtime environment of the C code is not inside ART.
A runtime provides basic support such as input/output and memory management, and when two different runtimes have to call each other, there is inevitably extra overhead.
For example, because of Java’s GC (garbage collection), the address of a Java object is not fixed and may be moved by the GC. In other words, object addresses in the machine code running inside ART are not stable. C code does not care about any of this; it simply asks Java for an object’s address, and if that address has since been moved, disaster can follow. There are two solutions:
(Referencing the answer by Zhang Duo on Zhihu regarding the impact of Huawei’s Ark Compiler on the Android software ecosystem)
1. Copy the object for the C side. This obviously incurs significant overhead.
2. Tell ART: I am using this object, do not move it! In other words, the object is pinned in place for now. This incurs relatively little overhead, but if too many objects stay pinned and their memory cannot be reclaimed or compacted, it may cause OOM (Out Of Memory).
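
As an aside (my own example, not from the article), one common way to avoid both the copy and the pinning on the Java side is to allocate memory outside the GC-managed heap with a direct ByteBuffer, which native code can then read and write at a stable address:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SharedBuffer {
    public static void main(String[] args) {
        // Direct buffers live outside the Java heap, so the GC never moves them;
        // JNI code can obtain their address via GetDirectBufferAddress without copying.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024)
                                      .order(ByteOrder.nativeOrder());
        buffer.putInt(0, 42);
        System.out.println(buffer.getInt(0)); // prints 42
    }
}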

3. Compilation Template for Bytecode – Not Optimized for Specific Apps

Let’s use an example to understand the compilation template; “Hello world” can be translated as “你好,世界”, and it can also be translated as “世界,你好”. The difference is due to the different compilation templates.

①. Unified Compilation Template (VM Template)

Bytecode can be compiled into machine code through different compilation templates, and the performance of the compiled machine code will vary greatly depending on the compilation template.
In Android, ART has a prescribed, unified compilation template, which we can call the VM template for now. This template is not bad, but it is not excellent either.
Since it was created by Google, it is certainly not poor; but because it is not optimized for any specific app, it is not excellent either.

②. Issues with VM Templates

The problem is precisely this lack of optimization for each specific app.
In the Android 2.2 virtual-machine optimization described earlier, Google used JIT to record frequently used functions (hot code) and compile that code first when the user opened the app.
However, in the hybrid-compilation era of Android 7.0, the existence of AOT weakened this mechanism: the hot code recorded by JIT was not persisted, and AOT’s compilation priority follows the VM template, compiling some of the bytecode into machine code based on the template’s contents.
This gives rise to a problem.
For example, a restaurant’s signature dish is scrambled eggs with tomato, so plenty of ingredients for that dish are prepared in advance. Customer A, however, prefers a rare beef steak, so he has to wait while the steak is cooked.
If an app’s hot code (say, its home page) happens to fall outside what the VM template prioritizes, then AOT is essentially useless. (For example, suppose the VM template prioritized compiling classes and methods whose names do not exceed 15 characters, but the home page’s class name happened to exceed 15 characters. This is only an illustration and has not been verified in practice.)
Below is an example with a home page and a settings page: for some reason, AOT does not compile the home-page code first and instead compiles the less important settings-page code:
[Figure: AOT compiles the settings-page code ahead of time while the home page’s hot code is left to the interpreter and JIT]
The process illustrated above shows that, in such special cases, AOT compilation is essentially useless and everything falls back on the interpreter and JIT compiling in real time, dropping the whole compilation scheme back to the level of Android 2.2.

③. Smart ART

Although this problem exists, it is not particularly severe, because ART is not as foolish as I have described. While an application is being used, ART records and learns the user’s habits (saving the hot code) and keeps updating a VM template customized for that app, continuously supplementing the hot code and tailoring the template.
Doesn’t this sound familiar? It is the “the more you use it, the smoother it gets” line from phone launch conferences.
