The Compilation Principles of LLVM in Keil Compiler AC6


Author | strongerHuang

WeChat Public Account | Embedded Column

Keil MDK uses the Arm Compiler, which comes in two main generations: AC5 and AC6.

AC6 compiles significantly faster than AC5. Do you know why?

The reason is that AC6 is built on modern LLVM and Clang technology. Today we will discuss the compilation principles of AC6 and LLVM.


About Compiler AC6

As early as 2015, Arm launched the AC6 compiler, which is integrated into Keil MDK. However, because source code written for AC5 is not fully compatible with AC6, many Keil MDK users still use the AC5 compiler.

Despite the incompatibility, AC6 compiles much faster than AC5, and that alone attracts some users to it.

Why is AC6 so much faster? Because it is built on a completely new architecture.

The AC6 components are as follows:

1. armclang

Built on modern LLVM and Clang technology
Supports GNU-syntax assembly
Highly compatible with source code originally written for GCC
Implements the relevant specifications, including ANSI/ISO C and C++, the ABI for the Arm Architecture, the ABI for the 64-bit Arm Architecture, and the Arm C Language Extensions (ACLE)

2. armlink

A feature-rich, dedicated embedded linker that combines objects and libraries into an executable

3. Arm C Library

Optimized by Arm for performance and code density, including MicroLib, a compact variant for deeply embedded applications

4. Arm C++ Library

A library based on the LLVM libc++ project

For details, see:

https://developer.arm.com/tools-and-software/embedded/arm-compiler

The optimization options of AC5 and AC6 in Keil MDK differ significantly. Arm Compiler 5 (and earlier versions) uses the armcc compiler, while Arm Compiler 6 replaces armcc with armclang. Because armclang is based on LLVM and has different command-line options, directives, and so on, it is effectively a new compiler.

Here, we recommend reading:

A Step-by-Step Guide to Upgrading Keil MDK Compiler from V5 to V6

Step-by-Step Guide to Upgrading the ARM Compiler in Keil MDK

Step-by-Step Guide to Using GCC Compiler Toolchain in Keil MDK

The Differences in Browsing Information Generated by AC5 and AC6 in Keil MDK


Basic Content of LLVM

We mentioned that AC6 is based on LLVM, so now we turn to LLVM itself.

1. What is LLVM?

LLVM is a framework for building compilers, written in C++. It is designed to optimize programs written in any programming language at compile time, link time, run time, and idle time, while staying open to developers and compatible with existing scripts.

It helps to distinguish a broad and a narrow sense of "LLVM". In the broad sense, LLVM refers to the entire compiler architecture: front-end, optimizer, back-end, numerous libraries, and many other modules. In the narrow sense, LLVM refers to the set of modules and libraries that implement the back-end side of a compiler (code optimization, code generation, JIT, and so on).

2. Advantages of LLVM

Traditional compilers are divided into three stages:

Front-end -> Optimizer -> Back-end

The front-end analyzes the source code, checks for syntax-level errors, and builds a language-specific abstract syntax tree (AST). The AST may be transformed further and is ultimately converted into a new representation, which is handed to the optimizer and then to the back-end.

Finally, the back-end generates executable machine code.


LLVM is also divided into three stages, but with slight design differences. What is different about LLVM is that it provides the same intermediate representation for different languages:

The front-end, which can be a different tool for each language, performs lexical and syntax analysis on the source files to build the abstract syntax tree (AST), and then converts the analyzed code into LLVM's intermediate representation (IR). The optimizer in the middle operates only on the IR, improving it through a series of passes. The back-end translates the optimized IR into machine code for the target platform. The advantage of LLVM is that the IR is well designed, and every front-end language is ultimately converted into the same IR.
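To make "a series of passes over the IR" concrete, here is a minimal sketch in Python. It is not LLVM code and does not use real LLVM IR: the tuple format `(dest, op, arg1, arg2)`, the two pass functions, and the `run_passes` helper are all made up for illustration, and the sketch assumes each name is assigned only once.

```python
# Toy IR (hypothetical, not real LLVM IR): each instruction is
# (dest, op, arg1, arg2). Arguments are ints or names of earlier results.
# Assumption: every dest is assigned exactly once.

FOLDABLE = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

def constant_fold(ir):
    """Pass 1: evaluate operations whose operands are known constants."""
    known = {}   # name -> value known at compile time
    out = []
    for dest, op, a, b in ir:
        a = known.get(a, a)   # substitute known constants for names
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
        elif op in FOLDABLE and isinstance(a, int) and isinstance(b, int):
            value = FOLDABLE[op](a, b)
            known[dest] = value
            op, a, b = "const", value, None
        out.append((dest, op, a, b))
    return out

def dead_code_elimination(ir, live_out):
    """Pass 2: drop instructions whose result is never used."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(ir):
        if dest in live:
            kept.append((dest, op, a, b))
            live.update(x for x in (a, b) if isinstance(x, str))
    kept.reverse()
    return kept

def run_passes(ir, live_out):
    """The 'optimizer': each pass reads IR and returns IR."""
    ir = constant_fold(ir)
    return dead_code_elimination(ir, live_out)

program = [
    ("a", "const", 2, None),
    ("b", "const", 3, None),
    ("c", "add", "a", "b"),
    ("d", "mul", "c", "x"),   # x is a run-time input, so this cannot be folded
    ("e", "add", "a", "a"),   # result is never used
]
optimized = run_passes(program, live_out={"d"})
print(optimized)   # [('d', 'mul', 5, 'x')]
```

Because every pass takes IR in and hands IR out, passes can be added, removed, or reordered without touching the front-end or the back-end, which is the property the paragraph above describes.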

Why use a three-stage design, and what are its advantages? It solves one big problem: with N front-end languages (C, Objective-C, C++, Swift…) and M target architectures (simulators, arm64, x86…), would we need N*M compilers? With a shared IR, N front-ends and M back-ends are enough, and every language and target benefits from the same optimizer in between. This is where the value of the three-stage architecture shows.
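The arithmetic can be checked with a few lines of Python. This is only a counting sketch; the function name `components_needed` and the example lists are illustrative, and the single shared optimizer is left out of the count.

```python
def components_needed(front_end_languages, target_architectures):
    """Compare the two designs by how many pieces must be written."""
    n = len(front_end_languages)
    m = len(target_architectures)
    monolithic = n * m     # one full compiler per (language, target) pair
    three_stage = n + m    # one front-end per language, one back-end per target
    return monolithic, three_stage

languages = ["C", "Objective-C", "C++", "Swift"]
targets = ["simulator", "arm64", "x86"]
print(components_needed(languages, targets))   # (12, 7)
```

Adding a fifth language costs three more compilers in the monolithic design, but only one more front-end in the three-stage design.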


Relationship Between Clang and LLVM

Clang is an LLVM-based compiler for C, C++, Objective-C, and Objective-C++, written in C++ and released under the LLVM project's open-source license (originally the BSD-style University of Illinois/NCSA license). So why develop Clang when GCC already exists? What advantages does Clang have over GCC?

In fact, this question was a major consideration when Clang was first designed. Clang is a highly modular, lightweight compiler that compiles quickly, uses little memory, and is easy to build on for custom tooling.

What is the relationship between LLVM and Clang? Mapping them onto the separate parts of a traditional compiler makes the relationship clearer.

Seen this way, the correspondence is straightforward. LLVM and Clang together form a C/C++ compiler suite. Clang is part of the broader LLVM framework and serves as its C/C++ front-end. In turn, Clang relies on functionality inside LLVM, notably optimization of the intermediate code and code generation.

From the source-code perspective, Clang is a tool built on LLVM. From a functional perspective, LLVM can be regarded as the back-end of a compiler and Clang as the front-end, which makes the relationship clearer: a front-end needs a back-end to turn a program into an executable file.


LLVM Compilation Toolchain Process

Compiling a source file with LLVM goes through roughly the following stages:

Preprocessing -> Lexical Analysis -> Tokens -> Syntax Analysis -> AST -> Code Generation -> LLVM IR -> Optimization -> Assembly Code -> Assemble -> Object File -> Link -> Executable.

When building a compiler for our own language, the step from source code to tokens (lexical analysis) is something we can do by hand or with a tool such as lex; the parsing that follows can likewise be handwritten or generated with yacc. Lexical analysis splits the source code into tokens, small units with a type and a value, such as keywords, numbers, or identifiers. From the AST to LLVM IR onward, LLVM provides a series of tools that help us develop quickly. From IR (Intermediate Representation) through the DAG (Directed Acyclic Graph) to machine instructions, LLVM has a complete back-end for common platforms. In other words, once we get as far as IR, we enjoy the same advanced productivity as Clang for all the subsequent work.
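As an illustration of handwritten lexical analysis, the following is a minimal tokenizer sketch in Python. It is not how Clang lexes C; the token categories in `TOKEN_SPEC` and the tiny keyword list are assumptions chosen to keep the example short.

```python
import re

# Hypothetical token categories for a tiny C-like language.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("KEYWORD", r"\b(?:int|return|if|else|while)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=<>!]=?|[(){};,]"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Split source text into (type, value) tokens, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("int x = 42;"))
# [('KEYWORD', 'int'), ('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', ';')]
```

Each token carries exactly the two pieces of information mentioned above, a type and a value; the parser that builds the AST consumes this list rather than raw characters.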

CodeGen traverses the syntax tree from top to bottom and translates it into LLVM IR. LLVM IR is the output of the front-end and the input of the LLVM back-end, bridging the two.
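The idea of walking the tree and emitting IR can be sketched as follows. This is a toy, not Clang's CodeGen: the tuple-based AST, the `CodeGen` class, and the `%tN` temporary names are invented for illustration, and the emitted lines only imitate the style of LLVM IR instructions (they are not a complete module that llc would accept).

```python
import itertools

class CodeGen:
    """Toy code generator: walks an expression AST and emits IR-like text."""

    BINARY_OPS = ("add", "sub", "mul")

    def __init__(self):
        self.lines = []
        self._ids = itertools.count(1)

    def emit(self, node):
        """Return the operand holding this node's value, emitting instructions as needed."""
        kind = node[0]
        if kind == "num":            # ("num", 2)   -> immediate operand
            return str(node[1])
        if kind == "var":            # ("var", "x") -> named value
            return f"%{node[1]}"
        if kind not in self.BINARY_OPS:
            raise ValueError(f"unknown node kind: {kind!r}")
        lhs = self.emit(node[1])     # start at the root, recurse into children
        rhs = self.emit(node[2])
        result = f"%t{next(self._ids)}"
        self.lines.append(f"{result} = {kind} i32 {lhs}, {rhs}")
        return result

# AST for the expression x + 2 * 3
ast = ("add", ("var", "x"), ("mul", ("num", 2), ("num", 3)))
gen = CodeGen()
gen.emit(ast)
print("\n".join(gen.lines))
# %t1 = mul i32 2, 3
# %t2 = add i32 %x, %t1
```

The tree structure disappears in the output: what remains is a flat list of instructions with explicit temporaries, which is the form the optimizer and back-end work on.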

LLVM Commands:

llc converts LLVM bitcode into assembly code for a specific platform
lli executes LLVM bitcode directly, using either an interpreter or the JIT compiler
llvm-gcc is a modified version of the GNU Compiler Collection (gcc) that generates LLVM IR when run with the -S -emit-llvm options; it is obsolete, and Clang now fills this role

Compilation Instructions:

clang -c -emit-llvm test1.c -o test1.bc generates bitcode (.bc)

clang -S -emit-llvm test1.c -o test1.ll generates human-readable IR (.ll)

llvm-dis test1.bc -o test1.ll converts bitcode (.bc) to human-readable IR (.ll)

llvm-as test1.ll -o test1.bc converts human-readable IR (.ll) back to bitcode (.bc)
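For experimenting, the commands above can be collected into a small driver script. The sketch below is only a convenience wrapper: the helper names `build_commands` and `run_all` are hypothetical, it assumes clang and the LLVM tools are on the PATH, and the final llc step (bitcode to assembly, written to a .s file) is added from the command list earlier in this section.

```python
import shutil
import subprocess

def build_commands(stem):
    """Return the command lines for one source file, e.g. stem='test1' for test1.c."""
    src, bc, ll, asm = f"{stem}.c", f"{stem}.bc", f"{stem}.ll", f"{stem}.s"
    return [
        ["clang", "-c", "-emit-llvm", src, "-o", bc],   # C source -> bitcode
        ["clang", "-S", "-emit-llvm", src, "-o", ll],   # C source -> readable IR
        ["llvm-dis", bc, "-o", ll],                     # bitcode -> readable IR
        ["llvm-as", ll, "-o", bc],                      # readable IR -> bitcode
        ["llc", bc, "-o", asm],                         # bitcode -> assembly
    ]

def run_all(stem):
    """Run each step in order, skipping tools that are not installed."""
    for cmd in build_commands(stem):
        if shutil.which(cmd[0]) is None:
            print("skipping (tool not found):", " ".join(cmd))
            continue
        subprocess.run(cmd, check=True)

# Print the commands without running anything.
for cmd in build_commands("test1"):
    print(" ".join(cmd))
```

Calling `run_all("test1")` in a directory that contains test1.c would execute the steps; comparing the resulting test1.ll before and after adding an optimization flag such as -O2 to the clang step is a quick way to see the optimizer at work.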
