Unlocking the Door to GCC

Hi, everyone! I am Mu Rong.

Today, I will introduce you to GCC.

First, let me clarify: the uppercase GCC and lowercase gcc are not the same thing. Let’s take it slow…

1. What is GCC?

The GNU Compiler Collection (GCC) is a compiler toolchain developed by the GNU project that supports various programming languages, hardware architectures, and operating systems. The Free Software Foundation (FSF) releases GCC as free software under the GNU General Public License (GNU GPL). GCC is a key component of the GNU toolchain and is the standard compiler for most projects related to GNU and the Linux kernel. As of 2019, GCC had about 15 million lines of code, which makes it one of the largest free software programs in existence. As a tool, it has played an important role in the development of free software.

In its early days, GCC stood for GNU C Compiler, and it was initially designed solely for compiling C language code. With continuous iterations, the functionality of GCC has greatly expanded, and it can now compile programs written in various languages, including C, C++, Objective-C, Fortran, Ada, Go, and D. (A Java front end, GCJ, was included until GCC 7.)

GCC official website: https://gcc.gnu.org. At the time of writing, the latest version was GCC 12.2.

Features

  • GCC is a portable compiler that supports various hardware platforms, such as ARM and X86.
  • It can also perform cross-platform compilation. A native compiler produces programs that run only on the platform where they were compiled, whereas GCC built as a cross compiler produces programs for a different target platform. For example, an embedded program can be compiled on x86 and then run on ARM.
  • GCC has multiple language front ends, each of which parses a different language.
  • It is designed in a modular fashion, allowing for the addition of support for new languages and new CPU architectures.
  • GCC is free software, meaning anyone can use or modify it.

This article focuses on important knowledge points about GCC commonly used in Linux.

GCC vs gcc vs g++

  • GCC refers to the GNU Compiler Collection: the compilers themselves plus the driver programs that run them and invoke the assembler and linker (which on Linux usually come from GNU Binutils). It is most often used to compile C and C++, but it can also compile Objective-C and Objective-C++ programs.
  • gcc (originally short for GNU C Compiler) is the command for compiling C; g++ is the command for compiling C++. Strictly speaking, neither is a compiler itself: both are driver programs within GCC that invoke the actual compiler (cc1 for C, cc1plus for C++) and then the assembler and linker. Put simply: gcc calls the C compiler by default, while g++ calls the C++ compiler.

In practice, we usually use the gcc command to compile C programs and the g++ command to compile C++ code. It is worth noting that the gcc command can also compile C++ programs, and similarly, the g++ command can be used to compile C programs.

Differences between gcc and g++

  1. gcc treats .c files as C programs and .cpp files as C++ programs.
  2. g++ treats both .c and .cpp files as C++ programs.
  3. Linking method: gcc does not automatically link the C++ standard library (libstdc++, which includes the STL), whereas g++ links it automatically.
  4. Preprocessor macros: __cplusplus is defined whenever a file is compiled as C++, so g++ defines it even for .c files, while gcc defines it only for files it treats as C++.

gcc is the general compilation command for the GCC compiler, and based on the file extension, the gcc command can determine the programming language used:

  • .c: Compiles as a C language program by default.
  • .cpp: Compiles as a C++ program by default.
  • .m: Compiles as an Objective-C program by default.
  • .go: Compiles as a Go language program by default.

Of course, we can also specify the language manually with the -x option:

  • gcc -xc file.c indicates compiling file.c as C language code.
  • gcc -xc++ file.cpp indicates compiling file.cpp as C++ code.

There are more detailed differences between the gcc and g++ commands, and when compiling programs, we often adhere to the following principles:

  • For compiling C language programs, we should use the gcc command.
  • For compiling C++ programs, it is recommended to use the g++ command.
  • For developing pure C language programs, gcc can be used; for mixed C/C++ programs, it is advisable to use g++.

2. GCC Compilation

We know that computers only recognize machine language, and the source code we write must be translated into machine language to be recognized by the computer, which means generating executable files that the computer can recognize. How does GCC compile our source code into the final executable file?

  1. Source File: This is the source code file we have written. A source file is essentially a plain text file that does not have a special format inside. We can indicate the language of the code saved in the file through the file extension, which makes it easier for programmers to distinguish and for the compiler to recognize; it does not change the internal format of the file.

  2. Compile: Programming language code is written for programmers to read, but machines only understand binary instructions. Therefore, a tool is needed to convert programming language code into binary instructions that the computer can recognize. This tool is a special piece of software called a compiler. The compiler recognizes the statements and constructs in the code and converts them into a binary form that the computer can execute. This process is called compiling. Compiling can also be understood as “translating,” similar to translating English into Chinese; it is a complex process, but we do not need to concern ourselves with the compiler’s internals to use it.

  3. Link: After compiling, C language code does not generate the final executable file; instead, it generates an intermediate file called an object file. The object file is also in binary form and has the same format as the executable file. For Visual C++, the object file extension is .obj; for GCC, it is .o. The object file must be linked to become an executable file. Since the object file and executable file have the same format, why not directly use the object file as the executable? Because compiling only converts our written code into binary form; it still needs to combine with system components (such as standard libraries, dynamic link libraries, etc.) that are necessary for the program to run. Linking is essentially a “packaging” process that combines all binary object files and system components into an executable file. Completing the linking process also requires special software called a linker.

Supported file extensions by GCC

File extension        Type
.c                    C source code file
.a                    Archive (static library) file composed of object files
.C/.cc/.cxx/.cpp      C++ source code file
.h                    Header file included by the program
.i                    Preprocessed C source code file
.ii                   Preprocessed C++ source code file
.m                    Objective-C source code file
.o                    Compiled object file
.s                    Assembly language source code file
.S                    Assembly language source code file that must be preprocessed first

The GCC compilation process can be divided into four stages: preprocessing, compiling, assembling, and linking.

  • Preprocessing: This mainly handles commands starting with “#” and generates .i/.ii files.
  • Compiling: In this stage, GCC first checks the code for syntax errors, etc., to determine the actual work the code needs to do. After confirming there are no issues, GCC translates the code into assembly language.
  • Assembling: This stage translates *.s files into binary machine instruction files *.o, meaning it converts assembly code into commands that machines can execute.
  • Linking: This stage assembles the generated object files (.o files) into an executable file.

Let’s analyze the entire execution process of GCC through a simple example:

#include <stdio.h>

int main(int argc, char const *argv[])
{
    /* code */
    printf("hello GCC\n");

    return 0;
}

1. Preprocessing Stage

gcc -E main.c -o main.i

During the preprocessing stage, the preprocessor directives that start with # are processed. For example, the contents of each header named by #include are inserted at the location of the #include directive, and each macro defined by #define is replaced in the source program with its definition.

Preprocessing mainly involves:

  • Removing all #define directives and expanding every use of the macros they define.
  • Processing all conditional compilation directives, such as #if, #ifdef, etc.
  • Processing #include directives, inserting the included files at the location of the directive. This process is recursive, meaning that included files may also include other files.
  • Removing all comments, both // and /* */.
  • Adding line numbers and file identifiers, such as # 2 "main.c" 2, so that the compiler can generate line number information for debugging and show line numbers in compilation errors and warnings.
  • Retaining all #pragma compiler directives, as the compiler needs to use them.

2. Compiling Stage

gcc -S main.i -o main.s

During the compiling stage, GCC first checks the code for conformity and syntax errors to determine what the code is supposed to do. If everything checks out, GCC translates the code into assembly language.

The generated assembly file main.s is plain text, so you can open it in any text editor to see the assembly code GCC produced.

3. Assembling Stage

gcc -c main.s -o main.o

The assembling stage translates *.s files into binary machine instruction files *.o, meaning it converts assembly code into commands that machines can execute.

4. Linking Stage

gcc main.o -o main

The linker ld combines the object files, resolves symbol and library dependencies, and generates the executable file. In this example, the program never defines the function printf; the included header stdio.h contains only its declaration, not its definition. So where is the implementation of printf? It lives in the C library, in a file named libc.so.6. By default, GCC searches the system's default library paths, such as /usr/lib, and links against libc.so.6, which makes printf available.

  • By default, GCC prefers to use dynamic libraries during linking, only considering static libraries if dynamic libraries are not available. Because dynamic libraries save space, the default linking operation in Linux is to connect to dynamic libraries first.

The general locations of header files or library files are:

/usr/include and its subdirectories

/usr/local/include and its subdirectories

/usr/lib

/usr/local/lib

/lib

Static library linking search path order:

  1. ld first searches the directories given with -L on the gcc command line.
  2. Then it searches the directories listed in the environment variable LIBRARY_PATH.
  3. Finally, it checks the default directories /lib, /usr/lib, and /usr/local/lib.

Dynamic linking execution search path order:

  1. The dynamic library search path recorded in the executable when it was linked (for example, with -Wl,-rpath).
  2. The dynamic library search path specified by the environment variable LD_LIBRARY_PATH.
  3. The dynamic library search path specified in the configuration file /etc/ld.so.conf.
  4. The default dynamic library search path /lib.
  5. The default dynamic library search path /usr/lib.

The search follows several principles: directories specified by -I (for header files) and -L (for libraries) are searched from left to right; if nothing is found, GCC searches the directories specified by environment variables. The environment variable for C header files is C_INCLUDE_PATH, and the environment variable for libraries is LIBRARY_PATH. If still not found, it searches the system default directories.

Related environment variables:

  • LIBRARY_PATH: specifies where to search for library files when a program is linked.
  • LD_LIBRARY_PATH: specifies where to search for dynamic library files when a program runs.

After the linking stage, we have generated the executable program we need, and we can now execute the executable program.

GCC Parameter Description

GCC has many compilation parameters. Here are descriptions of the commonly used ones:

  • -C

During preprocessing, do not delete comment information, generally used with -E, sometimes useful for analyzing programs.

  • -M

Generate file dependency information. It includes all source code dependencies for the target file. You can test it with gcc -M hello.c, it’s very simple.

  • -MM

Similar to -M, but it ignores dependencies on system headers, such as those pulled in by #include <file>.

  • -MD

Similar to -M, but the dependency information is written to a .d file and compilation continues as usual.

  • -MMD

Similar to -MM, but the dependency information is written to a .d file.

  • -Wa,option

This option passes option to the assembler; if there are commas in option, it splits the option into multiple options and then passes them to the assembler.

  • -Wl,option

This option passes option to the linker; if there are commas in option, it splits the option into multiple options and then passes them to the linker.

  • -llibrary

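Specifies a library to link against.

Example usage:

gcc hello.c -lcurses links the program against the curses library. Put -l after the source or object files that use the library, because the linker resolves symbols in command-line order.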

  • -Ldir

Specifies the search path for libraries during compilation. For your own libraries, you can specify the directory; otherwise, the compiler will only look in the standard library directories. This dir is the name of the directory.

  • -O0, -O1, -O2, -O3

Four levels of compiler optimization, where -O0 means no optimization and is the default, -O1 (the same as plain -O) enables basic optimization, and -O3 is the highest of these levels.

  • -g

Tells the compiler to generate debugging information during compilation.

  • -gstabs

This option generates debugging information in stabs format but does not include gdb debugging information.

  • -gstabs+

This option generates debugging information in stabs format and includes additional debugging information for gdb.

  • -ggdb

This option generates debugging information for use by gdb, in the most expressive format available.

  • -static

This option disables the use of dynamic libraries, so the compiled output is generally large and does not require any dynamic linking libraries to run.

  • -shared

This option produces a shared object (a dynamic library) that can then be linked with other objects to form an executable. It is usually combined with -fPIC when building a .so file.

  • -x language filename

Sets the language of the input files that follow, overriding their file extensions. By convention, the C language file extension is .c, while the C++ file extension is .C or .cpp. If you want to use a custom extension like .pig for your C code, you need this parameter, which applies to all subsequent file names until the next -x option. Available values include: ‘c’, ‘objective-c’, ‘c-header’, ‘c++’, ‘cpp-output’, ‘assembler’, and ‘assembler-with-cpp’.

Example usage: gcc -x c hello.pig

  • -x none filename

Disables the previous option, allowing gcc to automatically recognize the file type based on the file extension.

Example usage: gcc -x c hello.pig -x none hello2.c

  • -c

Only runs preprocessing, compiling, and assembling; that is, it generates object files without linking them.

Example usage:

gcc -c hello.c generates the object file hello.o.

  • -S

Only activates preprocessing and compiling, meaning it compiles the file into assembly code.

Example usage:

gcc -S hello.c generates an .s assembly code file that you can view with a text editor.

  • -E

Only runs preprocessing and writes the result to standard output instead of a file; redirect it if you want to keep it.

Example usage:

gcc -E hello.c > pianoapan.txt

gcc -E hello.c | more

Take your time; preprocessing a hello world can produce 800 lines of code.

  • -o

Specifies the target name; by default, the file generated by gcc is a.out, which sounds unappealing. If you agree, change it!

Example usage:

gcc -o hello.exe hello.c

gcc -o hello.asm -S hello.c

  • -pipe

Uses pipes instead of temporary files during compilation. There may be issues when using non-gnu assembly tools.

gcc -pipe -o hello.exe hello.c

  • -ansi

Disables features in GNU C that are incompatible with ANSI C (ISO C90), such as the asm, inline, and typeof keywords and predefined macros like unix and vax that identify the system type. When compiling C code, it is equivalent to -std=c90.

  • -fno-asm

This option implements part of the ANSI option functionality, prohibiting the use of asm, inline, and typeof as keywords.

  • -fno-strict-prototype

This only affects g++; using this option, g++ treats functions without parameters as not having explicit parameter count and type specifications, rather than having no parameters. In contrast, gcc considers functions without parameters as not having explicit type specifications, regardless of this option.

  • -fthis-is-variable

This aligns with traditional C++, allowing the use of this as a general variable.

  • -fcond-mismatch

This allows the types of the second and third parameters in conditional expressions to mismatch, with the expression’s value being of void type.

  • -funsigned-char, -fno-signed-char, -fsigned-char, -fno-unsigned-char

These four parameters set the char type, determining whether to set the char type as unsigned char (first two parameters) or signed char (last two parameters).

  • -include file

Processes file as if #include "file" appeared on the first line of the source file. In simple terms, it lets you pull in a header from the command line instead of editing the code.

Example usage:

gcc hello.c -include /root/pianopan.h

  • -imacros file

Works like -include, except that only the macro definitions from file are kept; any other output produced by processing the file is discarded.

  • -Dmacro

Equivalent to #define macro 1 in C language.

  • -Dmacro=defn

Equivalent to #define macro defn in C language.

  • -Umacro

Equivalent to #undef macro in C language.

  • -undef

Removes the definition of any non-standard macros.

  • -Idir

When using #include "file", gcc/g++ will first search for the specified header file in the current directory; if not found, it will search in the default header file directories. If the -I option is used to specify a directory, it will first search in the specified directory, then follow the usual order.

For #include<file>, gcc/g++ will search in the directories specified by -I; if not found, it will then search in the system’s default header file directories.

  • -I-

This cancels the effect of the previous parameter, so it is generally used after -Idir.

  • -idirafter dir

If search fails in the -I directory, it will search in this directory.

  • -iprefix prefix, -iwithprefix dir

Usually used together; if search fails in the -I directory, it will search in prefix+dir.

  • -nostdinc

This prevents the compiler from searching for header files in the system’s default header file directories, usually used with -I to explicitly specify header file locations.

  • -nostdinc++

This specifies not to search in the standard path specified for g++, but still searches in other paths; this option is used when creating the libg++ library.

  • -traditional

This attempts to make the compiler support traditional C language features.

Conclusion

Alright, friends, that’s it for now. I hope this article helps you. If you liked it, remember to like and share. Follow Linux Armory for ongoing sharing of Linux knowledge. Since this article was produced in my spare time, there may be shortcomings; I welcome any corrections (some content references from the internet).
