Introduction
A few years ago, I wrote an OS that ran on STM32 and other Cortex-M series processors. Since I wasn't familiar with the Cortex-A series, I hadn't ported the OS to it. Later, I studied the ARMv7A instruction set and ported my OS to the ZYNQ7020 chip, which taught me how to port an OS to the ARMv7A architecture. I then ported lwIP and FATFS to the OS, enriching its functionality.
Recently, after studying the ARMv8A instruction set, I started porting the OS to ARMv8A processors. At first, the system ran normally up to the first task but crashed as soon as the MMU was enabled. Because the crash happened immediately after the MMU was enabled, it was difficult to print any specific error message, which made the issue quite challenging to troubleshoot.
Below I will briefly document the process of resolving these issues, providing a troubleshooting direction for those encountering similar problems.
Issue 1: System Crashes When Enabling MMU
Since the OS crashes immediately after enabling the MMU, I cannot determine the real cause of the crash by printing critical registers. I decided to take a different approach to troubleshoot this issue. My MMU code works normally in bare metal, so I only need to identify the differences between the bare metal and the OS. I mainly focused on three areas:
- Changing the bare metal cross-compiler;
- Comparing the OS's startup configuration with that of the bare metal;
- Comparing the Makefile configurations.
Changing the Bare Metal Cross-Compiler
After changing the bare metal cross-compiler to the one used by the OS and compiling the bare metal code, I ran the bare metal image. After enabling the MMU, the subsequent functions of the bare metal worked normally, temporarily ruling out the impact of inconsistent cross-compilers.
Comparing OS Startup Configuration and Bare Metal Startup Configuration
I compared the critical register configurations between the assembly entry code and the call that enables the MMU. For both the OS and the bare metal, I mainly compared the values of the following registers during startup:
- SCR_EL3;
- SPSR_EL3;
- HCR_EL2;
- SPSR_EL2;
- SCTLR_EL1.
I configured the values of the above registers in the bare metal to match those of the OS, compiled the bare metal code, and ran the bare metal image. After enabling the MMU, the subsequent functions of the bare metal worked normally, temporarily ruling out the impact of inconsistent startup configurations.
Comparing Makefile Configurations
After changing the bare metal cross-compiler and comparing some register configurations during startup, the bare metal image did not exhibit the crash issue after enabling the MMU. At this point, I didn’t have many ideas left, thinking that this problem had nothing to do with the code or the compiler. What else could cause this issue? I considered that I had not compared the Makefile and the linker script, so I compared the OS and bare metal linker scripts, finding no significant differences. Therefore, I focused on comparing the Makefile configurations.
By comparing the Makefiles, I discovered that adding the -ffunction-sections compilation option to the bare metal Makefile was enough to reproduce the problem: after recompiling the bare metal code, the bare metal image crashed as soon as the MMU was enabled.
The linker script for the bare metal describes the code section as follows:
/* text.boot code section */
_text_boot = .;
.text.boot : { *(.text.boot) }
_etext_boot = .;

/*
 * text code section
 */
_text = .;
.text :
{
    *(.text)
}
_etext = .;
According to the linker script rules, all code sections should lie between _text_boot and _etext. When the -ffunction-sections compilation option is not added, that is indeed the case, and _etext is followed directly by the data section. The map file shows that all functions are in the .text and .text.boot sections.
After adding the -ffunction-sections compilation option and recompiling the bare metal code, the map file revealed that _etext was no longer the end of the code: _etext was followed by code sections named after individual functions, including the function that enables the MMU. However, the MMU configuration set the range of the code section to the region between _text_boot and _etext. Since the function that enables the MMU was located after _etext, enabling the MMU caused the program to crash.
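A check of this kind can be done at boot, before the MMU is turned on. The following is only a minimal sketch: addr_in_range() is a hypothetical helper, not part of the OS described here. In a real build, start and end would be the addresses of the linker symbols _text_boot and _etext, and addr would be the address of the function that enables the MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if addr lies in the half-open
 * range [start, end). With start = _text_boot and end = _etext, a
 * false result for a function address means that function sits
 * outside the range the MMU configuration will map as code.
 */
static bool addr_in_range(uintptr_t addr, uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    return addr >= start && addr < end;
}
```

For example, with made-up addresses, addr_in_range(0x86400, 0x80000, 0x86000) is false, which is the situation described above: the function exists, but after the end of the mapped code range.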
Solving this problem is relatively simple, and there is no need to remove the -ffunction-sections compilation option. I just need to modify the bare metal linker script so that all code sections lie between _text_boot and _etext. The modified code section part of the bare metal linker script is as follows:
/* text.boot code section */
_text_boot = .;
.text.boot : { *(.text.boot) }
_etext_boot = .;

/*
 * text code section
 */
_text = .;
.text :
{
    /* Ensure all functions are in the code section */
    *(.text*)
}
_etext = .;
Now, after recompiling the bare metal code, all code sections lie between _text_boot and _etext, and the program runs normally after enabling the MMU.
After modifying the linker script in the OS, the system ran normally, which was a relief.
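The root cause of this problem was a defect in the linker script regarding the range of the code section: the *(.text) pattern did not match the per-function sections generated by -ffunction-sections, so those functions were placed after _etext, outside the range that the MMU configuration mapped as code.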
Issue 2: System Crashes When Executing a Certain Function
After resolving Issue 1, I expected the OS to run some tests normally. However, the system crashed when calling one specific function. The symptom was similar to Issue 1, but since the MMU was otherwise working at this point, the OS's exception message provided a hint: it reported an instruction exception, suggesting that there was still a bug in the MMU configuration. I printed the MMU configuration ranges, which covered three regions: the code section, the data section, and the device address space:
va:0x00080000 pa:0x00080000 size:0x00007000 // Code section
va:0x00086000 pa:0x00086000 size:0x1ff7a000 // Data section
va:0xfe000000 pa:0xfe000000 size:0x02000000 // Device address space
The output shows that the range from 0x86000 to 0x87000, which belongs to the code section, was configured a second time as part of the data section, so calling any function located in this range crashed the system. The cause was that the end of the code section was not 4KB aligned, while the data section was configured to start at the end of the code section. As a result, the last page of code was also classified as data during MMU configuration. The map file confirmed that armv8a_int_enable(), the function that triggered the exception, is located in the code section between 0x86000 and 0x87000.
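The overlap can be reproduced with a little address arithmetic. The sketch below assumes that the MMU setup rounds the end of the code region up to a 4KB page boundary and rounds the start of the data region down to one, which is consistent with the ranges printed above but is not confirmed by them. The helper names and the sample _etext value 0x86800 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MMU_PAGE_SIZE 0x1000u /* 4KB pages */

/* Round addr down to the start of its 4KB page. */
static uint64_t page_align_down(uint64_t addr)
{
    return addr & ~(uint64_t)(MMU_PAGE_SIZE - 1);
}

/* Round addr up to the next 4KB boundary (unchanged if already aligned). */
static uint64_t page_align_up(uint64_t addr)
{
    return page_align_down(addr + MMU_PAGE_SIZE - 1);
}

/*
 * Assumed behaviour: the code region ends at page_align_up(etext) and
 * the data region starts at page_align_down(etext). Returns the number
 * of bytes that end up mapped as both code and data.
 */
static uint64_t code_data_overlap(uint64_t etext)
{
    uint64_t code_end = page_align_up(etext);
    uint64_t data_start = page_align_down(etext);

    assert(code_end >= data_start);
    return code_end - data_start;
}
```

With an unaligned value such as 0x86800, the code region ends at 0x87000, the data region starts at 0x86000, and one whole page is claimed twice. With an aligned value such as 0x87000, the overlap is zero.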
There are two methods to solve this issue (you can choose either one).
Method 1
Modify the linker script to align the end of the code section to 4KB, ensuring that the starting address of the data section does not overlap with the end of the code section during MMU configuration.
The modified linker script for the code section is as follows:
/* text.boot code section */
_text_boot = .;
.text.boot : { *(.text.boot) }
_etext_boot = .;

/*
 * text code section
 */
_text = .;
.text :
{
    /* Ensure all functions are in the code section */
    *(.text*)
}
. = ALIGN(4096);
_etext = .;
After modifying the linker script, the OS finally ran normally. In the corresponding map file, _etext is 0x87000, which is 4KB aligned.
Method 2
Align the starting address of the data section to 4KB in the linker script, and pass that address to the MMU configuration as a parameter, so that the code region ends exactly where the data region begins.
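Method 2 could look roughly like the following on the C side. This is a sketch under stated assumptions, not the OS's actual MMU code: struct mmu_region and make_identity_region() are hypothetical, and data_start stands for the address of a page-aligned symbol that the linker script would have to export for the start of the data section.

```c
#include <assert.h>
#include <stdint.h>

#define MMU_PAGE_SIZE 0x1000u /* 4KB pages */

/* Hypothetical descriptor for one identity-mapped region. */
struct mmu_region {
    uint64_t va;
    uint64_t pa;
    uint64_t size;
};

/*
 * Build an identity mapping for [start, end). Both bounds must be
 * page aligned, so two regions that share a boundary cannot overlap.
 */
static struct mmu_region make_identity_region(uint64_t start, uint64_t end)
{
    struct mmu_region r;

    assert(start % MMU_PAGE_SIZE == 0);
    assert(end % MMU_PAGE_SIZE == 0);
    assert(start <= end);

    r.va = start;
    r.pa = start;
    r.size = end - start;
    return r;
}
```

Taking data_start = 0x87000 as an assumed value, make_identity_region(0x80000, data_start) describes the code region with size 0x7000, and make_identity_region(data_start, 0x20000000) describes the data region with size 0x1ff79000. The same address is used as the end of one region and the start of the other, so no page is mapped twice.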
Conclusion
Introduction to -ffunction-sections Compilation Option
I checked the description of the -ffunction-sections compilation option. The linker treats sections as its smallest processing unit: as long as one symbol in a section is referenced, the whole section is linked into the executable program. When compiling code with GCC, you can use -ffunction-sections and -fdata-sections to place each function or data item in its own section, named after that function or data item (for example, .text.name for a function). During the linking phase, the -Wl,--gc-sections linking option instructs the linker to remove unused sections, reducing the size of the final executable program.
We can use the following compile and link options to enable section optimization:
C_FLAGS += -ffunction-sections -fdata-sections
LD_FLAGS += -Wl,--gc-sections
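As a small illustration of the effect, consider the two functions below. The file name sections_demo.c and both function names are made up for this example.

```c
/* sections_demo.c (hypothetical file name) */

/*
 * Compiled without -ffunction-sections, both functions land in the
 * single input section .text. Compiled with -ffunction-sections, GCC
 * emits one input section per function: .text.used_fn and
 * .text.unused_fn.
 */
int used_fn(int x)
{
    return x + 1;
}

/*
 * If nothing reachable from the entry point references this function,
 * a link with --gc-sections is free to discard .text.unused_fn.
 */
int unused_fn(int x)
{
    return x * 2;
}
```

Compiling with gcc -c -ffunction-sections sections_demo.c and listing the section headers with objdump -h sections_demo.o should show the per-function sections. These are the names that a linker script pattern of *(.text) does not match, while *(.text*) does.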
If you find any mistakes in this article, please feel free to point them out, as my experience is limited.