Often, we see Arm Linux enthusiasts discussing whether the Arm32 Linux kernel uses two sets of page tables, TTBR0 and TTBR1. I hope this article can clarify this issue, and I welcome discussions.
On the Fuxing train heading to Beijing, I finally have some time to write this article (thanks to the Wi-Fi on the high-speed train), another reason to abandon the airplane.
Our viewpoint
This issue is somewhat complex and needs to be discussed in different situations.
First, let’s state the conclusion:
1. On CPUs that do not support LPAE (Large Physical Address Extension), such as Cortex-A5/A9, or on CPUs that support LPAE but run a kernel with LPAE disabled (CONFIG_ARM_LPAE is not set), only TTBR0 is used for translation and TTBR1 is not used.
2. For CPUs that support LPAE, such as Cortex-A7/A15/A17, if LPAE is enabled in the kernel (CONFIG_ARM_LPAE is enabled), both TTBR0 and TTBR1 are used.
Why use two page tables?
Linux’s address space is divided into user and kernel space.
In user space, the mapping from virtual to physical addresses differs from one application to another. In kernel space, the mapping from virtual to physical addresses is constant and global.
The division between user space and kernel space is determined by the kernel configuration option CONFIG_PAGE_OFFSET. It can be configured to 3G/1G or 2G/2G.
Every time a new process is created, the kernel copies the memory map of the parent process, including the kernel mapping. The call chain is as follows:
do_fork -> copy_process -> copy_mm -> dup_mm -> mm_init -> mm_alloc_pgd -> pgd_alloc ->
/*
* Copy over the kernel and IO PGD entries
*/
init_pgd = pgd_offset_k(0);
memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));
The kernel mapping is stored in swapper_pg_dir, a global page table whose kernel entries are copied into each process's page table during the above process. You might wonder whether copying the entire kernel page table is a lot of work, and whether every process's kernel mapping needs to be updated once the kernel mapping changes.
In practice, the kernel mapping does not need to be copied in full. Only the first-level (top-level) page table entries that cover kernel space are copied; the second-level tables they point to are shared by all processes.
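To get a feel for how little is copied, here is a minimal sketch in C. It is not the kernel's code: it assumes the simplified hardware view of the short-descriptor format (4096 first-level entries of 4 bytes, each mapping 1 MB) and a 3G/1G split, and the names l1_entry_t, init_l1, new_process_l1 and kernel_copy_bytes are hypothetical. The kernel's own pgd_t layout and USER_PTRS_PER_PGD differ in detail.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: simplified hardware view, 4096 first-level entries,
 * each 4 bytes wide and mapping 1 MB of virtual address space. */
#define L1_ENTRIES      4096u
#define L1_SHIFT        20u
#define PAGE_OFFSET     0xC0000000u                /* 3G/1G split */
#define USER_L1_ENTRIES (PAGE_OFFSET >> L1_SHIFT)  /* 3072 entries */

typedef uint32_t l1_entry_t;

/* Hypothetical stand-in for the master table (swapper_pg_dir). */
l1_entry_t init_l1[L1_ENTRIES];

/* Build a first-level table for a new process: the user part starts
 * empty, the kernel part is copied from the master table. */
l1_entry_t *new_process_l1(void)
{
    l1_entry_t *l1 = calloc(L1_ENTRIES, sizeof(*l1));
    if (!l1)
        return NULL;
    memcpy(l1 + USER_L1_ENTRIES, init_l1 + USER_L1_ENTRIES,
           (L1_ENTRIES - USER_L1_ENTRIES) * sizeof(*l1));
    return l1;
}

/* Number of bytes copied for the kernel part of one new table. */
size_t kernel_copy_bytes(void)
{
    return (L1_ENTRIES - USER_L1_ENTRIES) * sizeof(l1_entry_t);
}
```

Under these assumptions, kernel_copy_bytes() is 1024 entries of 4 bytes, that is 4 KB per new process, regardless of how much memory the kernel has mapped.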
Returning to the topic of TTBR0/TTBR1: the Arm architecture provides two translation table base registers, TTBR0 and TTBR1 (present since ARMv6 and carried over into v7-A). The basic idea is to separate the user and kernel mappings: TTBR0 points to the user page table, while TTBR1 points to the kernel page table. The MMU hardware automatically selects the page table pointed to by TTBR0 or TTBR1 for virtual-to-physical address translation, based on the input virtual address.
In CPUs without LPAE, or with LPAE hardware support but the software not enabled, the division of user and kernel space is determined by MMU TTBCR.N as shown in the table below:
| TTBCR.N | TTBR1 Page Table Coverage Address Start |
|---|---|
| 0b000 | TTBR1 not used |
| 0b001 | 0x80000000 |
| 0b010 | 0x40000000 |
| 0b011 | 0x20000000 |
| 0b100 | 0x10000000 |
| 0b101 | 0x08000000 |
| 0b110 | 0x04000000 |
| 0b111 | 0x02000000 |
From this table, we can see which User/Kernel splits TTBCR.N can produce:
User/Kernel: 2G/2G (N = 1), 1G/3G (N = 2), smaller user regions for larger N, or no split at all (N = 0).
It cannot produce a 3G/1G User/Kernel split, because the TTBR1 region never starts above 0x80000000. In principle, a kernel configured for a 2G/2G split could use TTBR0/TTBR1, but then 3G/1G and 2G/2G kernels would need different handling, which adds development and maintenance cost.
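The rule behind the table can be written down in a few lines of C. This is only a sketch of the arithmetic, not kernel or hardware code, and the function names ttbr1_start, ttbr_for_va and split_reachable are made up for illustration. It assumes the short-descriptor format described above, where N is a 3-bit field.

```c
#include <stdint.h>

/* First address translated through TTBR1 for a given TTBCR.N (0..7).
 * Returns 0 when N == 0, meaning TTBR1 is not used at all. */
uint32_t ttbr1_start(unsigned n)
{
    return n ? (UINT32_C(1) << (32 - n)) : 0;
}

/* Which register (0 or 1) translates the virtual address va:
 * TTBR0 if the top N bits of va are all zero, otherwise TTBR1. */
int ttbr_for_va(uint32_t va, unsigned n)
{
    if (n == 0)
        return 0;
    return (va >> (32 - n)) ? 1 : 0;
}

/* Is there any N (1..7) whose TTBR0/TTBR1 boundary is exactly
 * at the given address? */
int split_reachable(uint32_t boundary)
{
    for (unsigned n = 1; n <= 7; n++)
        if (ttbr1_start(n) == boundary)
            return 1;
    return 0;
}
```

With these helpers, split_reachable(0x80000000) holds while split_reachable(0xC0000000) does not, which is the 3G/1G limitation discussed above.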
Therefore, the Arm Linux kernel ultimately chooses to only use TTBR0. This is also what Russell King said: Arm Linux does not use TTBR1.
http://hackers4hackers.blogspot.com/2014/02/arm-linux-do-not-use-ttbr1.html
This can be seen as an oversight in the Arm architecture, which did not take the Linux usage scenario into account: before Android and Apple, the main operating systems on Arm mobile devices were Symbian and WinCE.
Therefore, when Arm added the LPAE extension, some modifications were made:
1. Allow more flexible division of address space.
2. The start address and size of the upper and lower address spaces can be set separately.
This is set by TTBCR.T0SZ and TTBCR.T1SZ.
| TTBCR.T0SZ | TTBCR.T1SZ | TTBR0 Coverage Address Range | TTBR1 Coverage Address Range |
|---|---|---|---|
| 0b000 | 0b000 | All addresses | Not used |
| M | 0b000 | 0 to 2^(32-M)-1 | 2^(32-M) to maximum address |
| 0b000 | N | 0 to 2^32-2^(32-N)-1 | 2^32-2^(32-N) to maximum address |
| M | N | 0 to 2^(32-M)-1 | 2^32-2^(32-N) to maximum address |
On CPUs with LPAE, when the kernel enables this feature, these settings allow splits that TTBCR.N cannot express. For example, T0SZ = 0 with T1SZ = 2 makes TTBR1 cover 2^32-2^30 = 0xC0000000 and above.
So now we can divide out a 3G/1G User/Kernel address space.
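The four rows of the T0SZ/T1SZ table can be checked with a small C sketch. It only reproduces the table's formulas for a 32-bit input address; the struct ttbr_ranges and the function lpae_ranges are hypothetical names, not anything from the kernel.

```c
#include <stdint.h>

struct ttbr_ranges {
    uint64_t ttbr0_end;    /* last address translated via TTBR0 */
    uint64_t ttbr1_start;  /* first address translated via TTBR1 */
    int      ttbr1_used;   /* 0 if TTBR1 is not used at all */
};

/* Address ranges for given TTBCR.T0SZ and TTBCR.T1SZ (each 0..7),
 * following the four rows of the table above. */
struct ttbr_ranges lpae_ranges(unsigned t0sz, unsigned t1sz)
{
    const uint64_t top = UINT64_C(1) << 32;
    struct ttbr_ranges r;

    r.ttbr1_used = (t0sz != 0) || (t1sz != 0);
    if (!r.ttbr1_used) {
        /* Row 1: TTBR0 covers everything. */
        r.ttbr0_end = top - 1;
        r.ttbr1_start = 0;
    } else if (t1sz == 0) {
        /* Row 2: the boundary is set by T0SZ alone. */
        r.ttbr0_end = (UINT64_C(1) << (32 - t0sz)) - 1;
        r.ttbr1_start = r.ttbr0_end + 1;
    } else {
        /* Rows 3 and 4: TTBR1 covers the top 2^(32-T1SZ) bytes. */
        r.ttbr1_start = top - (UINT64_C(1) << (32 - t1sz));
        r.ttbr0_end = t0sz ? (UINT64_C(1) << (32 - t0sz)) - 1
                           : r.ttbr1_start - 1;
    }
    return r;
}
```

Calling lpae_ranges(0, 2) gives a TTBR1 region starting at 0xC0000000, the 3G/1G split, and lpae_ranges(0, 1) gives 0x80000000, the 2G/2G split.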
Code Analysis
In head.S, before enabling the MMU, the boot code jumps to __v7_proc_info->__cpu_flush for execution.
In proc-v7.S, the members of __v7_proc_info are defined using the __v7_proc macro, where __cpu_flush = __v7_setup. __v7_setup itself is defined in proc-v7.S; the code in head.S that prepares the registers and calls it is as follows:
#ifdef CONFIG_ARM_LPAE
        mov     r5, #0                          @ high TTBR0
        mov     r8, r4, lsr #12                 @ TTBR1 is swapper_pg_dir pfn
#else
        mov     r8, r4                          @ set TTBR1 to swapper_pg_dir
#endif
        ldr     r12, [r10, #PROCINFO_INITFUNC]
        add     r12, r12, r10                   @ r12 -> __v7_setup
        ret     r12                             @ call __v7_setup
1:      b       __enable_mmu
In __v7_setup, TTBCR and TTBRx will be set.
proc-v7.S
        mcr     p15, 0, r10, c8, c7, 0          @ invalidate I + D TLBs
        v7_ttb_setup r10, r4, r5, r8, r3        @ TTBCR, TTBRx setup
        ldr     r3, =PRRR                       @ PRRR
        ldr     r6, =NMRR                       @ NMRR
        mcr     p15, 0, r3, c10, c2, 0          @ write PRRR
        mcr     p15, 0, r6, c10, c2, 1          @ write NMRR
Depending on whether LPAE is enabled, the v7_ttb_setup macro used by __v7_setup comes from one of two files:
1. When LPAE is not enabled, it is implemented in proc-v7-2level.S.
2. When LPAE is enabled, it is implemented in proc-v7-3level.S.
In the case of not using LPAE
proc-v7-2level.S
/*
 * Macro for setting up the TTBRx and TTBCR registers.
 * - \ttb0 and \ttb1 updated with the corresponding flags.
 */
        .macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
        mcr     p15, 0, \zero, c2, c0, 2        @ TTB control register, set TTBCR=0, so only use TTBR0
        ALT_SMP(orr     \ttbr0l, \ttbr0l, #TTB_FLAGS_SMP)
        ALT_UP(orr      \ttbr0l, \ttbr0l, #TTB_FLAGS_UP)
        ALT_SMP(orr     \ttbr1, \ttbr1, #TTB_FLAGS_SMP)
        ALT_UP(orr      \ttbr1, \ttbr1, #TTB_FLAGS_UP)
        mcr     p15, 0, \ttbr1, c2, c0, 1       @ load TTB1
        .endm
In the case of using LPAE
/*
 * TTBR0/TTBR1 split (PAGE_OFFSET):
 *   0x40000000: T0SZ = 2, T1SZ = 0 (not used)
 *   0x80000000: T0SZ = 0, T1SZ = 1
 *   0xc0000000: T0SZ = 0, T1SZ = 2
 *
 * Only use this feature if PHYS_OFFSET <= PAGE_OFFSET, otherwise
 * booting secondary CPUs would end up using TTBR1 for the identity
 * mapping set up in TTBR0.
 */
#if defined CONFIG_VMSPLIT_2G
#define TTBR1_OFFSET    16                      /* skip two L1 entries */
#elif defined CONFIG_VMSPLIT_3G
#define TTBR1_OFFSET    (4096 * (1 + 3))        /* only L2, skip pgd + 3*pmd */
#else
#define TTBR1_OFFSET    0
#endif
#define TTBR1_SIZE      (((PAGE_OFFSET >> 30) - 1) << 16)
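To see how TTBR1_SIZE lines up with the comment above, the expression can be evaluated outside the kernel. The sketch below is plain C with hypothetical helper names (ttbr1_size, t1sz_field, ttbr1_start_from_t1sz). It assumes that T1SZ occupies bits [18:16] of TTBCR in the LPAE format, which is what the shift by 16 suggests, and that PAGE_OFFSET is at least 0x40000000.

```c
#include <stdint.h>

/* Same arithmetic as the TTBR1_SIZE macro, for a given PAGE_OFFSET.
 * Assumes page_offset >= 0x40000000, as in the three listed splits. */
uint32_t ttbr1_size(uint32_t page_offset)
{
    return ((page_offset >> 30) - 1) << 16;
}

/* Extract the T1SZ field, assumed to be TTBCR bits [18:16]. */
unsigned t1sz_field(uint32_t ttbcr)
{
    return (ttbcr >> 16) & 0x7;
}

/* Start of the region covered by TTBR1 for a non-zero T1SZ,
 * i.e. 2^32 - 2^(32-T1SZ), computed in 32-bit wraparound arithmetic.
 * Returns 0 when T1SZ is 0 (no boundary derived from T1SZ). */
uint32_t ttbr1_start_from_t1sz(unsigned t1sz)
{
    return t1sz ? (uint32_t)(0u - (UINT32_C(1) << (32 - t1sz))) : 0;
}
```

For PAGE_OFFSET = 0xC0000000 this yields T1SZ = 2 and a TTBR1 region starting at 0xC0000000; for 0x80000000 it yields T1SZ = 1 and 0x80000000; for 0x40000000 it yields 0, matching the "T1SZ = 0 (not used)" line in the comment.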
proc-v7-3level.S
        .macro  v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
        ldr     \tmp, =swapper_pg_dir           @ swapper_pg_dir virtual address
        cmp     \ttbr1, \tmp, lsr #12           @ PHYS_OFFSET > PAGE_OFFSET?
        mrc     p15, 0, \tmp, c2, c0, 2         @ TTB control register
        orr     \tmp, \tmp, #TTB_EAE            @ enable LPAE
        ALT_SMP(orr     \tmp, \tmp, #TTB_FLAGS_SMP)
        ALT_UP(orr      \tmp, \tmp, #TTB_FLAGS_UP)
        ALT_SMP(orr     \tmp, \tmp, #TTB_FLAGS_SMP << 16)
        ALT_UP(orr      \tmp, \tmp, #TTB_FLAGS_UP << 16)
        /*
         * Only use split TTBRs if PHYS_OFFSET <= PAGE_OFFSET (cmp above),
         * otherwise booting secondary CPUs would end up using TTBR1 for the
         * identity mapping set up in TTBR0.
         */
        orrls   \tmp, \tmp, #TTBR1_SIZE         @ set TTBCR.T1SZ
        mcr     p15, 0, \tmp, c2, c0, 2         @ set TTBCR
        mov     \tmp, \ttbr1, lsr #20
        mov     \ttbr1, \ttbr1, lsl #12
        addls   \ttbr1, \ttbr1, #TTBR1_OFFSET
        mcrr    p15, 1, \ttbr1, \tmp, c2        @ set TTBR1
        .endm
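The last four instructions of the macro turn the page frame number of swapper_pg_dir into a 64-bit TTBR1 value. As a rough model in C (the function ttbr1_value and its parameters are hypothetical, and the example physical address used below is an assumption, not taken from any real board):

```c
#include <stdint.h>

/* Model of the tail of the LPAE v7_ttb_setup macro.
 *   pgd_pfn      : physical address of swapper_pg_dir >> 12
 *   split        : non-zero if PHYS_OFFSET <= PAGE_OFFSET (the "ls" case)
 *   ttbr1_offset : TTBR1_OFFSET for the configured VMSPLIT
 */
uint64_t ttbr1_value(uint32_t pgd_pfn, int split, uint32_t ttbr1_offset)
{
    uint32_t hi = pgd_pfn >> 20;    /* mov   \tmp, \ttbr1, lsr #20   */
    uint32_t lo = pgd_pfn << 12;    /* mov   \ttbr1, \ttbr1, lsl #12 */

    if (split)                      /* addls \ttbr1, \ttbr1, #TTBR1_OFFSET */
        lo += ttbr1_offset;

    /* mcrr p15, 1, lo, hi, c2: low word and high word of TTBR1 */
    return ((uint64_t)hi << 32) | lo;
}
```

Assuming swapper_pg_dir sits at physical address 0x80003000 and the kernel uses the 3G/1G split, ttbr1_value(0x80003, 1, 4096 * (1 + 3)) gives 0x80007000: TTBR1 skips the pgd page and three pmd pages and points at the fourth pmd, the one that maps the top 1 GB.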
The situation for Arm64 is basically the same as Arm32 using LPAE.
If you have any questions, feel free to discuss.