Understanding User and Kernel Page Table Mapping in Arm32 Linux

Often, we see Arm Linux enthusiasts discussing whether the Arm32 Linux kernel uses two sets of page tables, TTBR0 and TTBR1. I hope this article can clarify this issue, and I welcome discussions.

On the Fuxing train to Beijing, I finally have some time to write this article, thanks to the Wi-Fi on the high-speed train. One more reason to give up the airplane.

Our viewpoint

The question is somewhat involved, and the answer depends on the CPU and the kernel configuration.

First, let’s state the conclusion:

1. On CPUs that do not support LPAE (Large Physical Address Extension), such as Cortex-A9/A5, or on CPUs that support LPAE but run a kernel with CONFIG_ARM_LPAE disabled, only TTBR0 is used for translation and TTBR1 is not.

2. On CPUs that support LPAE, such as Cortex-A7/A15/A17, with LPAE enabled in the kernel (CONFIG_ARM_LPAE=y), both TTBR0 and TTBR1 are used.

Why use two page tables

Linux’s address space is divided into user and kernel space.

In user space, the mapping from virtual addresses to physical addresses is per-process and changes whenever a different application runs. In kernel space, the mapping from virtual addresses to physical addresses is constant and global.

The boundary between user space and kernel space is determined by the kernel configuration option CONFIG_PAGE_OFFSET. It can be configured as a 3G/1G or 2G/2G user/kernel split.
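As a minimal sketch of what the split means, the helper below derives the two sizes from a given PAGE_OFFSET value. The struct and function names are hypothetical and not kernel code, and the sketch treats everything below PAGE_OFFSET as user space, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not kernel code: sizes of the two halves of a
 * 32-bit virtual address space for a given PAGE_OFFSET. */
struct vm_split {
    uint64_t user_bytes;   /* addresses below PAGE_OFFSET */
    uint64_t kernel_bytes; /* addresses from PAGE_OFFSET up to 4 GiB */
};

static struct vm_split split_from_page_offset(uint32_t page_offset)
{
    struct vm_split s;

    assert(page_offset != 0); /* a zero PAGE_OFFSET would leave no user space */
    s.user_bytes = page_offset;
    s.kernel_bytes = (UINT64_C(1) << 32) - page_offset;
    return s;
}
```

With PAGE_OFFSET = 0xC0000000 this gives 3 GiB for user space and 1 GiB for the kernel; with 0x80000000 both halves are 2 GiB.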

Every time a new process is created, it receives a copy of the parent's memory map together with the kernel mapping. The call path is:

do_fork -> copy_process -> copy_mm -> dup_mm -> mm_init -> mm_alloc_pgd -> pgd_alloc

and pgd_alloc contains:

	/*
	 * Copy over the kernel and IO PGD entries
	 */
	init_pgd = pgd_offset_k(0);
	memcpy(new_pgd + USER_PTRS_PER_PGD, init_pgd + USER_PTRS_PER_PGD,
	       (PTRS_PER_PGD - USER_PTRS_PER_PGD) * sizeof(pgd_t));

The kernel mapping is stored in swapper_pg_dir, a global page table that is copied into each process's page table by the code above. You might wonder whether copying the entire kernel page table is a lot of work, and whether every process's kernel mapping has to be updated once the kernel mapping changes.

In practice, the kernel mapping does not need to be copied in full. Only the top-level (first-level) page table entries that cover kernel space are copied; the lower-level tables they point to are shared by all processes.
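To get a feel for how little is copied, here is a rough sketch of the arithmetic behind that memcpy for the 2-level (non-LPAE) layout. The constants are assumptions based on the usual Arm32 configuration (2048 first-level entries of 8 bytes, each covering 2 MiB, and TASK_SIZE sitting 16 MiB below PAGE_OFFSET), so check them against your own kernel tree. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for the 2-level (non-LPAE) Arm32 layout. */
#define PGDIR_SHIFT   21u          /* each first-level entry maps 2 MiB */
#define PTRS_PER_PGD  2048u        /* 2048 * 2 MiB = 4 GiB */
#define PGD_ENTRY_SZ  8u           /* one pgd_t is a pair of 32-bit words */
#define SZ_16M        0x01000000u

/* Hypothetical helper: bytes of kernel PGD entries copied into a new pgd. */
static size_t kernel_pgd_copy_bytes(uint32_t page_offset)
{
    uint32_t task_size, user_ptrs;

    assert(page_offset > SZ_16M);
    task_size = page_offset - SZ_16M;      /* assumed top of user space */
    user_ptrs = task_size >> PGDIR_SHIFT;  /* plays the role of USER_PTRS_PER_PGD */
    return (size_t)(PTRS_PER_PGD - user_ptrs) * PGD_ENTRY_SZ;
}
```

Under these assumptions a 3G/1G kernel copies 4160 bytes per new process, roughly one page, regardless of how much memory the kernel actually maps.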

Returning to the topic of TTBR0/TTBR1: the two registers have been part of the Arm architecture since ARMv6 and are carried over into ARMv7-A. The basic idea is to separate the user and kernel mappings: TTBR0 points to the user page table, while TTBR1 points to the kernel page table. Based on the input virtual address, the MMU hardware automatically selects the page table pointed to by TTBR0 or TTBR1 for the virtual-to-physical translation.

On CPUs without LPAE, or with LPAE hardware support that the kernel does not enable, the boundary between the TTBR0 and TTBR1 regions is determined by TTBCR.N, as shown in the table below:

TTBCR.N	TTBR1 Page Table Coverage Address Start
0b000	TTBR1 not used
0b001	0x80000000
0b010	0x40000000
0b011	0x20000000
0b100	0x10000000
0b101	0x08000000
0b110	0x04000000
0b111	0x02000000

From this table, the User/Kernel splits that TTBCR.N can produce are:

User/Kernel: 2G/2G, 1G/3G, smaller user ranges down to 32M, or not divided at all.

It cannot produce a 3G/1G User/Kernel split. A kernel configured for 2G/2G could therefore use TTBR0/TTBR1, but then the 3G/1G and 2G/2G kernels would need different handling, which adds development and maintenance cost.
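The TTBCR.N rule can be written down in a few lines. The following is only an illustrative sketch of the table above, with hypothetical function names; it models the boundary calculation, not the actual table walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First address translated through TTBR1 for a short-descriptor
 * TTBCR.N value; returns 0 when N == 0 (TTBR1 unused). */
static uint32_t ttbr1_start(unsigned int n)
{
    assert(n <= 7); /* TTBCR.N is a 3-bit field */
    return n ? (UINT32_C(1) << (32 - n)) : 0;
}

/* True when the walk for 'va' would start from TTBR1. */
static bool va_uses_ttbr1(uint32_t va, unsigned int n)
{
    return n != 0 && va >= ttbr1_start(n);
}

/* Hypothetical search: the N whose boundary equals page_offset, or -1. */
static int find_ttbcr_n(uint32_t page_offset)
{
    for (unsigned int n = 1; n <= 7; n++)
        if (ttbr1_start(n) == page_offset)
            return (int)n;
    return -1;
}
```

Calling find_ttbcr_n with 0x80000000 yields N = 1, while 0xC0000000 yields -1, which is the 3G/1G case that the short-descriptor format cannot express.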

Therefore, without LPAE, the Arm Linux kernel ultimately chooses to use only TTBR0. This is also what Russell King said: Arm Linux does not use TTBR1.

http://hackers4hackers.blogspot.com/2014/02/arm-linux-do-not-use-ttbr1.html

This can be regarded as an oversight in the Arm architecture, which did not take the Linux use case into account; before Android and Apple, the main operating systems on Arm mobile devices were Symbian and WinCE.

Therefore, when Arm added the LPAE extension, some modifications were made:

1. Allow more flexible division of address space.

2. The start address and size of the upper and lower address spaces can be set separately.

This is set by TTBCR.T0SZ and TTBCR.T1SZ.

TTBCR.T0SZ	TTBCR.T1SZ	TTBR0 Coverage Address Range	TTBR1 Coverage Address Range
0b000	0b000	All addresses	Not used
M	0b000	0 to 2^(32-M)-1	2^(32-M) to maximum address
0b000	N	0 to 2^32-2^(32-N)-1	2^32-2^(32-N) to maximum address
M	N	0 to 2^(32-M)-1	2^32-2^(32-N) to maximum address

On CPUs with LPAE, when the kernel enables it, the address space can be split in any of the ways listed in the table above.

So now a 3G/1G User/Kernel address space is possible: T0SZ = 0 and T1SZ = 2 give TTBR1 the top 2^30 bytes, starting at 0xC0000000.
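The T0SZ/T1SZ table can be checked with a small calculation. This is a sketch of the table rows only, with a hypothetical function name; it does not model the fault region that appears between the two ranges when both fields are non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the region translated via TTBR1, following the T0SZ/T1SZ table.
 * Returns 0 when TTBR1 is not used (T0SZ == T1SZ == 0). */
static uint32_t lpae_ttbr1_start(unsigned int t0sz, unsigned int t1sz)
{
    assert(t0sz <= 7 && t1sz <= 7); /* both are 3-bit fields */
    if (t1sz != 0)  /* TTBR1 covers the top 2^(32-N) bytes */
        return (uint32_t)((UINT64_C(1) << 32) - (UINT64_C(1) << (32 - t1sz)));
    if (t0sz != 0)  /* TTBR1 covers everything above the TTBR0 range */
        return UINT32_C(1) << (32 - t0sz);
    return 0;
}
```

For example, (T0SZ, T1SZ) = (0, 2) gives 0xC0000000, (0, 1) gives 0x80000000, and (2, 0) gives 0x40000000.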

Code Analysis

In head.S, just before enabling the MMU, the code jumps to __v7_proc_info->__cpu_flush.

In proc-v7.S, the __v7_proc macro fills in the members of __v7_proc_info, with __cpu_flush = __v7_setup. The call site in head.S looks like this:

#ifdef CONFIG_ARM_LPAE
	mov	r5, #0				@ high TTBR0
	mov	r8, r4, lsr #12			@ TTBR1 is swapper_pg_dir pfn
#else
	mov	r8, r4				@ set TTBR1 to swapper_pg_dir
#endif
	ldr	r12, [r10, #PROCINFO_INITFUNC]
	add	r12, r12, r10			@ r12 -> __v7_setup
	ret	r12				@ call __v7_setup
1:	b	__enable_mmu

In __v7_setup, TTBCR and TTBRx will be set.

proc-v7.S

	mcr	p15, 0, r10, c8, c7, 0		@ invalidate I + D TLBs
	v7_ttb_setup r10, r4, r5, r8, r3	@ TTBCR, TTBRx setup
	ldr	r3, =PRRR			@ PRRR
	ldr	r6, =NMRR			@ NMRR
	mcr	p15, 0, r3, c10, c2, 0		@ write PRRR
	mcr	p15, 0, r6, c10, c2, 1		@ write NMRR

Depending on whether LPAE is enabled, the v7_ttb_setup macro used by __v7_setup comes from one of two files:

1. When LPAE is not enabled, it is implemented in proc-v7-2level.S.

2. When LPAE is enabled, it is implemented in proc-v7-3level.S.

In the case of not using LPAE

proc-v7-2level.S

	/*
	 * Macro for setting up the TTBRx and TTBCR registers.
	 * - \ttb0 and \ttb1 updated with the corresponding flags.
	 */
	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
	mcr	p15, 0, \zero, c2, c0, 2	@ TTB control register, set TTBCR=0, so only use TTBR0
	ALT_SMP(orr	\ttbr0l, \ttbr0l, #TTB_FLAGS_SMP)
	ALT_UP(orr	\ttbr0l, \ttbr0l, #TTB_FLAGS_UP)
	ALT_SMP(orr	\ttbr1, \ttbr1, #TTB_FLAGS_SMP)
	ALT_UP(orr	\ttbr1, \ttbr1, #TTB_FLAGS_UP)
	mcr	p15, 0, \ttbr1, c2, c0, 1	@ load TTB1
	.endm

In the case of using LPAE

/*
 * TTBR0/TTBR1 split (PAGE_OFFSET):
 *   0x40000000: T0SZ = 2, T1SZ = 0 (not used)
 *   0x80000000: T0SZ = 0, T1SZ = 1
 *   0xc0000000: T0SZ = 0, T1SZ = 2
 *
 * Only use this feature if PHYS_OFFSET <= PAGE_OFFSET, otherwise
 * booting secondary CPUs would end up using TTBR1 for the identity
 * mapping set up in TTBR0.
 */
#if defined CONFIG_VMSPLIT_2G
#define TTBR1_OFFSET	16			/* skip two L1 entries */
#elif defined CONFIG_VMSPLIT_3G
#define TTBR1_OFFSET	(4096 * (1 + 3))	/* only L2, skip pgd + 3*pmd */
#else
#define TTBR1_OFFSET	0
#endif

#define TTBR1_SIZE	(((PAGE_OFFSET >> 30) - 1) << 16)
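To make the TTBR1_SIZE and TTBR1_OFFSET definitions easier to read, here is a small sketch that evaluates them for each PAGE_OFFSET. It assumes that T1SZ occupies TTBCR bits [18:16], that LPAE descriptors are 8 bytes, and that swapper_pg_dir is laid out as one pgd page followed by the pmd pages, as the comments in the macros suggest. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TTBCR_T1SZ_SHIFT 16u   /* assumed: T1SZ lives in TTBCR bits [18:16] */

/* Mirrors the TTBR1_SIZE macro for a given PAGE_OFFSET. */
static uint32_t ttbr1_size_bits(uint32_t page_offset)
{
    assert((page_offset >> 30) >= 1);   /* 1G, 2G or 3G boundary */
    return ((page_offset >> 30) - 1) << TTBCR_T1SZ_SHIFT;
}

/* Hypothetical helper: extract the T1SZ field from those bits. */
static unsigned int t1sz_of(uint32_t page_offset)
{
    return (ttbr1_size_bits(page_offset) >> TTBCR_T1SZ_SHIFT) & 0x7u;
}

/* Mirrors TTBR1_OFFSET: bytes to skip from the start of swapper_pg_dir. */
static uint32_t ttbr1_offset(uint32_t page_offset)
{
    switch (page_offset) {
    case 0x80000000u: return 2 * 8;            /* skip two 8-byte L1 entries */
    case 0xC0000000u: return 4096 * (1 + 3);   /* skip the pgd page + 3 pmd pages */
    default:          return 0;
    }
}
```

Read this way, the 2G case leaves TTBR1 pointing at the third first-level entry (each entry covering 1 GiB), while the 3G case points TTBR1 directly at the fourth second-level table, since a 1 GiB region needs only one such table.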

proc-v7-3level.S

	.macro	v7_ttb_setup, zero, ttbr0l, ttbr0h, ttbr1, tmp
	ldr	\tmp, =swapper_pg_dir		@ swapper_pg_dir virtual address
	cmp	\ttbr1, \tmp, lsr #12		@ PHYS_OFFSET > PAGE_OFFSET?
	mrc	p15, 0, \tmp, c2, c0, 2		@ TTB control register
	orr	\tmp, \tmp, #TTB_EAE		@ enable LPAE
	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP)
	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP)
	ALT_SMP(orr	\tmp, \tmp, #TTB_FLAGS_SMP << 16)
	ALT_UP(orr	\tmp, \tmp, #TTB_FLAGS_UP << 16)
	/*
	 * Only use split TTBRs if PHYS_OFFSET <= PAGE_OFFSET (cmp above),
	 * otherwise booting secondary CPUs would end up using TTBR1 for the
	 * identity mapping set up in TTBR0.
	 */
	orrls	\tmp, \tmp, #TTBR1_SIZE		@ set TTBCR.T1SZ
	mcr	p15, 0, \tmp, c2, c0, 2		@ set TTBCR
	mov	\tmp, \ttbr1, lsr #20
	mov	\ttbr1, \ttbr1, lsl #12
	addls	\ttbr1, \ttbr1, #TTBR1_OFFSET
	mcrr	p15, 1, \ttbr1, \tmp, c2	@ set TTBR1
	.endm
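The last few instructions of the macro turn the page frame number passed in from head.S into the two 32-bit halves of the 64-bit TTBR1. The sketch below reproduces that arithmetic; the struct and function names are hypothetical, and the offset argument stands for TTBR1_OFFSET when split TTBRs are in use and 0 otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* The two 32-bit words handed to mcrr, as the macro computes them. */
struct ttbr1_words {
    uint32_t low;   /* (pfn << 12) truncated to 32 bits, plus the offset */
    uint32_t high;  /* pfn >> 20, i.e. physical address bits [39:32] and up */
};

static struct ttbr1_words ttbr1_from_pfn(uint32_t pfn, uint32_t offset)
{
    struct ttbr1_words w;

    assert((offset & 7u) == 0);     /* descriptors are 8 bytes wide */
    w.high = pfn >> 20;             /* mov tmp, ttbr1, lsr #20 */
    w.low = (pfn << 12) + offset;   /* mov ttbr1, ttbr1, lsl #12; addls ... */
    return w;
}

/* Combine both halves into the 64-bit register value. */
static uint64_t ttbr1_value(struct ttbr1_words w)
{
    return ((uint64_t)w.high << 32) | w.low;
}
```

For a swapper_pg_dir at physical address 0x80004000 the pfn is 0x80004, giving a high word of 0 and a low word of 0x80004000. Passing a pfn rather than an address is what allows the table to sit above 4 GiB.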

The situation for Arm64 is basically the same as Arm32 using LPAE.

If you have any questions, feel free to discuss.
