Raspberry Pi 4 Bare-Metal Basics: Starting from Hello World
Contents
1. Introduction
2. Project Overview
2.1 Makefile
2.2 link.ld Linker File
3. Understanding Code Execution from the CPU Perspective
3.1 start.S File
3.2 Functionality of the main Function
4. Raspberry Pi 4 Serial Peripheral Program
4.1 Setting GPIO Functionality
4.2 Configuring the Serial Controller
5. Conclusion

1. Introduction
When studying a new system, it helps to start with the simplest possible program. Two earlier articles covered the environment setup and the boot process for this project:
Raspberry Pi 4 Bare-Metal Basics: Environment Setup
Raspberry Pi 4 Bare-Metal Basics: Chip Boot to Code Execution
This article analyzes the simplest possible bare-metal program, one that makes the board print hello world on its serial port. It covers how the project is built, how the program runs, and how the Raspberry Pi 4 finally produces its output. In embedded development the beginning is often the hardest part; once a program is visibly running, everything after that becomes easier.
2. Project Overview
We use the first project in the repository, 1.compilation_environment, as the example. The repository is at:
https://github.com/bigmagic123/raspi4-bare-metal.git
The final project files are as follows:
2.1 Makefile
We build the project with a Makefile; running make produces the kernel image kernel7.img. For a project this small a plain Makefile is enough; larger projects may prefer a build tool such as scons or cmake.
First, let’s look at the contents of the Makefile:
SRCS = $(wildcard *.c)
OBJS = $(SRCS:.c=.o)
CFLAGS = -march=armv8-a -mtune=cortex-a72 -Wall -O2 -ffreestanding -nostdinc -nostdlib -nostartfiles

all: clean kernel7.img

start.o: start.S
	arm-none-eabi-gcc $(CFLAGS) -c start.S -o start.o

%.o: %.c
	arm-none-eabi-gcc $(CFLAGS) -c $< -o $@

kernel7.img: start.o $(OBJS)
	arm-none-eabi-ld -nostdlib -nostartfiles start.o $(OBJS) -T link.ld -o kernel7.elf
	arm-none-eabi-objcopy -O binary kernel7.elf kernel7.img

clean:
	rm kernel7.elf kernel7.img *.o >/dev/null 2>/dev/null || true
Let’s analyze the details of this file:
SRCS = $(wildcard *.c)
This uses the wildcard function to collect all .c files in the current directory and stores the list in the SRCS variable.
OBJS = $(SRCS:.c=.o)
This is a substitution reference: every .c file name in the SRCS list is replaced with the corresponding .o file name.
all: clean kernel7.img
Running make or make all builds the clean target first and then kernel7.img, so every build starts from scratch.
start.o: start.S
	arm-none-eabi-gcc $(CFLAGS) -c start.S -o start.o
This follows the general form of a Makefile rule:
target: prerequisites
	command
SRCS and OBJS only cover the C sources, so the assembly file start.S needs its own explicit rule.
%.o: %.c
	arm-none-eabi-gcc $(CFLAGS) -c $< -o $@
Here $< is the name of the first prerequisite and $@ is the name of the target.
kernel7.img: start.o $(OBJS)
	arm-none-eabi-ld -nostdlib -nostartfiles start.o $(OBJS) -T link.ld -o kernel7.elf
	arm-none-eabi-objcopy -O binary kernel7.elf kernel7.img
The first command links all the .o files with arm-none-eabi-ld according to link.ld, producing kernel7.elf. arm-none-eabi-objcopy -O binary then converts the ELF file into a raw binary image, kernel7.img, dropping the ELF headers and symbol information so that only the bytes to be loaded into memory remain.
clean:
	rm kernel7.elf kernel7.img *.o >/dev/null 2>/dev/null || true
This removes the build outputs and the intermediate object files. Errors from rm are silenced, so the target succeeds even when there is nothing to delete.
2.2 link.ld Linker File
After compilation the object files must be linked, and the linker script tells the linker how to lay out the output. Let’s look at the contents of link.ld:
SECTIONS {
    /*
     * First and foremost we need the .init section, containing the code to
     * be run first. We allow room for the ATAGs and stack and conform to
     * the bootloader's expectation by putting this code at 0x8000.
     */
    . = 0x8000;
    .text : {
        KEEP(*(.text.boot))
        *(.text .text.* .gnu.linkonce.t*)
    }
    /*
     * Next we put the data.
     */
    .data : {
        *(.data)
    }
    .bss : {
        . = ALIGN(16);
        __bss_start = .;
        *(.bss*)
        *(COMMON*)
        __bss_end = .;
    }
}
__bss_size = (__bss_end - __bss_start) >> 3;
The image is divided into a code section (.text), a data section (.data), and a bss section (.bss). The assignment . = 0x8000; sets the location counter to 0x8000, because by default the Raspberry Pi loads a 32-bit kernel image at this address. KEEP(*(.text.boot)) places the contents of .text.boot first, so that they start at 0x8000, and keeps the linker from discarding that section. The bss section holds variables whose initial value is zero. Because that value is already known, the section takes no space in the output file; instead the script records its bounds in the symbols __bss_start and __bss_end so that the startup code can zero the memory itself. The start of the section is aligned to 16 bytes with . = ALIGN(16), so that the word-by-word clearing loop begins on an aligned address.
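The effect of . = ALIGN(16) and of the __bss_size expression can be reproduced with ordinary integer arithmetic. The following is a minimal host-side sketch, not code from the project; the function names align_up and bss_size_units, and the sample addresses in the note below, are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power
 * of two). This mirrors what ". = ALIGN(16);" does to the location
 * counter in the linker script. */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Mirror of "__bss_size = (__bss_end - __bss_start) >> 3;":
 * the size of the bss section counted in 8-byte units. */
static uint32_t bss_size_units(uint32_t bss_start, uint32_t bss_end)
{
    assert(bss_end >= bss_start);
    return (bss_end - bss_start) >> 3;
}
```

With these definitions, align_up(0x8A34, 16) yields 0x8A40, and a bss section running from 0x8A40 to 0x8A80 has a bss_size_units value of 8.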
3. Understanding Code Execution from the CPU Perspective
To really understand how the code runs, it helps to step through it the way the CPU does, one instruction at a time.
3.1 start.S File
The start.S file sets up the CPU state and prepares the environment for the code that runs afterwards.
.equ Mode_USR, 0x10
.equ Mode_FIQ, 0x11
.equ Mode_IRQ, 0x12
.equ Mode_SVC, 0x13
.equ Mode_ABT, 0x17
.equ Mode_UND, 0x1B
.equ Mode_SYS, 0x1F
.section ".text.boot"
/* entry */
.globl _start
_start:
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continue
overHyped: /* Get out of HYP mode */
adr r1, continue
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;@ CPSR_MODE_MASK
orr r1, r1, #0x13 ;@ CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
eret
continue:
/* Suspend the other cpu cores */
mrc p15, 0, r0, c0, c0, 5
ands r0, #3
bne _halt
/* set the cpu to SVC32 mode and disable interrupt */
cps #Mode_SVC
/* disable the data alignment check */
mrc p15, 0, r1, c1, c0, 0
bic r1, #(1<<1)
mcr p15, 0, r1, c1, c0, 0
/* set stack before our code */
ldr sp, =_start
/* clear .bss */
mov r0,#0 /* get a zero */
ldr r1,=__bss_start /* bss start */
ldr r2,=__bss_end /* bss end */
bss_loop:
cmp r1,r2 /* check if data to clear */
strlo r0,[r1],#4 /* clear 4 bytes */
blo bss_loop /* loop until done */
/* jump to C code, should not return */
ldr pc, _main
b _halt
_main:
.word main
_halt:
wfe
b _halt
Let’s go through this code piece by piece.
.section ".text.boot"
This places the code that follows in the .text.boot section. Because the linker script puts .text.boot first, this code is linked at the start address, and the _start label ends up at 0x8000.
/* entry */
.globl _start
_start:
/* Check for HYP mode */
mrs r0, cpsr_all
and r0, r0, #0x1F
mov r8, #0x1A
cmp r0, r8
beq overHyped
b continue
overHyped: /* Get out of HYP mode */
adr r1, continue
msr ELR_hyp, r1
mrs r1, cpsr_all
and r1, r1, #0x1f ;@ CPSR_MODE_MASK
orr r1, r1, #0x13 ;@ CPSR_MODE_SUPERVISOR
msr SPSR_hyp, r1
eret
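When the Raspberry Pi starts executing the first instruction, the core is in HYP (hypervisor) mode. The code reads the current mode from the low five bits of the CPSR and compares it with 0x1A, the encoding of HYP mode. If they match, it leaves HYP mode by writing the return address to ELR_hyp and the target status to SPSR_hyp and then executing eret. Note that the and followed by orr, as written, produces 0x1A | 0x13 = 0x1B (Undefined mode) rather than 0x13; using bic to clear the mode bits would express the intent, and it is the later cps #Mode_SVC that actually leaves the core in Supervisor mode.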
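The bit manipulation in this step is easy to check on a PC. The sketch below models the mode test and two ways of building the value written to SPSR_hyp: the and/orr pair used in the listing, and a variant with bic semantics that clears the mode field first. The function names and the sample CPSR value in the note are hypothetical; only the mode encodings 0x13 and 0x1A come from the listing.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define MODE_SVC       0x13u
#define MODE_HYP       0x1Au

/* Same test as "and r0, r0, #0x1F" followed by "cmp r0, r8". */
static int in_hyp_mode(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == MODE_HYP;
}

/* What the listing computes: "and" keeps only the old mode bits,
 * then "orr" sets the SVC bits on top of them. */
static uint32_t spsr_and_orr(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) | MODE_SVC;
}

/* Variant with bic semantics: clear the mode field, keep the other
 * CPSR bits, then insert the SVC encoding. */
static uint32_t spsr_bic_orr(uint32_t cpsr)
{
    return (cpsr & ~CPSR_MODE_MASK) | MODE_SVC;
}
```

For a sample CPSR of 0x600001DA (HYP mode with some flag and mask bits set), spsr_and_orr gives 0x1B while spsr_bic_orr gives 0x600001D3, which shows why the two instruction sequences are not equivalent.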
/* Suspend the other cpu cores */
mrc p15, 0, r0, c0, c0, 5
ands r0, #3
bne _halt
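The Raspberry Pi 4 has four cores, and this program needs only one of them. mrc p15, 0, r0, c0, c0, 5 reads the MPIDR register, whose low two bits hold the core number. Core 0 continues; any other core branches to _halt, where it waits in a low-power state using WFE (Wait For Event).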
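The core selection amounts to a two-bit mask and a comparison. As a small illustration in C (the function names and the sample MPIDR values used to exercise them are assumptions, not taken from the project):

```c
#include <stdint.h>

/* The low two bits of MPIDR give the core number, which is what
 * "ands r0, #3" extracts in start.S. */
static unsigned core_id(uint32_t mpidr)
{
    return mpidr & 3u;
}

/* Only core 0 goes on to run the rest of the startup code;
 * every other core takes the "bne _halt" branch. */
static int should_halt(uint32_t mpidr)
{
    return core_id(mpidr) != 0;
}
```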
/* set the cpu to SVC32 mode and disable interrupt */
cps #Mode_SVC
/* disable the data alignment check */
mrc p15, 0, r1, c1, c0, 0
bic r1, #(1<<1)
mcr p15, 0, r1, c1, c0, 0
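Next, the core is switched to SVC mode and the alignment check is turned off by clearing bit 1 of the system control register, to prepare the environment for the code that follows. Despite the comment in the source, cps #Mode_SVC changes only the mode; it does not by itself mask interrupts.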
/* set stack before our code */
ldr sp, =_start
Then the stack pointer is set: ldr sp, =_start loads sp with the address of _start, which the linker script fixed at 0x8000. The stack on ARM grows towards lower addresses, and the memory below 0x8000 is otherwise unused, so it can serve as the stack for the C code.
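Why sp can point at _start without overwriting the code becomes clearer with a small model of a descending stack: a push first decrements sp and only then stores, so the first stored word lands just below the initial sp. This is a host-side sketch with invented names (stack_model, stack_push, stack_pop); the array stands in for the RAM below 0x8000.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 8

/* Tiny model of a full-descending stack: sp refers to the last
 * pushed word, and the initial sp sits just above the stack area. */
typedef struct {
    uint32_t mem[STACK_WORDS]; /* stands in for the RAM below _start */
    int sp;                    /* word index; STACK_WORDS plays the role of _start */
} stack_model;

static void stack_init(stack_model *s)
{
    s->sp = STACK_WORDS;
}

static void stack_push(stack_model *s, uint32_t value)
{
    assert(s->sp > 0);       /* model only: no room left below */
    s->mem[--s->sp] = value; /* decrement first, then store */
}

static uint32_t stack_pop(stack_model *s)
{
    assert(s->sp < STACK_WORDS);
    return s->mem[s->sp++];  /* load, then increment */
}
```

The first push writes to mem[STACK_WORDS - 1], the slot directly below the initial sp, so nothing at or above the position of _start is ever touched.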
/* clear .bss */
mov r0,#0 /* get a zero */
ldr r1,=__bss_start /* bss start */
ldr r2,=__bss_end /* bss end */
bss_loop:
cmp r1,r2 /* check if data to clear */
strlo r0,[r1],#4 /* clear 4 bytes */
blo bss_loop /* loop until done */
/* jump to C code, should not return */
ldr pc, _main
b _halt
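Then the BSS section is cleared. BSS holds global and static variables that are uninitialized or initialized to zero, and it is readable and writable. C code expects these variables to read as zero when main starts. On a hosted system the loader takes care of this, but on bare metal nothing does it for us, so the startup code clears the range from __bss_start to __bss_end four bytes at a time.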
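For readers more comfortable with C than with ARM assembly, the clearing loop can be rendered roughly as follows. This is a sketch of the same logic, not the project's code; the name clear_words is made up, and on the target the two arguments would come from the linker symbols __bss_start and __bss_end rather than from an ordinary array.

```c
#include <stdint.h>

/* C rendering of bss_loop: store zero words while cur < end
 * (the "lo" condition), advancing by 4 bytes per store. */
static void clear_words(uint32_t *cur, const uint32_t *end)
{
    while (cur < end) {
        *cur++ = 0;
    }
}
```

In practice this step stays in assembly because it has to run before any C code that relies on zeroed variables.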
/* jump to C code, should not return */
ldr pc, _main
Finally the program counter is loaded. The instruction ldr pc, _main reads the word stored at the label _main, which holds the address of the C function main, into the PC, so the next instruction the CPU executes is the first instruction of main.
3.2 Functionality of the main Function
The assembly code above prepared the environment that C code needs: it disabled the alignment check, set the stack pointer, and cleared the BSS section. The actual logic is written in C, and because this bare-metal program is so small, that logic is straightforward.
#include "uart.h"

void main()
{
    // set up serial console
    uart_init();
    // say hello
    uart_puts("Hello World!\n");
    // echo everything back
    while(1) {
        uart_send(uart_getc());
    }
}
This code prints Hello World! on the serial port and then loops forever, echoing back every character it receives. The real work lies in initializing the Raspberry Pi’s serial port.
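The article does not list uart_puts, so here is a guess at its shape: a loop that sends one character at a time and expands each newline to a carriage return plus newline, as serial terminals usually expect. Treat both the body of uart_puts and the buffer-backed uart_send below as assumptions made for illustration; the stand-in uart_send records characters in memory so that the sketch can run on a PC.

```c
#include <stddef.h>

/* Host-side stand-in for the real uart_send(): it records characters
 * in a buffer instead of writing them to the mini UART. */
static char sent[64];
static size_t sent_len;

static void uart_send(unsigned int c)
{
    if (sent_len < sizeof(sent) - 1) {
        sent[sent_len++] = (char)c;
        sent[sent_len] = '\0';
    }
}

/* Assumed shape of uart_puts(): send each character in turn and
 * expand "\n" to "\r\n" so the terminal returns to column 0. */
static void uart_puts(const char *s)
{
    while (*s) {
        if (*s == '\n') {
            uart_send('\r');
        }
        uart_send((unsigned int)(unsigned char)*s++);
    }
}
```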
4. Raspberry Pi 4 Serial Peripheral Program
In embedded development we always want the device to give some sign of life, such as lighting an LED or printing a character on the serial port, because that shows the program is running correctly. Small interactive programs are therefore valuable. A common first example is a blinking or breathing LED; a serial port allows richer interaction, so that is what we use here. Let’s look at how the serial port code works.
Before writing a peripheral driver, we first need to consult the chip’s peripherals manual, here rpi_DATA_2711_1p0.pdf, which describes how the peripheral address space is laid out.
Since we are using a 32-bit address space, the manual gives the base address of the chip’s peripherals as 0xFE000000.
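Every register used later in this article is reached as an offset from this base address. The sketch below shows one common way to express that in C. The _ADDR macro names and REG32 are invented for this example (the project's own uart.h defines names such as GPFSEL1 and AUX_MU_IO directly as pointers), and the offsets are the usual mini UART and GPIO values, which should be checked against the manual.

```c
#include <stdint.h>

#define MMIO_BASE 0xFE000000u /* peripheral base address from the text */

/* Register addresses as plain numbers: base plus offset. */
#define GPFSEL1_ADDR     (MMIO_BASE + 0x00200004u)
#define AUX_ENABLE_ADDR  (MMIO_BASE + 0x00215004u)
#define AUX_MU_IO_ADDR   (MMIO_BASE + 0x00215040u)
#define AUX_MU_LSR_ADDR  (MMIO_BASE + 0x00215054u)
#define AUX_MU_BAUD_ADDR (MMIO_BASE + 0x00215068u)

/* Turn an address into a volatile pointer, so that every access
 * really reaches the hardware and is not optimized away. */
#define REG32(addr) ((volatile uint32_t *)(uintptr_t)(addr))

/* On the target one would then write, for example:
 *     *REG32(AUX_MU_IO_ADDR) = 'A';
 * These addresses must not be dereferenced on a host PC. */
```

The volatile qualifier matters: without it the compiler may merge or drop repeated reads of a status register, and a polling loop would never see the hardware change.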
If we want to use the serial port, we must fulfill two prerequisites:
1. Configure the relevant GPIOs to the serial port multiplexing function.
2. Configure the parameters of the serial controller.
4.1 Setting GPIO Functionality
Once we know the address of the GPIO block, we need to find out which pins carry the serial signals and which alternate function selects them. Checking the pin header of the Raspberry Pi shows that the serial port uses GPIO14 and GPIO15.
The alternate function that routes the mini UART to these pins is ALT5.
With this information, we can program the GPFSEL1 register.
/**
 * gpio14 TX, gpio15 RX
 */
void uart_gpio_init()
{
    register unsigned int r;
    /* map UART1 to GPIO pins */
    r=*GPFSEL1;
    r&=~((7<<12)|(7<<15)); // clear the function fields of gpio14, gpio15
    r|=(2<<12)|(2<<15);    // alt5
    *GPFSEL1 = r;
    *GPPUD = 0;            // no pull-up/down for pins 14 and 15
    r=150; while(r--) { asm volatile("nop"); }
    *GPPUDCLK0 = (1<<14)|(1<<15);
    r=150; while(r--) { asm volatile("nop"); }
    *GPPUDCLK0 = 0;        // flush GPIO setup
    *AUX_MU_CNTL = 3;      // enable Tx, Rx
}
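We first select the pins and then switch them to the desired function. GPFSEL1 holds a 3-bit function field for each of GPIO10 to GPIO19, so GPIO14 occupies bits 12 to 14 and GPIO15 occupies bits 15 to 17; the value 2 (binary 010) in a field selects ALT5. The manual gives the specific meaning of the other register bits.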
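The shifts 12 and 15 in uart_gpio_init are not magic numbers: they follow from the pin number. A small helper makes the rule explicit. The helper name gpfsel_set and the constant GPIO_FN_ALT5 are hypothetical; the function works on a plain value, so it can be tried on a PC before the result is written to the real register.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO_FN_ALT5 2u /* binary 010 selects alternate function 5 */

/* Return the new value of a GPFSELn register with the 3-bit function
 * field of `pin` set to `fn`. Each GPFSEL register covers 10 pins. */
static uint32_t gpfsel_set(uint32_t reg, unsigned pin, uint32_t fn)
{
    unsigned shift = (pin % 10u) * 3u;
    assert(fn <= 7u);
    reg &= ~(7u << shift); /* clear the old function */
    reg |= fn << shift;    /* insert the new one */
    return reg;
}
```

For pin 14 the shift is (14 % 10) * 3 = 12 and for pin 15 it is 15, which matches the constants in the listing; the read-modify-write pattern leaves the fields of the other eight pins unchanged.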
4.2 Configuring the Serial Controller
Next, the serial controller itself is configured. We are using the AUX serial controller, that is, the mini UART, so its parameters have to be set: baud rate, number of data bits, and so on.
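/**
 * Initialize the mini UART (UART1)
 */
void uart_init()
{
    /* initialize UART1 */
    *AUX_ENABLE |=1;       // enable UART1, AUX mini uart
    *AUX_MU_CNTL = 0;      // keep Tx, Rx disabled while configuring
    *AUX_MU_LCR = 3;       // 8 bits
    *AUX_MU_MCR = 0;
    *AUX_MU_IER = 0;       // disable interrupts
    *AUX_MU_IIR = 0xc6;    // clear the receive and transmit FIFOs
    *AUX_MU_BAUD = 270;    // 115200 baud at a 250 MHz core clock
    uart_gpio_init();
}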
The serial port is used without interrupts here: the code polls the status register and reads from and writes to the serial port’s FIFO directly.
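The value 270 written to AUX_MU_BAUD in uart_init follows from the mini UART's baud rate formula, baud = clock / (8 * (reg + 1)). The sketch below inverts that formula. The function names are invented, and the 250 MHz clock is an assumption inferred from the value 270; if the core clock of your board is different, the register value must change accordingly.

```c
#include <assert.h>
#include <stdint.h>

/* Register value for AUX_MU_BAUD: baud = clock / (8 * (reg + 1)),
 * so reg = clock / (8 * baud) - 1 (integer division truncates). */
static uint32_t mu_baud_reg(uint32_t clock_hz, uint32_t baud)
{
    assert(baud != 0 && clock_hz >= 8u * baud);
    return clock_hz / (8u * baud) - 1u;
}

/* Baud rate actually produced by a given register value. */
static uint32_t mu_actual_baud(uint32_t clock_hz, uint32_t reg)
{
    return clock_hz / (8u * (reg + 1u));
}
```

With a 250 MHz clock, mu_baud_reg(250000000, 115200) gives 270 and the resulting rate is 115313 baud, about 0.1 % off the nominal value, which is well within what a serial link tolerates. With a 500 MHz clock the same call gives 541.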
Sending Data
/**
 * Send a character
 */
void uart_send(unsigned int c) {
    /* wait until we can send */
    do{asm volatile("nop");}while(!(*AUX_MU_LSR&0x20));
    /* write the character to the buffer */
    *AUX_MU_IO=c;
}
This polls bit 5 (0x20) of AUX_MU_LSR, which is set when the transmit FIFO can accept at least one more byte, and then writes the character to AUX_MU_IO.
Receiving Data
char uart_getc() {
    char r;
    /* wait until something is in the buffer */
    do{asm volatile("nop");}while(!(*AUX_MU_LSR&0x01));
    /* read it and return */
    r=(char)(*AUX_MU_IO);
    /* convert carriage return to newline */
    return r=='\r'? '\n': r;
}
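This polls bit 0 (0x01) of AUX_MU_LSR, which is set when the receive FIFO holds at least one byte, then reads the character from AUX_MU_IO and converts a carriage return to a newline.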
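The two masks 0x20 and 0x01 are easier to read with names. The following sketch gives them names and isolates the carriage return conversion; the identifiers LSR_DATA_READY, LSR_TX_EMPTY, can_send, can_receive and cr_to_nl are chosen for this example and do not appear in the project.

```c
#include <stdint.h>

#define LSR_DATA_READY 0x01u /* receive FIFO holds at least one byte */
#define LSR_TX_EMPTY   0x20u /* transmit FIFO can accept a byte */

/* Condition that uart_getc() waits for. */
static int can_receive(uint32_t lsr)
{
    return (lsr & LSR_DATA_READY) != 0;
}

/* Condition that uart_send() waits for. */
static int can_send(uint32_t lsr)
{
    return (lsr & LSR_TX_EMPTY) != 0;
}

/* Same conversion uart_getc() applies before returning a character. */
static char cr_to_nl(char c)
{
    return c == '\r' ? '\n' : c;
}
```

On the target, the polling loops would pass the current value of *AUX_MU_LSR to these helpers on every iteration.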
5. Conclusion
This analysis of the Raspberry Pi 4 hello world program has followed the whole path to printing a message on the serial console. Preparing the C runtime environment is something almost every chip of this kind requires, while peripheral initialization depends on the specific hardware platform. The overall process is nevertheless very general: the same basic steps have to be performed on other chips and architectures.
This article described system startup from the perspective of a minimal system. Setting register values correctly requires consulting the manual, so reading the manual repeatedly and thoughtfully is essential to using a chip properly. As Ouyang Xiu wrote in “The Oil Seller”: there is no secret to it, only practiced hands.