Raspberry Pi 4 Bare-Metal Basics: Starting from Hello World

  • 1. Introduction

  • 2. Project Overview

    • 2.1 Makefile

    • 2.2 link.ld Linker File

  • 3. Understanding Code Execution from the CPU Perspective

    • 3.1 start.S File

    • 3.2 Functionality of the main Function

  • 4. Raspberry Pi 4 Serial Peripheral Program

    • 4.1 Setting GPIO Functionality

    • 4.2 Configuring the Serial Controller

  • 5. Conclusion

1. Introduction

When studying a new system, it is best to start with the simplest possible program. Previous articles in this series covered the environment setup and the boot process:

Raspberry Pi 4 Bare-Metal Basics: Environment Setup

Raspberry Pi 4 Bare-Metal Basics: Chip Boot to Code Execution

This article analyzes the simplest bare-metal program that makes the board’s serial port print hello world. It covers how the project is built, how the program runs, and how the Raspberry Pi 4 finally produces the output. In embedded development, getting started is often the hardest part; once a program is visibly running, the work that follows becomes much easier.

2. Project Overview

We use the first project, 1.compilation_environment, as the example. The repository is available at:

https://github.com/bigmagic123/raspi4-bare-metal.git

The project consists of a Makefile, the linker script link.ld, the startup file start.S, and the C sources containing main and the serial driver (declared in uart.h).

2.1 Makefile

The project is built with a Makefile: running make produces the kernel image. For a project this small a Makefile is sufficient; larger projects may prefer tools such as scons or cmake.

First, let’s look at the contents of the Makefile:

SRCS = $(wildcard *.c)
OBJS = $(SRCS:.c=.o)
CFLAGS = -march=armv8-a -mtune=cortex-a72 -Wall -O2 -ffreestanding -nostdinc -nostdlib -nostartfiles

all: clean kernel7.img

start.o: start.S
	arm-none-eabi-gcc $(CFLAGS) -c start.S -o start.o

%.o: %.c
	arm-none-eabi-gcc $(CFLAGS) -c $< -o $@

kernel7.img: start.o $(OBJS)
	arm-none-eabi-ld -nostdlib -nostartfiles start.o $(OBJS) -T link.ld -o kernel7.elf
	arm-none-eabi-objcopy -O binary kernel7.elf kernel7.img

clean:
	rm kernel7.elf kernel7.img *.o >/dev/null 2>/dev/null || true

Let’s analyze the details of this file:

SRCS = $(wildcard *.c)

The wildcard function collects every .c file in the current directory into the SRCS variable.

OBJS = $(SRCS:.c=.o)

This is a substitution reference: each name in SRCS ending in .c is rewritten to end in .o, giving the list of object files.

all: clean kernel7.img

Running make or make all builds the all target, which first runs clean and then builds kernel7.img, so every build starts from scratch.

start.o: start.S
 arm-none-eabi-gcc $(CFLAGS) -c start.S -o start.o

A Makefile rule has the general form:

target: source
 command

The SRCS wildcard only picks up C files, so the assembly file start.S needs its own explicit rule.

%.o: %.c
 arm-none-eabi-gcc $(CFLAGS) -c $< -o $@

This is a pattern rule that builds any .o file from the .c file of the same name. Here, $< is the name of the first prerequisite and $@ is the name of the target.

kernel7.img: start.o $(OBJS)
 arm-none-eabi-ld -nostdlib -nostartfiles start.o $(OBJS) -T link.ld -o kernel7.elf
 arm-none-eabi-objcopy -O binary kernel7.elf kernel7.img

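The first command links start.o and the other object files with arm-none-eabi-ld, following the layout in link.ld, to produce kernel7.elf. The second uses arm-none-eabi-objcopy -O binary to convert the ELF file into a raw binary image, kernel7.img, dropping the ELF headers and symbol information so that only the bytes to be loaded into memory remain.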

clean:
 rm kernel7.elf kernel7.img *.o >/dev/null 2>/dev/null || true

This removes the build outputs. Errors from rm are suppressed so that clean succeeds even when the files do not exist yet.

2.2 link.ld Linker File

After compilation, the object files must be linked, and the linker script tells the linker how to lay out the program in memory. Here are its contents:

SECTIONS {
    /*
     * First and foremost we need the .text.boot section, containing the
     * code to be run first. We allow room for the ATAGs and stack and
     * conform to the bootloader's expectation by putting this code at 0x8000.
     */
    . = 0x8000;
    .text : {
        KEEP(*(.text.boot))
        *(.text .text.* .gnu.linkonce.t*)
    }

	/*
	* Next we put the data.
	*/
	.data : {
		*(.data)
	}

  .bss : {
        . = ALIGN(16);
        __bss_start = .;
        *(.bss*)
        *(COMMON*)
        __bss_end = .;
    }
}
__bss_size = (__bss_end - __bss_start) >> 3;

The program is divided into a code section (.text), a data section (.data), and a bss section (.bss). The assignment . = 0x8000; sets the location counter, so the code section starts at address 0x8000, which is where the Raspberry Pi loads a 32-bit kernel image by default.

KEEP(*(.text.boot)) places the contents of .text.boot first, at 0x8000, and prevents the linker from discarding that section. The bss section holds data that is zero at startup; because it lives in its own section, its contents do not need to be stored in the image file, which saves space. The startup code must instead zero that memory at run time, so the script records its bounds in the symbols __bss_start and __bss_end. The start of the section is aligned to 16 bytes; since the startup code clears it one 4-byte word at a time, a misaligned start could lead to incorrect accesses.

3. Understanding Code Execution from the CPU Perspective

To truly understand the flow of code execution by the CPU, one must execute the code logic as if one were the CPU.

3.1 start.S File

The start.S file sets some CPU states and prepares the environment for subsequent program execution.

.equ Mode_USR,        0x10
.equ Mode_FIQ,        0x11
.equ Mode_IRQ,        0x12
.equ Mode_SVC,        0x13
.equ Mode_ABT,        0x17
.equ Mode_UND,        0x1B
.equ Mode_SYS,        0x1F

.section ".text.boot"
/* entry */
.globl _start
_start:
/* Check for HYP mode */
    mrs r0, cpsr_all
    and r0, r0, #0x1F
    mov r8, #0x1A
    cmp r0, r8
    beq overHyped
    b continue

overHyped: /* Get out of HYP mode */
    adr r1, continue
    msr ELR_hyp, r1
    mrs r1, cpsr_all
    and r1, r1, #0x1f    ;@ CPSR_MODE_MASK
    orr r1, r1, #0x13    ;@ CPSR_MODE_SUPERVISOR
    msr SPSR_hyp, r1
    eret

continue:
    /* Suspend the other cpu cores */
    mrc p15, 0, r0, c0, c0, 5
    ands r0, #3
    bne _halt

    /* set the cpu to SVC32 mode and disable interrupt */
    cps #Mode_SVC

    /* disable the data alignment check */
    mrc p15, 0, r1, c1, c0, 0
    bic r1, #(1<<1)
    mcr p15, 0, r1, c1, c0, 0

    /* set stack before our code */
    ldr sp, =_start

    /* clear .bss */
    mov     r0,#0                   /* get a zero                       */
    ldr     r1,=__bss_start         /* bss start                        */
    ldr     r2,=__bss_end           /* bss end                          */

bss_loop:
    cmp     r1,r2                   /* check if data to clear           */
    strlo   r0,[r1],#4              /* clear 4 bytes                    */
    blo     bss_loop                /* loop until done                  */

    /* jump to C code, should not return */
    ldr     pc, _main
    b _halt

_main:
    .word main

_halt:
    wfe
    b _halt

Let’s look at this code in detail.

.section ".text.boot"

This places the following code in the .text.boot section. Because the linker script puts .text.boot first, this code is linked at the start of the image, and _start ends up at address 0x8000.

/* entry */
.globl _start
_start:
/* Check for HYP mode */
    mrs r0, cpsr_all
    and r0, r0, #0x1F
    mov r8, #0x1A
    cmp r0, r8
    beq overHyped
    b continue

overHyped: /* Get out of HYP mode */
    adr r1, continue
    msr ELR_hyp, r1
    mrs r1, cpsr_all
    and r1, r1, #0x1f    ;@ CPSR_MODE_MASK
    orr r1, r1, #0x13    ;@ CPSR_MODE_SUPERVISOR
    msr SPSR_hyp, r1
    eret

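When the Raspberry Pi starts executing the first instruction, the core is in HYP (hypervisor) mode. The current mode is read from the low five bits of the CPSR and compared against 0x1A, the HYP mode encoding. If it matches, the code leaves HYP mode by loading the return address (continue) into ELR_hyp and the target state into SPSR_hyp, then executing eret. Note that as written, the and/orr pair produces the mode value 0x1A | 0x13 = 0x1B (Mode_UND) rather than 0x13; the core still ends up in Supervisor mode because of the cps #Mode_SVC instruction that follows shortly after.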
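The mode-field arithmetic in the HYP check can be replayed on a host machine. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the constants are taken from the .equ table in start.S plus the 0x1A value used in the comparison. It contrasts the and/orr sequence used in overHyped with a variant that clears the mode field first.

```c
#include <stdint.h>

/* Mode field values: SVC and UND from the .equ table, HYP from the cmp. */
enum {
    MODE_SVC  = 0x13,
    MODE_HYP  = 0x1A,
    MODE_UND  = 0x1B,
    MODE_MASK = 0x1F
};

/* Extract the 5-bit mode field from a CPSR value. */
static inline uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & MODE_MASK;
}

/* SPSR value computed the way overHyped does it: and, then orr. */
static inline uint32_t spsr_as_written(uint32_t cpsr)
{
    return (cpsr & MODE_MASK) | MODE_SVC;
}

/* Hypothetical variant: clear the mode field (bic), then orr. */
static inline uint32_t spsr_with_bic(uint32_t cpsr)
{
    return (cpsr & ~(uint32_t)MODE_MASK) | MODE_SVC;
}
```

For a CPSR whose mode field is 0x1A, spsr_as_written yields a mode field of 0x1B, while spsr_with_bic yields 0x13 and keeps the remaining CPSR bits.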

/* Suspend the other cpu cores */
mrc p15, 0, r0, c0, c0, 5
ands r0, #3
bne _halt

The Raspberry Pi 4 has four cores, but this program only needs one. The low two bits of the MPIDR register (read with mrc p15, 0, r0, c0, c0, 5) give the core number; every core other than core 0 branches to _halt, where it waits in a low-power WFE (Wait for Event) loop.
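The core selection boils down to a two-bit mask. As a small sketch with hypothetical helper names, the same test written in C looks like this; on the target the mpidr argument would come from the mrc instruction shown above.

```c
#include <stdint.h>

/* Core number as start.S computes it: the low two bits of MPIDR. */
static inline uint32_t core_id(uint32_t mpidr)
{
    return mpidr & 3u;
}

/* Only core 0 continues booting; the others are parked in _halt. */
static inline int is_boot_core(uint32_t mpidr)
{
    return core_id(mpidr) == 0u;
}
```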

/* set the cpu to SVC32 mode and disable interrupt */
cps #Mode_SVC

/* disable the data alignment check */
mrc p15, 0, r1, c1, c0, 0
bic r1, #(1<<1)
mcr p15, 0, r1, c1, c0, 0

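Next, the core switches to SVC mode and clears the A bit (bit 1) of the SCTLR system control register to turn off data alignment checking, preparing the environment for the code that follows. Despite the comment, cps with only a mode operand changes the mode and leaves the interrupt mask bits untouched; masking interrupts as well would take the cpsid form of the instruction.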

/* set stack before our code */
ldr sp, =_start

Then the stack pointer is set: ldr sp, =_start loads the address of _start into sp. The linker script places _start at 0x8000, and the ARM stack conventionally grows toward lower addresses, so the memory below 0x8000 is treated as unused and serves as the stack for the C code.

    /* clear .bss */
    mov     r0,#0                   /* get a zero                       */
    ldr     r1,=__bss_start         /* bss start                        */
    ldr     r2,=__bss_end           /* bss end                          */

bss_loop:
    cmp     r1,r2                   /* check if data to clear           */
    strlo   r0,[r1],#4              /* clear 4 bytes                    */
    blo     bss_loop                /* loop until done                  */

    /* jump to C code, should not return */
    ldr     pc, _main
    b _halt

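Then the BSS section is cleared. BSS holds global and static variables that are uninitialized or zero-initialized; it is readable and writable. C code assumes these variables start at zero. On a hosted system the loader takes care of that, but on bare metal nothing does it automatically, so the startup code zeroes the range from __bss_start to __bss_end, four bytes per iteration.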
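For readers more comfortable with C than with ARM assembly, the clearing loop can be sketched as follows. The function name clear_words is hypothetical; it takes the bounds as parameters so that it can be tried on a host with an ordinary array.

```c
#include <stdint.h>

/* Zero every 32-bit word in [start, end), like bss_loop in start.S. */
static void clear_words(uint32_t *start, uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}
```

On the target, the arguments would be the linker symbols, for example declared as extern uint32_t __bss_start[], __bss_end[]; the project itself does this step in assembly, before any C code runs.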

/* jump to C code, should not return */
ldr     pc, _main

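Finally, the program counter is set. The label _main marks a word in memory that holds the address of the C function main; ldr pc, _main loads that word into the PC, so execution continues at the first instruction of main. Because this is a plain load into the PC and not a call, the b _halt after it is never reached.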

3.2 Functionality of the main Function

The assembly code above prepares the environment for C: it turns off alignment checking, sets the stack pointer, and clears the BSS section. The actual application logic is written in C, and since this bare-metal example is simple, so is the logic.

#include "uart.h"

void main()
{
    // set up serial console
    uart_init();
    
    // say hello
    uart_puts("Hello World!\n");
    
    // echo everything back
    while(1) {
        uart_send(uart_getc());
    }
}

This code initializes the serial port, prints Hello World!, and then loops forever, echoing back every character it receives. The interesting part is the initialization of the Raspberry Pi’s serial port.
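The article does not list uart_puts, so the following is only a sketch of how such a function is commonly written, not the project's implementation. It sends a string one character at a time and, as an assumption, emits a carriage return before each newline, which many serial terminals expect. The put_fn type, puts_with, and the capture helper are hypothetical names; capture stands in for uart_send so the logic can be tried on a host.

```c
#include <stddef.h>

/* Hypothetical byte sink; on the board this role is played by uart_send(). */
typedef void (*put_fn)(unsigned int c);

/* Send a NUL-terminated string, emitting "\r\n" for every '\n'. */
static void puts_with(put_fn put, const char *s)
{
    while (*s) {
        if (*s == '\n') {
            put('\r');
        }
        put((unsigned char)*s++);
    }
}

/* Host-only stand-in for the UART: collect the bytes in a buffer. */
static char captured[64];
static size_t captured_len;

static void capture(unsigned int c)
{
    if (captured_len < sizeof captured - 1) {
        captured[captured_len++] = (char)c;
    }
}
```

Calling puts_with(capture, "Hi\n") leaves the four bytes 'H', 'i', '\r', '\n' in captured.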

4. Raspberry Pi 4 Serial Peripheral Program

In embedded development we want the device to give us some sign of life, such as lighting an LED or printing a character on the serial port, because that shows the program is running correctly. Simple interactive programs are therefore valuable. A blinking or breathing LED is the usual first example; a serial port allows much richer interaction. Let’s analyze the serial port code.

Before writing a peripheral driver, we first need to consult the chip’s peripherals manual, here rpi_DATA_2711_1p0.pdf, which documents the address map of the peripheral space.

Since we are using a 32-bit address space, according to the manual the chip’s peripherals start at address 0xFE000000.
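The registers used later in this article are all reached as offsets from that base. The project's uart.h is not shown here, so the macro names below are illustrative assumptions; the offsets are the ones listed for the GPIO and AUX blocks in the peripherals manual. Addresses are kept as plain integers so the arithmetic can be checked on a host.

```c
#include <stdint.h>

#define MMIO_BASE         0xFE000000u  /* peripheral base in the 32-bit map */

/* Register addresses as integers (names are illustrative). */
#define GPFSEL1_ADDR      (MMIO_BASE + 0x00200004u)
#define AUX_ENABLE_ADDR   (MMIO_BASE + 0x00215004u)
#define AUX_MU_IO_ADDR    (MMIO_BASE + 0x00215040u)
#define AUX_MU_LSR_ADDR   (MMIO_BASE + 0x00215054u)
#define AUX_MU_BAUD_ADDR  (MMIO_BASE + 0x00215068u)

/* On the target, a register is accessed through a volatile pointer. */
#define REG32(addr) ((volatile uint32_t *)(uintptr_t)(addr))
```

The volatile qualifier matters: it stops the compiler from caching or removing reads and writes to hardware registers.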

If we want to use the serial port, we must fulfill two prerequisites:

1. Configure the relevant GPIOs to the serial port multiplexing function.

2. Configure the parameters of the serial controller.

4.1 Setting GPIO Functionality

Each GPIO pin can serve several functions, so after finding the GPIO register addresses we also need to find which function to select.

On the Raspberry Pi’s 40-pin header, the serial port uses GPIO14 and GPIO15 (physical pins 8 and 10).

According to the manual’s alternate function table, the mini UART signals on these pins are selected with ALT5.

With this information, we can configure the GPFSEL1 register.

/**
 * Route the mini UART to the header pins: GPIO14 = TXD1, GPIO15 = RXD1
 */
void uart_gpio_init()
{
    register unsigned int r;
    /* map UART1 to GPIO pins */
    r=*GPFSEL1;
    r&=~((7<<12)|(7<<15)); // clear the function fields of gpio14, gpio15
    r|=(2<<12)|(2<<15);    // select alt5 (0b010) for both
    *GPFSEL1 = r;
    *GPPUD = 0;            // request no pull-up/down
    r=150; while(r--) { asm volatile("nop"); }  // wait for the setting to settle
    *GPPUDCLK0 = (1<<14)|(1<<15);               // clock it into pins 14 and 15
    r=150; while(r--) { asm volatile("nop"); }
    *GPPUDCLK0 = 0;        // flush GPIO setup
    *AUX_MU_CNTL = 3;      // enable Tx, Rx
}

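Each GPFSEL register holds a 3-bit function field for ten pins, so GPIO14 and GPIO15 occupy bits 12-14 and 15-17 of GPFSEL1, and the value 2 (binary 010) selects ALT5. The GPPUD and GPPUDCLK0 writes follow the pull-up/down sequence of the earlier BCM283x chips; the BCM2711 manual describes pull control through the GPIO_PUP_PDN_CNTRL_REG registers instead, so it is worth checking this step against the manual. The manual gives the exact meaning of every register bit used here.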
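The shift amounts 12 and 15 in the code above follow from the register layout and need not be memorized. As a sketch with hypothetical helper names, the register index and bit position for any pin can be computed like this:

```c
#include <stdint.h>

#define GPIO_FUNC_ALT5 2u   /* function-select encoding 0b010 */

/* Each GPFSELn register holds ten 3-bit fields, one per pin. */
static inline unsigned gpfsel_index(unsigned pin) { return pin / 10u; }
static inline unsigned gpfsel_shift(unsigned pin) { return (pin % 10u) * 3u; }

/* Return reg with the 3-bit field for pin replaced by func. */
static inline uint32_t gpfsel_set(uint32_t reg, unsigned pin, uint32_t func)
{
    unsigned shift = gpfsel_shift(pin);
    reg &= ~(UINT32_C(7) << shift);   /* clear the old function */
    reg |= (func & 7u) << shift;      /* write the new one */
    return reg;
}
```

For pins 14 and 15 this gives register index 1 (GPFSEL1) and shifts 12 and 15, matching the constants in uart_gpio_init. The read-modify-write form leaves the fields of the other eight pins in the register unchanged.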

4.2 Configuring the Serial Controller

Next, the serial controller itself must be configured. We use the AUX serial controller, the mini UART (UART1), and set its parameters such as the baud rate and data width.

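/**
 * Initialize the mini UART (UART1): 115200 baud, 8 data bits
 */
void uart_init()
{
    /* initialize UART1 */
    *AUX_ENABLE |=1;       // enable UART1, AUX mini uart
    *AUX_MU_CNTL = 0;      // disable Tx, Rx while configuring
    *AUX_MU_LCR = 3;       // 8 bits
    *AUX_MU_MCR = 0;
    *AUX_MU_IER = 0;       // disable interrupts
    *AUX_MU_IIR = 0xc6;    // clear the receive and transmit FIFOs
    *AUX_MU_BAUD = 270;    // 115200 baud
    uart_gpio_init();      // route the pins, then enable Tx, Rx
}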

The serial port is used without interrupts here, so the code polls the status register and transfers data directly through the mini UART’s FIFOs.
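The value 270 written to AUX_MU_BAUD can be derived from the mini UART formula in the peripherals manual, baudrate = clock / (8 * (reg + 1)). The sketch below uses hypothetical helper names and assumes a 250 MHz clock feeding the mini UART, which is the value that makes 270 correspond to roughly 115200 baud; the source does not state the clock, and a board running that clock at a different frequency would need a different register value.

```c
#include <stdint.h>

/* baud = clock / (8 * (reg + 1))  =>  reg = clock / (8 * baud) - 1 */
static inline uint32_t mini_uart_baud_reg(uint32_t clock_hz, uint32_t baud)
{
    return clock_hz / (8u * baud) - 1u;
}

/* Baud rate actually produced by a given register value. */
static inline uint32_t mini_uart_actual_baud(uint32_t clock_hz, uint32_t reg)
{
    return clock_hz / (8u * (reg + 1u));
}
```

With a 250 MHz clock, mini_uart_baud_reg(250000000, 115200) gives 270 and the resulting rate is about 115313 baud, well within the tolerance of typical serial adapters. With a 500 MHz clock the same formula gives 541.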

Sending Data

/**
 * Send a character
 */
void uart_send(unsigned int c) {
    /* wait until we can send */
    do{asm volatile("nop");}while(!(*AUX_MU_LSR&0x20));
    /* write the character to the buffer */
    *AUX_MU_IO=c;
}

This waits until bit 5 of AUX_MU_LSR indicates that the transmit FIFO can accept at least one byte, then writes the character to AUX_MU_IO.

Receiving Data

char uart_getc() {
    char r;
    /* wait until something is in the buffer */
    do{asm volatile("nop");}while(!(*AUX_MU_LSR&0x01));
    /* read it and return */
    r=(char)(*AUX_MU_IO);
    /* convert carriage return to newline */
    return r=='\r'? '\n': r;
}

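This waits until bit 0 of AUX_MU_LSR reports that the receive FIFO holds at least one byte, reads the character from AUX_MU_IO, and converts a carriage return into a newline before returning it.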

5. Conclusion

This analysis of the Raspberry Pi 4 hello world program has followed in detail how a message reaches the console through the serial port. Preparing the C runtime environment is something almost every chip of this kind requires, while peripheral initialization depends on the specific hardware platform. Overall, though, the process is very general: the same basic steps are needed on other chips and architectures.

This article has described system startup from the perspective of a minimal system. Configuring registers requires consulting the manual, so reading the manual extensively is essential to using a chip properly; only repeated reading and careful thought lead to correct use. As Ouyang Xiu wrote in “The Oil Seller”: there is no secret to it, only hands made familiar through practice.
