Technical Insights: Boot Loader in Embedded Systems

Source: MCU Fun

ID: mcu168

An embedded Linux system can typically be divided into four layers from a software perspective:

1. Boot Loader. This includes the boot code embedded in firmware (optional) and the Boot Loader itself.

2. Linux Kernel. A customized kernel specific to the embedded board along with kernel boot parameters.

3. File System. This includes the root file system and file systems built on Flash memory devices. Typically, a ram disk is used as the root file system.

4. User Applications. Applications specific to the user. Sometimes, an embedded graphical user interface may also be included between the user applications and the kernel layer. Common embedded GUIs include MicroWindows and MiniGUI.

The Boot Loader is the first piece of software code that runs after the system is powered on. Recall that in the architecture of a PC, the Boot Loader consists of the BIOS (which is essentially a piece of firmware) and the OS Boot Loader located in the hard disk MBR (such as LILO and GRUB).

After the BIOS completes hardware detection and resource allocation, it reads the Boot Loader from the hard disk MBR into the system’s RAM and then hands over control to the OS Boot Loader. The main task of the Boot Loader is to read the kernel image from the hard disk into RAM and then jump to the entry point of the kernel to start the operating system.

In embedded systems, there is typically no firmware program like the BIOS (note that some embedded CPUs may have a small built-in boot program), so the entire system loading task is completely handled by the Boot Loader. For example, in an embedded system based on the ARM7TDMI core, the system usually starts executing from address 0x00000000 upon power-up or reset, where the Boot Loader program is typically located.

This article will discuss the Boot Loader of embedded systems from four aspects: the concept of Boot Loader, the main tasks of Boot Loader, the framework structure of Boot Loader, and the installation of Boot Loader.

The Concept of Boot Loader

In simple terms, the Boot Loader is a small program that runs before the operating system kernel. It initializes hardware devices and establishes a memory map, bringing the system’s software and hardware environment into a state suitable for the final call to the operating system kernel.

Typically, the Boot Loader is heavily dependent on hardware, especially in the embedded world. Therefore, it is almost impossible to create a universal Boot Loader in the embedded world. Nevertheless, we can still summarize some general concepts about Boot Loaders to guide users in the design and implementation of specific Boot Loaders.

1. Supported CPUs and Embedded Boards by Boot Loader

Each different CPU architecture has its own Boot Loader. Some Boot Loaders also support multiple CPU architectures, such as U-Boot, which supports both ARM and MIPS architectures. Besides being dependent on the CPU architecture, the Boot Loader also relies on the specific configuration of the embedded board. This means that for two different embedded boards, even if they are built on the same CPU, modifications to the Boot Loader source code are usually required to run the Boot Loader program from one board on another.

2. Installation Medium of Boot Loader

After power-on or reset, a CPU typically fetches its first instruction from an address predefined by the CPU manufacturer. For example, CPUs based on the ARM7TDMI core usually fetch their first instruction from address 0x00000000 upon reset. Embedded systems built around such CPUs typically map some type of solid-state storage device (such as ROM, EEPROM, or Flash) to this predefined address and install the Boot Loader there. Therefore, after power-on, the CPU executes the Boot Loader first.

Figure 1 shows a typical memory allocation structure of a solid-state storage device that contains the Boot Loader, kernel boot parameters, kernel image, and root file system image.

3. Devices or Mechanisms Used to Control the Boot Loader

The host and target machine are generally connected via a serial port, and the Boot Loader software typically performs I/O through the serial port during execution, such as outputting print information to the serial port and reading user control characters from the serial port.

4. Single Stage or Multi-Stage Boot Loader Startup Process

Typically, multi-stage Boot Loaders can provide more complex functions and better portability. Most Boot Loaders that boot from solid-state storage devices are two-stage processes, meaning the startup process can be divided into stage 1 and stage 2. The specific tasks completed in stage 1 and stage 2 will be discussed below.

5. Operation Mode of Boot Loader

Most Boot Loaders include two different operation modes: “Boot Loading” mode and “Downloading” mode. This distinction is only meaningful to developers. However, from the end user’s perspective, the role of the Boot Loader is to load the operating system, and there is no distinction between Boot Loading mode and Downloading mode.

Boot Loading mode: This mode is also known as “Autonomous” mode. In this mode, the Boot Loader loads the operating system from a solid-state storage device on the target machine into RAM without user intervention. This mode is the normal operating mode of the Boot Loader, so it must obviously operate in this mode when embedded products are released.

Downloading mode: In this mode, the Boot Loader on the target machine downloads files from the host via serial or network connections, such as downloading kernel images and root file system images. Files downloaded from the host are typically first saved to the target machine’s RAM by the Boot Loader and then written to the target machine’s FLASH solid-state storage device.

This mode of the Boot Loader is usually used during the initial installation of the kernel and root file system; furthermore, subsequent system updates will also use this working mode of the Boot Loader. Boot Loaders operating in this mode typically provide a simple command-line interface to their end users.

Powerful Boot Loaders like Blob or U-Boot typically support both operation modes and allow users to switch between them. For example, Blob starts in normal Boot Loading mode but will delay for 10 seconds waiting for the end user to press any key to switch to Downloading mode. If no key is pressed within 10 seconds, Blob continues to boot the Linux kernel.
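The switch between the two modes can be sketched as a countdown loop. This is a minimal host-side sketch, not the actual code of Blob or U-Boot: the names select_boot_mode, key_poll_fn, delay_fn, and the sim_console stand-ins are hypothetical, and a real port would poll the UART and a hardware timer instead of the simulated console used here.

```c
#include <stdbool.h>

enum boot_mode { MODE_BOOT_LOADING, MODE_DOWNLOADING };

/* Hypothetical hooks supplied by the board support code. */
typedef bool (*key_poll_fn)(void *ctx);  /* true if a key is waiting */
typedef void (*delay_fn)(void *ctx);     /* blocks for one poll interval */

/* Poll for a key for at most `ticks` intervals; any key selects
 * Downloading mode, a timeout selects normal Boot Loading mode. */
enum boot_mode select_boot_mode(key_poll_fn key_pressed, delay_fn wait_tick,
                                void *ctx, unsigned ticks)
{
    for (unsigned t = 0; t < ticks; t++) {
        if (key_pressed(ctx))
            return MODE_DOWNLOADING;
        wait_tick(ctx);
    }
    return MODE_BOOT_LOADING;
}

/* Host-side stand-in for a console: a key "arrives" at tick key_at. */
struct sim_console { unsigned now; unsigned key_at; };

static bool sim_key_pressed(void *ctx)
{
    struct sim_console *c = ctx;
    return c->now >= c->key_at;
}

static void sim_wait_tick(void *ctx)
{
    struct sim_console *c = ctx;
    c->now++;
}
```

With a one-second poll interval and ticks set to 10, this corresponds to the 10-second delay described above.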

6. Communication Devices and Protocols Used for File Transfer Between Boot Loader and Host

The most common scenario is that the Boot Loader on the target machine downloads files from the host via a serial port, usually with one of the xmodem/ymodem/zmodem protocols. However, serial transmission is slow, so downloading files over an Ethernet connection with the TFTP protocol is often a better choice.

Additionally, when discussing this topic, the software used on the host side must also be considered. For example, when downloading files via Ethernet and TFTP, the host must have software to provide TFTP services.
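To give a feel for how light the serial protocols are on the target side, here is a minimal sketch of validating one packet of the original checksum variant of xmodem (SOH, block number, its complement, 128 data bytes, and a one-byte arithmetic checksum). The function and macro names are hypothetical, and the CRC and 1K variants, timeouts, and ACK/NAK handling are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XMODEM_SOH        0x01u
#define XMODEM_DATA_LEN   128u
#define XMODEM_PACKET_LEN (3u + XMODEM_DATA_LEN + 1u)

/* Arithmetic sum of the data bytes, modulo 256. */
uint8_t xmodem_checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + data[i]);
    return sum;
}

/* pkt must point to XMODEM_PACKET_LEN received bytes. */
bool xmodem_packet_ok(const uint8_t *pkt, uint8_t expected_block)
{
    if (pkt[0] != XMODEM_SOH)
        return false;
    if (pkt[1] != expected_block)
        return false;
    if (pkt[2] != (uint8_t)(255u - pkt[1]))   /* complement of block number */
        return false;
    return pkt[3 + XMODEM_DATA_LEN] == xmodem_checksum(&pkt[3], XMODEM_DATA_LEN);
}
```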

After discussing the above concepts of Boot Loaders, let’s take a closer look at the main tasks that Boot Loaders should accomplish.

Main Tasks and Typical Structure Framework of Boot Loader

Before continuing this section, let’s make an assumption: the kernel image and root file system image are loaded into RAM and running. This assumption is made because, in embedded systems, kernel images and root file system images can also run directly from solid-state storage devices like ROM or Flash. However, this approach undoubtedly sacrifices running speed.

From the operating system’s perspective, the overall goal of the Boot Loader is to correctly invoke the kernel for execution.

Additionally, since the implementation of the Boot Loader depends on the CPU architecture, most Boot Loaders are divided into two main parts: stage 1 and stage 2. Code that depends on the CPU architecture, such as device initialization code, is typically placed in stage 1 and is usually written in assembly language to keep it short and efficient. Stage 2 is typically written in C to achieve more complex functions and better code readability and portability.

The stage 1 of the Boot Loader typically includes the following steps (in execution order):

· Hardware device initialization.

· Preparing RAM space for loading stage 2 of the Boot Loader.

· Copying stage 2 of the Boot Loader into RAM.

· Setting up the stack.

· Jumping to the C entry point of stage 2.

The stage 2 of the Boot Loader typically includes the following steps (in execution order):

· Initializing hardware devices to be used in this stage.

· Detecting the system memory map.

· Reading the kernel image and root file system image from flash into RAM.

· Setting boot parameters for the kernel.

· Invoking the kernel.

Stage 1 of the Boot Loader

Basic hardware initialization

This is the operation that the Boot Loader executes first, aiming to prepare a basic hardware environment for the execution of stage 2 and the subsequent kernel execution. It typically includes the following steps (in execution order):

1. Mask all interrupts. Handling interrupts is typically the responsibility of OS device drivers, so there is no need to respond to any interrupts during the entire execution of the Boot Loader. Interrupt masking can be accomplished by writing to the CPU’s interrupt mask register or status register (such as the ARM CPSR register).

2. Set the CPU speed and clock frequency.

3. RAM initialization. This includes correctly setting the function registers of the system’s memory controller and various memory bank control registers.

4. Initialize the LED. Typically, the LED is driven by GPIO to indicate whether the system status is OK or Error. If there is no LED on the board, this can also be accomplished by initializing UART to print the Boot Loader’s logo character information to the serial port.

5. Disable the CPU’s internal instruction/data cache.

Preparing RAM space for loading stage 2

To achieve faster execution speed, stage 2 is typically loaded into RAM for execution, so a usable range of RAM space must be prepared for loading stage 2 of the Boot Loader.

Since stage 2 is typically C code, the space must hold not only the stage 2 executable image but also the stack. The size of the space is also best chosen as a multiple of the memory page size (typically 4KB). Generally, 1MB of RAM is sufficient. The specific address range can be chosen freely; for example, Blob arranges its stage 2 executable image to run in a 1MB space starting at address 0xc0200000 in system RAM. However, placing stage 2 in the top 1MB of the RAM space, i.e., from (RamEnd - 1MB) to RamEnd, is a recommended method.

For convenience in later descriptions, let the size of the arranged RAM space be stage2_size (in bytes), and let its starting and ending addresses be stage2_start and stage2_end (both aligned to a 4-byte boundary). Therefore: stage2_end = stage2_start + stage2_size. It must also be ensured that the arranged address range really is writable RAM, so the range must be tested. The test can be similar to that of Blob, i.e., checking whether the first two words of each memory page can be written and read back. For convenience in later descriptions, we call this detection algorithm test_mempage. Its steps are:

1. First, save the contents of the first two words of the memory page.

2. Write arbitrary numbers into these two words. For example, write 0x55 into the first word and 0xaa into the second word.

3. Then, immediately read back the contents of these two words. Obviously, the contents read should be 0x55 and 0xaa respectively. If not, it indicates that the address range occupied by this memory page is not a valid RAM space.

4. Write arbitrary numbers into these two words again. For example, write 0xaa into the first word and 0x55 into the second word.

5. Then, immediately read back the contents of these two words. Obviously, the contents read should be 0xaa and 0x55 respectively. If not, it indicates that the address range occupied by this memory page is not a valid RAM space.

6. Restore the original contents of these two words. Testing is complete.
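The six steps above can be sketched in C as follows. This is an illustrative version rather than Blob's actual routine; the return convention (0 for valid RAM, -1 otherwise) and the volatile pointer parameter are assumptions made for this sketch.

```c
#include <stdint.h>

/* Returns 0 if the first two words of the page behave like RAM,
 * -1 otherwise. `volatile` keeps the compiler from optimizing away
 * the write-then-read-back sequence. */
int test_mempage(volatile uint32_t *page)
{
    /* Step 1: save the original contents. */
    const uint32_t saved0 = page[0];
    const uint32_t saved1 = page[1];
    int ok = 1;

    /* Steps 2 and 3: write a pattern and read it back. */
    page[0] = 0x55;
    page[1] = 0xaa;
    if (page[0] != 0x55 || page[1] != 0xaa)
        ok = 0;

    /* Steps 4 and 5: write the swapped pattern and read it back. */
    page[0] = 0xaa;
    page[1] = 0x55;
    if (page[0] != 0xaa || page[1] != 0x55)
        ok = 0;

    /* Step 6: restore the original contents. */
    page[0] = saved0;
    page[1] = saved1;

    return ok ? 0 : -1;
}
```

Using two different patterns guards against a floating bus that happens to read back the last value driven onto it.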

To obtain a clean range of RAM space, we can also zero out the arranged RAM space.

Copying stage 2 into RAM

When copying, two points must be determined:

(1) The starting and ending addresses of the executable image of stage 2 in the solid-state storage device;

(2) The starting address of the RAM space.

Setting the stack pointer sp

The stack pointer is set in preparation for executing C code. Typically, we can set the value of sp to (stage2_end - 4), i.e., at the top of the 1MB RAM space arranged in the section “Preparing RAM space for loading stage 2” above (the stack grows downwards).

Additionally, before setting the stack pointer sp, the LED can be turned off to indicate to the user that we are ready to jump to stage 2.

After executing the above steps, the physical memory layout of the system should be as shown in Figure 2.
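The address arithmetic for the recommended placement (stage 2 in the top 1MB of RAM, with the initial stack pointer just below stage2_end) can be sketched as follows. The struct and function names are hypothetical, and the 4KB page size and 1MB stage 2 size are the typical values mentioned above, not fixed requirements.

```c
#include <stdint.h>

#define STAGE2_SIZE  (1024u * 1024u)  /* 1MB, as suggested above */
#define PAGE_SIZE_B  4096u            /* assumed memory page size */

struct stage2_layout {
    uint32_t start;       /* stage2_start */
    uint32_t end;         /* stage2_end = stage2_start + stage2_size */
    uint32_t initial_sp;  /* stage2_end - 4; the stack grows downwards */
};

/* ram_end is the first address past the end of RAM (RamEnd). */
struct stage2_layout plan_stage2(uint32_t ram_end)
{
    struct stage2_layout l;

    l.end = ram_end & ~(PAGE_SIZE_B - 1u);  /* round down to a page boundary */
    l.start = l.end - STAGE2_SIZE;
    l.initial_sp = l.end - 4u;
    return l;
}
```

For a hypothetical board with 32MB of RAM starting at 0xc0000000, RamEnd is 0xc2000000, so stage 2 occupies 0xc1f00000 up to 0xc2000000 and sp starts at 0xc1fffffc.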

Jumping to the C entry point of stage 2

Once everything is ready, we can jump to execute stage 2 of the Boot Loader. For example, in ARM systems, this can be achieved by modifying the PC register to the appropriate address.

Figure 2 shows the system memory layout when the stage 2 executable image of the Boot Loader has just been copied to RAM.

Stage 2 of the Boot Loader

As mentioned earlier, the code of stage 2 is typically written in C to make more complex functions easier to implement and to achieve better readability and portability. However, unlike ordinary C applications, programs like Boot Loaders cannot be compiled and linked against any support functions from the glibc library, because there is no operating system or C runtime underneath the Boot Loader to support them.

This raises a problem: how do we jump into the main() function? Directly using the starting address of main() as the entry point of the entire stage 2 executable image may seem like the most straightforward idea. However, this approach has two drawbacks: 1) it cannot pass function parameters to main(); 2) it cannot handle the return from main().

A more clever method is to use a trampoline. That is, write a small trampoline program in assembly language and use it as the entry point of the stage 2 executable image. The trampoline uses a CPU jump instruction to enter main(); when main() returns, execution comes back to the trampoline. In short, the trampoline acts as an external wrapper around the main() function.

Below is a simple example of a trampoline program (from Blob):

    .text
    .globl _trampoline
    _trampoline:
        bl      main
        /* if main ever returns we just call it again */
        b       _trampoline

It can be seen that when the main() function returns, we again use a jump instruction to re-execute the trampoline program, hence the name trampoline.

Initializing hardware devices to be used in this stage

This typically includes: (1) initializing at least one serial port, so that the Boot Loader can exchange I/O with the user at the terminal; (2) initializing timers, etc.

Before initializing these devices, the LED can be turned back on to indicate that we have entered the execution of the main() function.

After the device initialization is complete, some print information can be output, such as the program name string, version number, etc.

Detecting the system memory map

The memory map describes which address ranges in the entire 4GB physical address space are allocated for addressing the system’s RAM units. For example, in the SA-1100 CPU, the address space starting from 0xC0000000 is used as the system’s RAM address space, while in the Samsung S3C44B0X CPU, the address space from 0x0c000000 to 0x10000000 is used as the system’s RAM address space. Although CPUs typically reserve a large address space for system RAM, a specific embedded board does not necessarily implement all of it. That is, a specific board often maps only part of the CPU-reserved RAM address space to RAM units and leaves the rest unused. For this reason, stage 2 of the Boot Loader must detect the system’s memory map before it does anything else with RAM (such as reading the kernel image stored in flash into RAM): it must know which parts of the CPU-reserved RAM address space are actually mapped to RAM units and which are in an “unused” state.

(1) Description of Memory Mapping

A segment of continuous address range in the RAM address space can be described using the following data structure:

    typedef struct memory_area_struct {
        u32 start; /* the base address of the memory region */
        u32 size;  /* the byte number of the memory region */
        int used;
    } memory_area_t;

This continuous address range in the RAM address space can be in one of two states:

(1) used=1, indicating that this continuous address range has been implemented and is indeed mapped to RAM units.

(2) used=0, indicating that this continuous address range has not been implemented by the system and is in an unused state.

Based on the above memory_area_t data structure, the entire CPU-reserved RAM address space can be represented using an array of memory_area_t type, as follows:

    /* The [first ... last] range designator is a GNU C extension. */
    memory_area_t memory_map[NUM_MEM_AREAS] = {
        [0 ... (NUM_MEM_AREAS - 1)] = {
            .start = 0,
            .size = 0,
            .used = 0
        },
    };

(2) Memory Mapping Detection

Below is a simple and effective algorithm that can be used to detect the memory mapping situation of the entire RAM address space:

    /* Array initialization */
    for (i = 0; i < NUM_MEM_AREAS; i++)
        memory_map[i].used = 0;

    /* First write a 0 to the first word of every memory page */
    for (addr = MEM_START; addr < MEM_END; addr += PAGE_SIZE)
        *(u32 *)addr = 0;

    for (i = 0, addr = MEM_START;
         addr < MEM_END && i < NUM_MEM_AREAS;
         addr += PAGE_SIZE) {
        /*
         * Check whether the PAGE_SIZE bytes starting at addr are valid RAM,
         * using the test_mempage algorithm described earlier (assumed here
         * to return 0 for a valid RAM page).
         */
        if (test_mempage((volatile u32 *)addr) != 0) {
            /* no RAM here: close the current area if one is open */
            if (memory_map[i].used)
                i++;
            continue;
        }

        /*
         * The page is valid RAM, but it may be an alias of a page that was
         * already scanned. A non-zero first word means an earlier page
         * wrote its marker (see below) through this address.
         */
        if (*(u32 *)addr != 0) { /* alias? */
            if (memory_map[i].used)
                i++;
            continue;
        }

        /*
         * Valid RAM and not an alias. Mark the page with its own address
         * so that any later alias of it fails the zero check above.
         */
        *(u32 *)addr = addr;

        if (memory_map[i].used == 0) {
            /* this page starts a new area */
            memory_map[i].start = addr;
            memory_map[i].size = PAGE_SIZE;
            memory_map[i].used = 1;
        } else {
            /* this page extends the current area */
            memory_map[i].size += PAGE_SIZE;
        }
    } /* end of for (...) */

After using the above algorithm to detect the system’s memory mapping situation, the Boot Loader can also print the detailed information of the memory mapping to the serial port.
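Because stage 2 cannot rely on printf() from a C library, printing the memory map usually means formatting hexadecimal numbers by hand. The following is a minimal sketch under that assumption; put_hex32, format_memory_map, and the "size @ start" line format are illustrative choices, and a real Boot Loader would send the characters to its own serial output routine rather than to a buffer.

```c
#include <stdint.h>

typedef uint32_t u32;

typedef struct memory_area_struct {
    u32 start; /* the base address of the memory region */
    u32 size;  /* the byte number of the memory region */
    int used;
} memory_area_t;

#define AREA_LINE_LEN 24u  /* "0x........ @ 0x........\n" */

/* Writes "0x" plus 8 hex digits at out; returns the position after them. */
static char *put_hex32(char *out, u32 v)
{
    static const char digits[] = "0123456789abcdef";

    *out++ = '0';
    *out++ = 'x';
    for (int shift = 28; shift >= 0; shift -= 4)
        *out++ = digits[(v >> shift) & 0xf];
    return out;
}

/* Formats one "size @ start" line per used area into buf as a
 * NUL-terminated string; returns the number of areas written. */
int format_memory_map(const memory_area_t *map, int n, char *buf, unsigned buflen)
{
    char *out = buf;
    int count = 0;

    if (buflen == 0)
        return 0;
    for (int i = 0; i < n; i++) {
        if (!map[i].used)
            continue;
        if ((unsigned)(out - buf) + AREA_LINE_LEN + 1u > buflen)
            break;  /* not enough room for another line plus the NUL */
        out = put_hex32(out, map[i].size);
        *out++ = ' ';
        *out++ = '@';
        *out++ = ' ';
        out = put_hex32(out, map[i].start);
        *out++ = '\n';
        count++;
    }
    *out = '\0';
    return count;
}
```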

Loading the Kernel Image and Root File System Image

(1) Planning Memory Occupancy Layout

This includes two aspects: (1) the memory range occupied by the kernel image; (2) the memory range occupied by the root file system. When planning the memory occupancy layout, the base address and the size of the image are the main considerations.

For the kernel image, it is generally copied to a memory range starting from (MEM_START + 0x8000) of about 1MB in size (embedded Linux kernels generally do not exceed 1MB). Why should we leave a 32KB memory space from MEM_START to MEM_START + 0x8000? This is because the Linux kernel needs to place some global data structures in this memory, such as boot parameters and kernel page tables.

For the root file system image, it is generally copied to the location starting from MEM_START + 0x00100000. If using Ramdisk as the root file system image, its uncompressed size is generally 1MB.
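The layout described above can be written down as a small planning routine that also checks for overlap. The offsets are the ones given in the text; the struct and function names are hypothetical. Note that with these offsets only 0x100000 - 0x8000 bytes (992KB) lie between the kernel base and the root file system image, so a kernel of a full 1MB would not fit.

```c
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_OFFSET   0x00008000u  /* leaves 32KB for parameters and page tables */
#define RAMDISK_OFFSET  0x00100000u  /* root file system image */

struct image_layout {
    uint32_t kernel_base;
    uint32_t ramdisk_base;
};

/* Fills in *out and returns true if a kernel of kernel_size bytes fits
 * below the root file system image; returns false on overlap. */
bool plan_images(uint32_t mem_start, uint32_t kernel_size, struct image_layout *out)
{
    if (kernel_size > RAMDISK_OFFSET - KERNEL_OFFSET)
        return false;

    out->kernel_base = mem_start + KERNEL_OFFSET;
    out->ramdisk_base = mem_start + RAMDISK_OFFSET;
    return true;
}
```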

(2) Copying from Flash

Since embedded CPUs like ARM typically address Flash and other solid-state storage devices in a unified memory address space, reading data from Flash is no different from reading data from RAM units. A simple loop can accomplish the task of copying images from Flash devices:

    /* src and dest are word-aligned u32 pointers; count is a signed byte
     * count, so the loop also terminates if it is not a multiple of 4 */
    while (count > 0) {
        *dest++ = *src++;
        count -= 4;
    }

Setting Kernel Boot Parameters

It should be noted that after copying the kernel image and root file system image into RAM, we can prepare to boot the Linux kernel. However, before invoking the kernel, one preparatory step should be taken: setting the boot parameters for the Linux kernel.

Linux kernels from version 2.4.x onwards expect boot parameters to be passed in the form of a tagged list. The boot parameter tagged list starts with the ATAG_CORE tag and ends with the ATAG_NONE tag. Each tag consists of a tag_header structure identifying the passed parameters and the subsequent parameter value data structure. The tag and tag_header data structures are defined in the Linux kernel source file include/asm/setup.h:

    /* The list ends with an ATAG_NONE node. */
    #define ATAG_NONE 0x00000000

    struct tag_header {
        u32 size; /* Note that size is in words */
        u32 tag;
    };

    /* ... */

    struct tag {
        struct tag_header hdr;
        union {
            struct tag_core core;
            struct tag_mem32 mem;
            struct tag_videotext videotext;
            struct tag_ramdisk ramdisk;
            struct tag_initrd initrd;
            struct tag_serialnr serialnr;
            struct tag_revision revision;
            struct tag_videolfb videolfb;
            struct tag_cmdline cmdline;

            /* Acorn specific */
            struct tag_acorn acorn;

            /* DC21285 specific */
            struct tag_memclk memclk;
        } u;
    };

In embedded Linux systems, common boot parameters that need to be set by the Boot Loader include: ATAG_CORE, ATAG_MEM, ATAG_CMDLINE, ATAG_RAMDISK, ATAG_INITRD, etc.

For example, the code to set ATAG_CORE is as follows:

    params = (struct tag *)BOOT_PARAMS;

    params->hdr.tag = ATAG_CORE;
    params->hdr.size = tag_size(tag_core);

    params->u.core.flags = 0;
    params->u.core.pagesize = 0;
    params->u.core.rootdev = 0;

    params = tag_next(params);

Here, BOOT_PARAMS indicates the starting base address of the kernel boot parameters in memory, and the pointer params is a pointer of type struct tag. The macro tag_next() takes the pointer pointing to the current tag as a parameter and calculates the starting address of the next tag immediately following the current tag. Note that the device ID of the root file system is set here.
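To make the role of tag_size() and tag_next() concrete, here is a self-contained sketch with the two macros written out. They are reproduced from memory of the kernel's setup.h and should be checked against the kernel source in use; the helper setup_core_tag() and the reduced struct tag (holding only the core member) are simplifications made for this example.

```c
#include <stdint.h>

typedef uint32_t u32;

#define ATAG_CORE 0x54410001

struct tag_header {
    u32 size; /* in 32-bit words, header included */
    u32 tag;
};

struct tag_core {
    u32 flags;
    u32 pagesize;
    u32 rootdev;
};

/* Reduced to the one member this example needs. */
struct tag {
    struct tag_header hdr;
    union {
        struct tag_core core;
    } u;
};

/* Size of header plus payload, converted from bytes to words. */
#define tag_size(type) ((sizeof(struct tag_header) + sizeof(struct type)) >> 2)

/* Advance by hdr.size words to reach the tag that follows. */
#define tag_next(t) ((struct tag *)((u32 *)(t) + (t)->hdr.size))

/* Writes an ATAG_CORE tag at params and returns where the next tag goes. */
struct tag *setup_core_tag(struct tag *params)
{
    params->hdr.tag = ATAG_CORE;
    params->hdr.size = tag_size(tag_core);  /* (8 + 12) >> 2 = 5 words */
    params->u.core.flags = 0;
    params->u.core.pagesize = 0;
    params->u.core.rootdev = 0;
    return tag_next(params);
}
```

Because hdr.size counts words rather than bytes, the next tag in this example starts 5 words (20 bytes) after the start of the ATAG_CORE tag.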

Below is example code that passes the detected memory map to the kernel:

    for (i = 0; i < NUM_MEM_AREAS; i++) {
        if (memory_map[i].used) {
            params->hdr.tag = ATAG_MEM;
            params->hdr.size = tag_size(tag_mem32);

            params->u.mem.start = memory_map[i].start;
            params->u.mem.size = memory_map[i].size;

            params = tag_next(params);
        }
    }

It can be seen that each valid memory segment in the memory_map[] array corresponds to an ATAG_MEM parameter tag.

During startup, the Linux kernel can receive information in the form of command line parameters, allowing us to provide hardware parameter information that the kernel cannot detect itself or override information detected by the kernel. For example, we can use a command line parameter string “console=ttyS0,115200n8” to inform the kernel to use ttyS0 as the console, with settings of “115200bps, no parity, 8 data bits”. Below is a segment of example code for setting the kernel command line parameter string:

    char *p;

    /* eat leading white space */
    for (p = commandline; *p == ' '; p++)
        ;

    /* skip non-existent command lines so the kernel will still
     * use its default command line.
     */
    if (*p == '\0')
        return;

    params->hdr.tag = ATAG_CMDLINE;
    params->hdr.size = (sizeof(struct tag_header) + strlen(p) + 1 + 4) >> 2;

    strcpy(params->u.cmdline.cmdline, p);

    params = tag_next(params);

Note that when setting the size in the tag_header in the above code, the byte count must include the string’s terminating character '\0' and must be rounded up to a multiple of 4 bytes, because the size member of the tag_header structure is expressed in words.
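A quick way to see what the size expression does is to evaluate it on the host. The sketch below wraps the same expression in a hypothetical helper, cmdline_tag_words(); only the tag_header definition is taken from the listing above.

```c
#include <stdint.h>
#include <string.h>

struct tag_header {
    uint32_t size; /* in 32-bit words */
    uint32_t tag;
};

/* Number of words occupied by an ATAG_CMDLINE tag holding cmdline:
 * header + string + '\0', plus 4 before the shift so that the
 * division by 4 never rounds below the required size. */
uint32_t cmdline_tag_words(const char *cmdline)
{
    return (uint32_t)((sizeof(struct tag_header) + strlen(cmdline) + 1 + 4) >> 2);
}
```

For “console=ttyS0,115200n8” (22 characters) this gives (8 + 22 + 1 + 4) >> 2 = 8 words, i.e., 32 bytes for the 31 bytes actually needed. Adding 4 rather than 3 means that a string whose length already fills a whole number of words gets one spare word, which is harmless.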

Below is example code for setting ATAG_INITRD2, which tells the kernel where in RAM it can find the initrd image (compressed format) and how large it is. ATAG_INITRD2 takes a physical start address, unlike the older ATAG_INITRD tag, which takes a virtual one:

    params->hdr.tag = ATAG_INITRD2;
    params->hdr.size = tag_size(tag_initrd);

    params->u.initrd.start = RAMDISK_RAM_BASE;
    params->u.initrd.size = INITRD_LEN;

    params = tag_next(params);

Below is example code for setting ATAG_RAMDISK, which tells the kernel how large the uncompressed Ramdisk is (in KB):

    params->hdr.tag = ATAG_RAMDISK;
    params->hdr.size = tag_size(tag_ramdisk);

    params->u.ramdisk.start = 0;
    params->u.ramdisk.size = RAMDISK_SIZE; /* Note that the unit is KB */
    params->u.ramdisk.flags = 1; /* automatically load ramdisk */

    params = tag_next(params);

Finally, set the ATAG_NONE tag to end the entire boot parameter list:

    static void setup_end_tag(void)
    {
        params->hdr.tag = ATAG_NONE;
        params->hdr.size = 0;
    }
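When the kernel fails to boot, a malformed tagged list is a common suspect, so it can help to walk the finished list before jumping to the kernel. The following is a hypothetical debugging helper, not part of any particular Boot Loader; it treats the list as raw 32-bit words, relying only on the layout described above (size in words first, then the tag value).

```c
#include <stddef.h>
#include <stdint.h>

#define ATAG_NONE 0x00000000u
#define ATAG_CORE 0x54410001u

/* Walks a tagged list of at most max_words words. Returns the number of
 * tags before ATAG_NONE, or -1 if the list does not start with ATAG_CORE,
 * contains a tag shorter than its own header, or runs past max_words. */
int count_tags(const uint32_t *list, size_t max_words)
{
    size_t pos = 0;
    int count = 0;

    if (max_words < 2 || list[1] != ATAG_CORE)
        return -1;

    while (pos + 2 <= max_words) {
        uint32_t size = list[pos];      /* hdr.size, in words */
        uint32_t tag = list[pos + 1];   /* hdr.tag */

        if (tag == ATAG_NONE)
            return count;
        if (size < 2 || size > max_words - pos)
            return -1;
        pos += size;
        count++;
    }
    return -1;  /* no ATAG_NONE terminator found */
}
```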

Invoking the Kernel

The Boot Loader invokes the Linux kernel by directly jumping to the first instruction of the kernel, i.e., directly jumping to the address MEM_START + 0x8000. When jumping, the following conditions must be met:

1. Setting CPU registers:

· R0 = 0;

· R1 = machine type ID; for information on Machine Type Number, refer to linux/arch/arm/tools/mach-types.

· R2 = starting base address of the boot parameter tagged list in RAM;

2. CPU mode:

· Interrupts (IRQs and FIQs) must be disabled;

· The CPU must be in SVC mode;

3. Cache and MMU settings:

· MMU must be turned off;

· Instruction cache can be either on or off;

· Data cache must be turned off;

If using C language, the kernel can be invoked as shown in the following example code:

    void (*theKernel)(int zero, int arch, u32 params_addr) =
        (void (*)(int, int, u32))KERNEL_RAM_BASE;

    /* ... */

    theKernel(0, ARCH_NUMBER, (u32)kernel_params_start);

Note that theKernel() function call should never return. If this call returns, it indicates an error.
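The CPU mode conditions listed above map directly onto bits of the ARM CPSR: the mode field is bits 4:0 (0x13 for SVC), F is bit 6, and I is bit 7. The sketch below checks a CPSR value that has already been read (on the target, with an MRS instruction); the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1fu
#define CPSR_MODE_SVC  0x13u
#define CPSR_F_BIT     (1u << 6)  /* set = FIQs disabled */
#define CPSR_I_BIT     (1u << 7)  /* set = IRQs disabled */

/* True if the CPU is in SVC mode with both IRQs and FIQs disabled. */
bool cpsr_ready_for_kernel(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == CPSR_MODE_SVC
        && (cpsr & CPSR_I_BIT) != 0
        && (cpsr & CPSR_F_BIT) != 0;
}
```

For example, 0xd3 (I and F set, SVC mode) satisfies the conditions, while 0x13 (SVC mode with interrupts enabled) does not.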

About Serial Terminal

In the design and implementation of the Boot Loader program, nothing is more exciting than correctly receiving print information from the serial terminal. Additionally, printing information to the serial terminal is also a very important and effective debugging tool. However, we often encounter issues where the serial terminal displays garbled characters or does not display anything at all. The main reasons for this problem are:

(1) Incorrect initialization settings for the serial port in the Boot Loader.

(2) Incorrect settings for the terminal emulation program running on the host side, including baud rate, parity, data bits, and stop bits.

Additionally, there are times when we can correctly output information to the serial terminal during the Boot Loader’s execution, but after the Boot Loader starts the kernel, we cannot see the kernel’s startup output information. The reasons for this issue can be considered from the following aspects:

(1) First, please confirm that your kernel has been configured to support the serial terminal during compilation and has the correct serial driver configured.

(2) Your Boot Loader’s initialization settings for the serial port may be inconsistent with the kernel’s initialization settings for the serial port. Furthermore, for CPUs like s3c44b0x, the setting of the CPU clock frequency will also affect the serial port, so if the Boot Loader and kernel have inconsistent settings for the CPU clock frequency, it will also prevent the serial terminal from displaying information correctly.

(3) Finally, make sure that the address to which the Boot Loader loads the kernel matches the run base address used when the kernel image was compiled, especially for uClinux. For example, if your kernel image was compiled with a base address of 0xc0008000 but your Boot Loader loads it to 0xc0010000, the kernel image will certainly not execute correctly.
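The clock dependence mentioned in point (2) can be illustrated with a generic divisor calculation. This sketch assumes a UART with 16x oversampling whose baud rate is clk / (16 * (divisor + 1)); the exact formula and register names depend on the specific UART, so check the CPU's data sheet before relying on it. The function names are hypothetical.

```c
#include <stdint.h>

/* Divisor for the requested baud rate, rounded to the nearest integer.
 * Assumes baud = clk_hz / (16 * (divisor + 1)). */
uint32_t uart_divisor(uint32_t clk_hz, uint32_t baud)
{
    return (clk_hz + 8u * baud) / (16u * baud) - 1u;
}

/* Baud rate actually produced by a divisor at a given clock. */
uint32_t uart_actual_baud(uint32_t clk_hz, uint32_t divisor)
{
    return clk_hz / (16u * (divisor + 1u));
}
```

With a 60MHz clock, a request for 115200 baud gives a divisor of 32 and an actual rate of about 113636 baud, which is close enough to work. If the clock is then changed to 40MHz while the divisor stays at 32, the line runs at about 75757 baud and the terminal shows garbage.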

Conclusion

The design and implementation of the Boot Loader is a very complex process. If you cannot receive the exciting “uncompressing linux……………… done, booting the kernel……” kernel startup information from the serial port, I doubt anyone can say, “Hey, my Boot Loader has successfully started!”
