Complete Guide to Microcontroller Bootloader

For a complex microcontroller project, having a BootLoader (hereinafter referred to as BL) is very important. It makes maintaining and upgrading your application code much easier.

This article explains why a Bootloader is worth designing and how to design one, so that you come away with a clear picture of both what it does and why it exists.

Through a detailed explanation of BL, I hope everyone can appreciate its importance.

1. Evolution of Programming Methods

Old Programming Methods

Microcontrollers came into wide use in the 1980s, and the 51 series spread through industrial control, home appliances, and many other industries. In those early days, programming a microcontroller (writing the executable program into its internal ROM) was neither easy nor cheap, because it required dedicated programming equipment. The semiconductor technology of the time also meant that writing to ROM usually needed a high programming voltage. This situation persisted until around 2000, as shown in the figure.

ISP and ICP Programming Methods

As low-voltage electrically erasable memory (EEPROM and Flash) matured, microcontrollers began to integrate storage that could be read and written directly with ordinary logic levels.

Its greatest advantage is that the chip can be programmed in place, inside the system or circuit, without removing it from the board and putting it in a programmer. This method is known as ISP (In-System Programming) or ICP (In-Circuit Programming), as shown in the figure.

Someone once asked: “I have heard of both ISP and ICP, which allow programming directly on the circuit board without removing the chip. What is the difference between ISP and ICP?”

Broadly speaking, there is no difference: the two terms are often used interchangeably, and little harm comes of it. If we must draw a distinction, it can be understood as follows:

ISP requires a resident program in the microcontroller that communicates with the host computer, receives the firmware data, and writes it into its own ROM. A microcontroller programmed via ISP must therefore be running, which means it needs at least a minimal working system (clock and reset).

ICP treats the MCU as a writable storage device accessed from outside: no pre-installed program is needed, and the microcontroller does not have to be running.

The classic chip supporting this kind of in-circuit programming is the AT89S51. Many people happily gave up their programmers when they switched from the AT89C51 to the S51. The parallel-port download cable became very popular, as shown in figure 3.3, and various small ISP utilities were available online, which greatly lowered the barrier to entry for microcontrollers. A computer, a minimal S51 system board, and a parallel-port ISP cable were all you needed!

More Convenient ISP Programming Methods

Serial ISP

Later, computers with parallel ports became increasingly rare. Around 2005, STC microcontrollers began to appear in large numbers. Functionally they were not much different from the S51, and were even inferior to some high-end 51 microcontrollers of the same period. However, one advantage won them wide popularity and further lowered the barrier to learning microcontrollers.

That advantage was serial-port ISP, ISP in the truest sense, as shown in the figure.

Later, 9-pin serial ports also became rare and only USB remained. This led to a programming and debugging tool that is still in high demand: the USB-to-TTL serial adapter. With it, the RS-232 level-converter chip is no longer needed, and the chip can be programmed directly over USB. This approach has benefited countless microcontroller learners and engineers.

Over the years, I have put a lot of thought into the interaction between serial ports and microcontrollers, which is also one reason I enjoy developing Bootloaders. I hope “with USB serial in hand, everything is possible!”

STC was not the first to use serial ISP programming, but it was the most successful and widely recognized. Many microcontrollers that appeared at the same time, including the STM32 series, which is still widely used today, also support serial ISP, making it a standard and prevalent programming method.

Various USB ISP

Serial ISP is convenient, but its download speed is a significant drawback. When the firmware is large (some embedded projects easily reach hundreds of KB or even several MB), serial ISP becomes too slow. For this reason, some microcontrollers come with dedicated USB ISP downloaders. Below are some mainstream microcontrollers and their USB ISP downloaders.

1) AVR

AVR microcontrollers were once very popular, but after the 2016 chip shortage and competition from STM32 they declined, and far fewer people use them now. Many USB ISP downloaders are compatible with them; some are official, but most came out of enthusiasts’ open-source projects, as shown in the figure.

2) C8051F

3) MSP430

We can see that a mainstream microcontroller with a well-developed ecosystem must have efficient and convenient programming tools. This shows how important a good programming method is for microcontroller development.

Whether using serial ISP or various dedicated ISP downloaders, there are some common drawbacks.

1. They rely on dedicated host software or downloader hardware, so there is no uniform approach;

2. Downloaders are still relatively expensive, especially official ones, which is why some microcontrollers, such as AVR, have spawned many third-party downloaders;

3. Downloading usually requires extra steps, such as power-cycling an STC chip or setting the BOOT pin level on an STM32. These steps make programming more complex. This is especially painful when reprogramming a finished product (for example, an in-field firmware upgrade): one must open the casing or bring extra signals out. All of this is inefficient and unfriendly.

If there were a programming method applicable to any microcontroller:

1. Unified communication method (for example, all using serial);

2. Providing a friendly operating interface (for example, command line);

3. Efficient and fast, with no additional operations, preferably one-click automated programming;

4. Additionally, adding some embedded firmware management functions (such as firmware version management).

This would certainly make our work much more efficient. The Bootloader can achieve all of the above!

About Bootloader

Basic Form of Bootloader

Let’s look at the diagram.

We can see that BL is a program stored in ROM that primarily implements four functions:

1. Obtain the firmware data to be programmed through some means;

2. Write the firmware data into the APP area of the ROM;

3. Jump to the APP area to run the user program that has been programmed;

4. Provide a necessary and friendly human-computer interaction interface during this process.
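The four duties above can be sketched as a small state machine. This is a minimal host-side model, not a real BL: the APP area is a hypothetical 64-byte RAM array standing in for ROM, and the names bl_ctx_t, bl_step() and APP_SIZE are made up for illustration.

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define APP_SIZE 64u   /* hypothetical APP area size for this host model */

typedef enum { BL_OBTAIN, BL_WRITE, BL_JUMP, BL_ERROR } bl_state_t;

typedef struct {
    uint8_t flash_app[APP_SIZE];   /* stands in for the APP area of ROM */
    const uint8_t *image;          /* firmware obtained in step 1 */
    size_t image_len;
    bl_state_t state;
} bl_ctx_t;

/* Duty 4: human-computer interaction, reduced here to a status line */
static void bl_report(const char *msg) { printf("[BL] %s\n", msg); }

/* Advance the BL by one step and return the new state */
bl_state_t bl_step(bl_ctx_t *bl)
{
    switch (bl->state) {
    case BL_OBTAIN:   /* Duty 1: obtain firmware (serial, SD card, ...) */
        if (bl->image == NULL || bl->image_len == 0 || bl->image_len > APP_SIZE) {
            bl_report("no valid firmware image");
            bl->state = BL_ERROR;
        } else {
            bl_report("firmware image received");
            bl->state = BL_WRITE;
        }
        break;
    case BL_WRITE:    /* Duty 2: "erase" the APP area, then write the image */
        memset(bl->flash_app, 0xFF, APP_SIZE);
        memcpy(bl->flash_app, bl->image, bl->image_len);
        bl_report("firmware written to APP area");
        bl->state = BL_JUMP;
        break;
    case BL_JUMP:     /* Duty 3: on a real MCU, jump to the APP here */
    case BL_ERROR:
    default:
        break;
    }
    return bl->state;
}
```

Keeping each duty in its own state makes it easy to swap the "obtain" step (serial, SD card, Bluetooth) without touching the write and jump logic.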

This explanation may not be clear, so let’s illustrate it with examples.

Two Design Examples of Bootloader

The following two examples illustrate the practical application forms of BL, without delving into specific implementation details, aiming to help everyone understand how BL actually operates.

Serial BL with Shell Command Line

The basic operational logic is as follows:

1. Input the command “program” through a serial terminal like HyperTerminal, SecureCRT, or Xshell;

2. The BL receives the command and starts waiting to receive firmware file data;

3. The serial terminal uses a file data transfer protocol (for example, X/Y/Zmodem protocol) to send the firmware data to the BL;

4. The BL writes the firmware data into the APP area of the ROM;

5. The BL jumps to the program in the APP area and runs it.
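Steps 1 and 2 imply that the BL collects serial bytes into a command line and recognizes “program”. A minimal sketch, assuming each command ends with CR or LF; the extra “run” and “help” commands and all function names are hypothetical.

```c
#include <string.h>
#include <stddef.h>

typedef enum { CMD_UNKNOWN, CMD_PROGRAM, CMD_RUN, CMD_HELP } bl_cmd_t;

#define SHELL_LINE_MAX 32u

static char line_buf[SHELL_LINE_MAX];
static size_t line_len = 0;

/* Map a complete command line to a command code */
static bl_cmd_t parse_command(const char *line)
{
    static const struct { const char *name; bl_cmd_t cmd; } table[] = {
        { "program", CMD_PROGRAM },   /* start receiving firmware (from the text) */
        { "run",     CMD_RUN },       /* hypothetical: jump to the APP */
        { "help",    CMD_HELP },      /* hypothetical: list commands */
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(line, table[i].name) == 0) return table[i].cmd;
    return CMD_UNKNOWN;
}

/* Feed one byte received from the UART.
   Returns a bl_cmd_t when a line is complete, or -1 while still collecting. */
int shell_feed(char c)
{
    if (c == '\r' || c == '\n') {
        if (line_len == 0) return -1;   /* empty line, or the LF of a CR/LF pair */
        line_buf[line_len] = '\0';
        line_len = 0;
        return (int)parse_command(line_buf);
    }
    if (line_len < SHELL_LINE_MAX - 1) line_buf[line_len++] = c;   /* drop overflow */
    return -1;
}
```

Feeding one byte at a time matches how a UART receive interrupt or polling loop delivers data; when shell_feed() returns CMD_PROGRAM, the BL would switch into its file-transfer (for example Xmodem) receive mode.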

More specific illustrations are shown in the figure.

This operational logic may seem simple, but its actual implementation is not easy; we will explore the specific implementation later.

Bootloader that Programs by Inserting SD Card

The basic operational logic is as follows:

1. Copy the firmware to be programmed onto the SD card;

2. Insert the SD card into the slot;

3. The BL detects the SD card insertion and searches for BIN files on the card;

4. The BL reads the BIN file data and writes it into the APP area of the ROM;

5. The BL jumps to the program in the APP area and runs it.
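For step 3, the BL has to pick a firmware file from the card’s directory listing. A minimal sketch, assuming the file system driver (for example a FAT library) already provides names and sizes; dir_entry_t and find_firmware() are hypothetical names, and rejecting files larger than the APP area is an added safety check.

```c
#include <ctype.h>
#include <string.h>
#include <stddef.h>

typedef struct { const char *name; unsigned long size; } dir_entry_t;

/* Case-insensitive check for a ".bin" extension (FAT names are often upper case) */
static int has_bin_ext(const char *name)
{
    size_t n = strlen(name);
    if (n < 5) return 0;   /* need at least one character before ".bin" */
    const char *ext = name + n - 4;
    return ext[0] == '.' &&
           tolower((unsigned char)ext[1]) == 'b' &&
           tolower((unsigned char)ext[2]) == 'i' &&
           tolower((unsigned char)ext[3]) == 'n';
}

/* Return the index of the first BIN file that fits in the APP area, or -1 */
int find_firmware(const dir_entry_t *entries, size_t count, unsigned long app_area_size)
{
    for (size_t i = 0; i < count; i++)
        if (has_bin_ext(entries[i].name) &&
            entries[i].size > 0 && entries[i].size <= app_area_size)
            return (int)i;
    return -1;
}
```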

As shown in the figure.

These two design examples should make clear what a BL is. Don’t you agree that a BL is a more universal, flexible, user-friendly, and powerful way to program and manage firmware than an ISP programmer?

Some readers may know U-Boot on Linux, a powerful BL with strong flashing capabilities (programming operating system images) and a complete, flexible shell interface, as shown in the figure. In fact, a PC’s BIOS is also a BL in the broad sense.

So how do we implement a BL? First, a few basic requirements must be met.

Key Points for Implementing BL

First, it’s important to note that not every microcontroller can implement a BL; several key points must be met.

Chip Architecture Support

Let’s look at the diagram.

A microcontroller program begins with its interrupt vector table. On a Cortex-M part such as the STM32, the first two entries of that table are the initial stack top address and the reset entry point, which are what get the program running. So when jumping from the BL to the APP, the APP must have its own interrupt vector table, and the microcontroller architecture must allow the vector table to be relocated.
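To make those first two entries concrete, here is a host-side sketch that reads them from a firmware image and sanity-checks them before a jump. The memory map (20KB SRAM at 0x20000000, APP at 0x08004000) is an assumption for illustration, and mem_map_t and vector_table_valid() are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Cortex-M images are little-endian */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical memory map; each *_end is one past the last valid byte */
typedef struct { uint32_t ram_start, ram_end, app_start, app_end; } mem_map_t;

/* Read entry 0 (initial SP) and entry 1 (reset handler) and sanity-check them */
bool vector_table_valid(const uint8_t *img, size_t len, const mem_map_t *m,
                        uint32_t *sp, uint32_t *reset)
{
    if (len < 8) return false;
    *sp = rd32le(img);
    *reset = rd32le(img + 4);
    /* The stack grows down, so the initial SP may equal the end of RAM */
    bool sp_ok = *sp > m->ram_start && *sp <= m->ram_end && (*sp & 3u) == 0;
    /* Cortex-M executes Thumb code only: bit 0 of the handler address must be 1 */
    uint32_t entry = *reset & ~1u;
    bool reset_ok = (*reset & 1u) != 0 && entry >= m->app_start && entry < m->app_end;
    return sp_ok && reset_ok;
}
```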

A traditional 51 microcontroller can only place its interrupt vector table at the very start of ROM, with no offset, so a traditional 51 microcontroller cannot support a BL.

Someone might ask, “Isn’t this contradictory? You previously mentioned that the STC 51 microcontroller supports serial ISP, so it should have an internal ISP program, which I understand to be similar to BL.”

That’s right: the built-in ISP program is a form of BL. STC can provide this because STC (Hongjing Technology) modified the hardware architecture; see the figure.

We can see that the STC51 microcontroller has an additional ROM specifically for storing the BL, called BOOTROM.

An online user named shaoziyang wrote a BL for AVR microcontrollers together with a host program called AVRUBD, as shown. It implements serial programming for AVR (AVRUBD is so useful that it even lets us program wirelessly), freeing many people from dependence on USB ISP and other downloaders (which, convenient as they are, still cost money).

AVR’s hardware architecture is similar to that of STC51, as shown in the figure.

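By configuring the AVR’s fuse bits (on ATmega parts, BOOTRST selects whether reset jumps to the boot section, and BOOTSZ sets the boot section’s size and therefore its start address), we can control the reset entry address and the size and starting address of the BOOT area, as shown in the figure.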
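As a worked example of how the boot area size fixes its start address, here is a sketch for the ATmega328P (32KB flash), assuming the BOOTSZ1:0 encoding from its datasheet: 0b11 selects 256 words, each step down doubles the size up to 2048 words, and the boot section always ends at the top of flash. Function names are hypothetical, and other AVR parts use different sizes.

```c
#include <stdint.h>

#define ATMEGA328P_FLASH_BYTES 32768u

/* Boot section size in 16-bit words for BOOTSZ1:0 (fuse bits read as 1 = unprogrammed) */
uint32_t boot_size_words(uint8_t bootsz)
{
    return 256u << (3u - (bootsz & 3u));
}

/* Byte address where the boot section starts: it ends at the top of flash */
uint32_t boot_start_byte(uint8_t bootsz)
{
    return ATMEGA328P_FLASH_BYTES - 2u * boot_size_words(bootsz);
}
```

AVR datasheets list these start addresses in words (0x3F00 for the 256-word setting), which is why the byte address is twice the word address (0x7E00).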

At this point, someone might say, “Is there a microcontroller that can run programs stored in any position in ROM, meaning the interrupt vector table can be relocated?”

Of course; many such microcontrollers exist, the most typical being the STM32. Its programs can run from other flash addresses because the Cortex-M core has a configurable vector table offset (the VTOR register, which ST’s standard library sets through NVIC_SetVectorTable() or in SystemInit()); we will discuss this in detail later.

ROM Must Support IAP

This also requires hardware support. After the BL obtains the firmware data, it must write it into the APP area of ROM, so the microcontroller must support IAP (In-Application Programming): erasing and programming its own ROM while a program is running.

On reflection, microcontrollers that support serial ISP generally also support IAP. STC promotes this as a major feature, letting the internal ROM serve as EEPROM for storing parameters that must survive power loss.

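For STM32, flash erase and program routines come with the accompanying firmware library (FLASH_Unlock(), FLASH_ErasePage() and FLASH_ProgramHalfWord() in the standard library; HAL_FLASH_Unlock(), HAL_FLASHEx_Erase() and HAL_FLASH_Program() in the HAL library), which you can refer to or use directly.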
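The erase-then-program rule that IAP code must follow can be tried on a PC with a simulated flash. This is a sketch under stated assumptions: 1KB pages and half-word programming as on the medium-density STM32F103, a page-aligned start address, and hypothetical names (sim_flash, iap_write); on real hardware the erase and program calls would go through the STM32 flash library instead.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 1024u   /* assumed page size (medium-density STM32F103) */
#define SIM_PAGES     8u

static uint8_t sim_flash[SIM_PAGE_SIZE * SIM_PAGES];

static void sim_erase_page(uint32_t page)
{
    memset(&sim_flash[page * SIM_PAGE_SIZE], 0xFF, SIM_PAGE_SIZE);   /* erased = 0xFF */
}

/* Simplified model: refuse to program a half-word that is not erased (0xFFFF) */
static bool sim_program_halfword(uint32_t off, uint16_t v)
{
    uint16_t cur = (uint16_t)(sim_flash[off] | (sim_flash[off + 1] << 8));
    if (cur != 0xFFFFu) return false;
    sim_flash[off] = (uint8_t)v;
    sim_flash[off + 1] = (uint8_t)(v >> 8);
    return true;
}

/* Erase every page touched by [off, off + len), program half-words, then verify */
bool iap_write(uint32_t off, const uint8_t *data, size_t len)
{
    if (off % SIM_PAGE_SIZE != 0 || off + len > sizeof sim_flash) return false;
    for (uint32_t p = off / SIM_PAGE_SIZE; p * SIM_PAGE_SIZE < off + len; p++)
        sim_erase_page(p);
    for (size_t i = 0; i < len; i += 2) {
        uint16_t hw = data[i];
        if (i + 1 < len) hw |= (uint16_t)(data[i + 1] << 8);
        else hw |= 0xFF00u;   /* odd length: keep the last byte erased */
        if (!sim_program_halfword(off + (uint32_t)i, hw)) return false;
    }
    return memcmp(&sim_flash[off], data, len) == 0;   /* read-back verification */
}
```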

APP Program Matching Modifications

To ensure that the BL can successfully initiate the APP program, the APP program needs to be modified accordingly during development. The most important aspect is the starting address of the APP program (i.e., the starting address of the interrupt vector table) and the corresponding configuration of the interrupt controller.

For 51 and AVR microcontrollers, the APP program needs no modification: their BL lives in a separate boot area, so the APP is still built to start at address 0. Here we detail how to modify an STM32 APP program.

We will still use examples; please see the figure.

Assume the STM32 has 128KB of ROM in total, the BL occupies 16KB, and the APP immediately follows the BL. The APP area then starts at 0x08000000 + 0x4000 = 0x08004000, so the offset of the APP’s interrupt vector table is 0x4000.

With MDK as the development environment, change the on-chip ROM setting (IROM1 start 0x8004000, size 0x1C000, that is, 128KB minus 16KB) as shown in the figure.

With gcc, modify the FLASH region in the link.ld linker script (ORIGIN = 0x08004000, LENGTH = 112K), as shown in figure 3.18.

Next, configure the NVIC’s vector table offset. In ST’s standard library, SystemInit() in system_stm32f10x.c applies it with SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET;, so changing the macro is enough:

#define VECT_TAB_OFFSET  0x4000
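The address arithmetic above, and the alignment rule the new offset must satisfy, can be checked with a small host-side helper. The rule encoded here is the Cortex-M3 one (the vector table must be aligned to its size rounded up to a power of two, with a minimum of 128 bytes); the helper names and the interrupt count passed in are assumptions, so check your device’s reference manual.

```c
#include <stdint.h>
#include <stdbool.h>

#define FLASH_BASE_ADDR 0x08000000u   /* STM32 main flash base */

/* APP base address when the APP is placed right after the BL */
uint32_t app_base(uint32_t bl_size_bytes)
{
    return FLASH_BASE_ADDR + bl_size_bytes;
}

/* Cortex-M3 rule: offset aligned to the table size rounded up to a power of two (min 128 bytes) */
bool vtor_offset_ok(uint32_t offset, unsigned num_irqs)
{
    uint32_t table_bytes = (16u + num_irqs) * 4u;   /* 16 system exceptions + device IRQs */
    uint32_t align = 128u;
    while (align < table_bytes) align <<= 1;
    return (offset & (align - 1u)) == 0;
}
```

With 60 peripheral interrupts, for example, the table is 304 bytes, so the offset must be a multiple of 512; 0x4000 qualifies.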

OK, with the program modified, we program it into ROM starting at 0x08004000 and let the BL jump to that address; our program will then run.

Someone might ask, “How do we write the jump code in the BL?” Don’t worry; this is the next key point we will discuss.

Jump Code in BL

The jump code is a key point in BL, directly related to whether the APP program can run normally, as shown in the figure.

Here is the jump code (the load_app() function) for STM32.

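#include "stm32f10x.h"   /* ST device header; brings in CMSIS core functions such as __set_MSP() */

typedef void (*iapfun)(void);

iapfun jump2app;   /* global, so its value is not read back from the old stack after MSP changes */

void load_app(uint32_t appxaddr)
{
  uint32_t stack_top = *(volatile uint32_t *)appxaddr;   /* word 0 of the APP: initial stack top */

  /* Rough validity check: the stack top must lie in SRAM (0x20000000-0x2001FFFF) */
  if ((stack_top & 0x2FFE0000) == 0x20000000)
  {
    /* Word 1 of the APP area is the program entry (reset handler) address */
    jump2app = (iapfun)*(volatile uint32_t *)(appxaddr + 4);

    SysTick->CTRL = 0;     /* stop SysTick so its interrupt cannot fire during the handover */

    /* Load the APP's stack top into MSP. The CMSIS intrinsic replaces the hand-written
       MSR_MSP(), whose inline assembly relied on the argument still being in r0 */
    __set_MSP(stack_top);

    jump2app();            /* jump to the APP; does not return */
  }
}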

This code is for everyone to study; explaining it further would be redundant.

At this point, we have covered the key points related to BL, and everyone should have the ability to create a simple BL. Based on STM32, here is a small experiment you might be interested in trying out, as shown in the figure.

Program the BL at 0x08000000 with a J-Link, then program the APP starting at 0x08002000 (in this experiment the BL is 8KB, so the APP must be linked at 0x08002000 and VECT_TAB_OFFSET set to 0x2000). After a reset, if the serial port prints “hello world” or the indicator LED lights up, our BL works.

Innovative Uses of Bootloader

The content I have presented above covers the most basic aspects of BL that are essential for us to understand. The true highlights of BL lie in the diverse methods of obtaining firmware data.

Implementation and Extension of BL (Firmware Transfer via Serial)

Earlier, I mentioned two applications of BL: one is transferring firmware files via serial, and the other is copying firmware files from an SD card. These are two forms of BL frequently used in practical engineering.

Here, I will focus on the implementation details of the former example, as it is very typical, as shown in the figure.

This flowchart raises three questions:

How is the serial communication protocol implemented?
Why is the firmware data received from the host computer not written directly to the APP area but temporarily stored and verified first?
How is the firmware data verified?

For the first question, the details of implementing the serial communication protocol and file transfer are somewhat complicated and will be explained in a future article.

For the second question: firmware received over a serial link may contain errors, and if corrupted firmware is written directly into the APP area, it will not run. So we first store each incoming frame temporarily and, once the whole transfer is finished, verify the complete image to make sure it is correct.

Regarding the third question, we need to focus on this.

How can we determine if a file has errors during transmission from the sender to the receiver?

The usual practice is to add a checksum to the file; the receiver calculates the checksum using the same method and compares it with the checksum in the file. If they match, the transmission is deemed correct, as shown in the figure.
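The compute-and-compare idea fits in a few lines. This sketch assumes the simplest scheme, a 16-bit sum of all bytes stored little-endian in the last two bytes of the file; the file format and the function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* 16-bit additive checksum: sum of all bytes, modulo 65536 */
uint16_t sum16(const uint8_t *data, size_t len)
{
    uint16_t s = 0;
    for (size_t i = 0; i < len; i++) s = (uint16_t)(s + data[i]);
    return s;
}

/* Receiver side: recompute over the payload and compare with the stored value */
bool file_checksum_ok(const uint8_t *file, size_t len)
{
    if (len < 2) return false;
    uint16_t stored = (uint16_t)(file[len - 2] | (file[len - 1] << 8));
    return sum16(file, len - 2) == stored;
}
```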

The above image illustrates the padding of the firmware file and the addition of the checksum.

Why pad the file? The burnable files (such as BIN) produced by cross-compilation are usually not a multiple of the transfer block size (128, 256, 512, or 1024 bytes). As a result, the last frame of the transfer comes up short, leaving a “data tail”.

Padding the file to a whole number of blocks is the most direct fix for the data tail. This is done on the host side, usually with a small utility that also appends a checksum to the end of the firmware file. The checksum can be a simple sum or a CRC, typically 16-bit or 32-bit. The figure shows such a utility padding a firmware file and adding the checksum.
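A host-side padding utility of this kind might look like the sketch below. Assumptions: padding with 0xFF (the value of erased flash), CRC-32 in the common IEEE/zlib variant stored little-endian after the padded data, and a block size greater than zero; pad_and_sign() and image_crc_ok() are hypothetical names. The same image_crc_ok() check is what the BL would run on the stored copy before writing the APP area.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* CRC-32 (IEEE, reflected polynomial 0xEDB88320), bitwise to keep the sketch short */
uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Pad to a multiple of `block` with 0xFF, then append the CRC-32. Caller frees. */
uint8_t *pad_and_sign(const uint8_t *bin, size_t len, size_t block, size_t *out_len)
{
    size_t padded = (len + block - 1) / block * block;
    uint8_t *buf = malloc(padded + 4);
    if (buf == NULL) return NULL;
    memcpy(buf, bin, len);
    memset(buf + len, 0xFF, padded - len);
    uint32_t crc = crc32_ieee(buf, padded);
    for (int i = 0; i < 4; i++) buf[padded + i] = (uint8_t)(crc >> (8 * i));
    *out_len = padded + 4;
    return buf;
}

/* Verify an image produced by pad_and_sign() */
bool image_crc_ok(const uint8_t *img, size_t len)
{
    if (len < 4) return false;
    uint32_t stored = (uint32_t)img[len - 4] | ((uint32_t)img[len - 3] << 8) |
                      ((uint32_t)img[len - 2] << 16) | ((uint32_t)img[len - 1] << 24);
    return crc32_ieee(img, len - 4) == stored;
}
```

A 130-byte firmware padded to 128-byte blocks becomes 256 bytes plus the 4-byte CRC, 260 bytes in total.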

Someone might ask: “Doesn’t temporarily storing the entire firmware require additional storage space, such as external ROM (Flash or EEPROM)?”

Yes. To save cost, we can also program directly into the APP area without temporary storage. This is riskier, but usually not a big problem (the serial ISP of STC and STM32 in fact programs the data as it arrives, without temporary storage).

During the transfer, the protocol itself helps guarantee correctness: each frame is checked, a failed frame is retransmitted, and repeated failures abort the transfer. So if the transfer completes, the data is very likely correct.
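Per-frame checking with retransmission is easiest to see in classic XMODEM, one of the protocols mentioned earlier: each 132-byte frame is SOH, block number, its complement, 128 data bytes, and an 8-bit arithmetic checksum, and the receiver answers ACK or NAK. The sketch below is simplified (no duplicate-frame, EOT, or CRC-mode handling), and the retry limit and function names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define XM_SOH 0x01
#define XM_ACK 0x06
#define XM_NAK 0x15
#define XM_DATA_LEN  128u
#define XM_FRAME_LEN (3u + XM_DATA_LEN + 1u)   /* SOH, blk, ~blk, data, checksum */

/* Check one frame; ACK accepts it, NAK asks the sender to retransmit */
uint8_t xmodem_check_frame(const uint8_t *f, size_t len, uint8_t expected_blk)
{
    if (len != XM_FRAME_LEN || f[0] != XM_SOH) return XM_NAK;
    if ((uint8_t)(f[1] + f[2]) != 0xFF) return XM_NAK;   /* blk and 255 - blk must agree */
    if (f[1] != expected_blk) return XM_NAK;
    uint8_t sum = 0;
    for (size_t i = 0; i < XM_DATA_LEN; i++) sum = (uint8_t)(sum + f[3 + i]);
    return sum == f[3 + XM_DATA_LEN] ? XM_ACK : XM_NAK;
}

/* Count consecutive NAKs; returns 1 when the transfer should be aborted */
typedef struct { unsigned naks; unsigned max_retries; } xm_link_t;

int xm_should_abort(xm_link_t *link, uint8_t reply)
{
    if (reply == XM_ACK) { link->naks = 0; return 0; }
    link->naks++;
    return link->naks >= link->max_retries;
}
```

A limit of around 10 retries is a common choice; only frames answered with ACK would be copied into temporary storage or the APP area.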

Still, an overall verification of the firmware is advisable, and if cost allows, it is worth choosing a part with a little more ROM. Temporary storage has another benefit: if the firmware in the APP area is damaged (for example, lost or accidentally erased during IAP), it can be restored from the stored copy (a complete BL includes such a recovery function).

In fact, we do not necessarily need to expand the ROM. If the firmware is relatively small, the microcontroller’s on-chip ROM can be divided into three parts, used for the BL, the APP firmware, and the temporarily stored firmware. For example, the STM32F103RBT6 has 128KB of ROM in total, which can be divided into 16K/56K/56K.
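The 16K/56K/56K split translates into concrete addresses as below. This is a sketch assuming the STM32F103RBT6’s flash base of 0x08000000 and 1KB erase pages; region_t and layout_ok() are hypothetical names. Each region should start and end on a page boundary so that one region can be erased without touching its neighbours.

```c
#include <stdint.h>

#define KB(x) ((uint32_t)(x) * 1024u)

typedef struct { const char *name; uint32_t start; uint32_t size; } region_t;

/* Hypothetical layout for a 128KB STM32F103RBT6: BL 16K, APP 56K, backup 56K */
static const region_t layout[] = {
    { "BL",     0x08000000u,                   KB(16) },
    { "APP",    0x08000000u + KB(16),          KB(56) },
    { "BACKUP", 0x08000000u + KB(16) + KB(56), KB(56) },
};

/* Regions must be contiguous, page aligned, and fit inside the flash */
int layout_ok(const region_t *r, unsigned n, uint32_t flash_base,
              uint32_t flash_size, uint32_t page)
{
    uint32_t next = flash_base;
    for (unsigned i = 0; i < n; i++) {
        if (r[i].start != next || r[i].start % page != 0 || r[i].size % page != 0)
            return 0;
        next = r[i].start + r[i].size;
    }
    return next <= flash_base + flash_size;
}
```

The APP region starts at 0x08004000 and the backup region at 0x08012000, ending exactly at 0x08020000.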

Some products are extremely cost-sensitive. In one such project I used an STM32F103C8T6 with 64KB of on-chip ROM in total; the firmware was 48KB and the BL 12KB, leaving no spare ROM for temporary storage during BL programming. I used a trick I call “tailing the dog”, as shown in the figure.

I happened to discover that the STM32F103C8T6 uses the same silicon as 128KB parts such as the RBT6, so its flash physically extends to 128KB. Presumably, chips whose upper 64KB shows poor performance or defects are sold as 64KB parts, with the upper region officially unavailable. I tested this, and the extra 64KB was indeed there.

Using the upper 64KB does have a prerequisite: it must first be checked for defects. If it passes, the firmware is stored there temporarily and verified before being written to the APP area; if it fails, the firmware is written directly to the APP area during transmission (this approach has worked for me many times, and I have never found a defect in the upper 64KB).
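The “check for defects first” step could look like the sketch below: erase the spare region, write and read back two complementary patterns, and fall back to direct programming if anything fails. The flash_ops_t hooks, the fake flash with a stuck bit, and the choice of patterns are all assumptions for illustration; a pattern test like this only catches gross defects and is no substitute for a manufacturer’s guarantee.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

typedef enum { STORE_BUFFERED, STORE_DIRECT } store_mode_t;

/* Hypothetical flash hooks; on a real MCU they would wrap the IAP routines */
typedef struct {
    void (*erase)(uint32_t off, size_t len);
    void (*write_byte)(uint32_t off, uint8_t v);
    uint8_t (*read_byte)(uint32_t off);
} flash_ops_t;

/* Erase, check for 0xFF, then write and read back each pattern */
static bool region_passes(const flash_ops_t *f, uint32_t off, size_t len)
{
    static const uint8_t patterns[2] = { 0x55, 0xAA };
    for (int p = 0; p < 2; p++) {
        f->erase(off, len);
        for (size_t i = 0; i < len; i++)
            if (f->read_byte(off + (uint32_t)i) != 0xFF) return false;
        for (size_t i = 0; i < len; i++)
            f->write_byte(off + (uint32_t)i, patterns[p]);
        for (size_t i = 0; i < len; i++)
            if (f->read_byte(off + (uint32_t)i) != patterns[p]) return false;
    }
    f->erase(off, len);   /* leave the region erased, ready to hold firmware */
    return true;
}

/* Store-verify-copy if the region is healthy, otherwise program the APP area directly */
store_mode_t choose_store_mode(const flash_ops_t *f, uint32_t off, size_t len)
{
    return region_passes(f, off, len) ? STORE_BUFFERED : STORE_DIRECT;
}

/* Host-side fake flash for trying the logic on a PC */
static uint8_t fake_mem[256];
static int fake_bad_index = -1;   /* byte whose bit 0 is stuck at 0; -1 = healthy */

static void fake_erase(uint32_t off, size_t len)
{
    for (size_t i = 0; i < len; i++) fake_mem[off + i] = 0xFF;
}
static void fake_write(uint32_t off, uint8_t v) { fake_mem[off] &= v; }   /* writes only clear bits */
static uint8_t fake_read(uint32_t off)
{
    return ((int)off == fake_bad_index) ? (uint8_t)(fake_mem[off] & 0xFEu) : fake_mem[off];
}
```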

The method Zhenan describes above is a “trick” that inherently carries some risk. ST only guarantees the flash capacity stated in the datasheet and makes no promise about the upper 64KB, so use it with caution.

Wireless Programming within 10 Meters

This “wireless programming” came from one of my IoT projects, which monitored the operating status of air-conditioner outdoor units. As you know, outdoor units are not easy to reach: they sit on the roof or outside a window, which makes upgrading the embedded firmware a real challenge.

Therefore, I implemented the “wireless programming” function, which is essentially an extension of serial BL application, as shown in the figure, using a Bluetooth serial module to achieve “wireless programming”.

“Wireless programming is indeed powerful, but you still have to carry a computer, which is not very convenient.” Indeed! Do you remember that I mentioned the AVRUBD communication protocol earlier? Its host computer software has a mobile version. So as long as we have a phone, we can perform “wireless programming,” as shown, with the phone connecting to the Bluetooth serial module to achieve “phone wireless programming”.

“Which app? Tell me the name!” Don’t rush: below is the Android Bluetooth serial assistant, shown in the middle of transmitting firmware.

AVRUBD is actually an improvement of the Xmodem protocol; we will discuss this in detail in a dedicated article.

Distributed Programming of BL

We know that the core function of BL is essentially program programming. Have you ever encountered a more complex situation, as shown in the figure, where a system (product) has multiple components that need firmware programming?

This situation is possible. A typical complex system architecture includes a main MCU + CPLD + communication coprocessor + acquisition coprocessor. During mass production of such products, programming is very tedious. First, multiple firmwares need to be maintained, and then each component must be programmed individually, which may also involve different programming methods. Therefore, I introduced a mechanism called “distributed programming of BL.”

First, we assemble all the firmwares into a large firmware (concatenating the data sequentially) and pre-program this large firmware into external ROM, such as SPI Flash; then we pre-program the main MCU with the BL; and then perform SMT soldering.

Once the PCBA is produced, as soon as it is powered on for the first time, the BL will read the large firmware from the external ROM and separate each small firmware, programming them into the respective components through their corresponding interfaces. Coupled with the test commands from the test fixture, it directly performs self-checking.
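Splitting the combined image requires the BL to know where each piece starts. The text only says the firmware is concatenated, so the index header below is an assumed, hypothetical layout: a 32-bit image count followed by one {target id, offset, length} entry per image, all little-endian, then the images themselves. The sketch shows only the bounds-checked lookup; a real package would also carry a checksum per image.

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#define PKG_ENTRY_SIZE 12u   /* target id, offset, length: 3 x uint32 */

/* Look up image `index` in the package; returns 0 on success, -1 if malformed */
int pkg_get_image(const uint8_t *pkg, size_t pkg_len, uint32_t index,
                  uint32_t *target, const uint8_t **data, uint32_t *len)
{
    if (pkg_len < 4) return -1;
    uint32_t count = rd32(pkg);
    if (index >= count) return -1;
    if (4u + (uint64_t)count * PKG_ENTRY_SIZE > pkg_len) return -1;   /* index table must fit */
    const uint8_t *e = pkg + 4 + (size_t)index * PKG_ENTRY_SIZE;
    uint32_t off = rd32(e + 4);
    uint32_t n = rd32(e + 8);
    if ((uint64_t)off + n > pkg_len) return -1;                       /* image must lie inside */
    *target = rd32(e);
    *data = pkg + off;
    *len = n;
    return 0;
}
```

On the real board, the BL would then hand each (target, data, length) triple to the programming routine for that component’s interface.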

Doing this makes mass production highly efficient. Of course, developing this BL will also have some difficulty, with the biggest problem possibly being the implementation of programming interfaces for each component (some components have relatively complex programming protocols, such as STM32’s SWD or ESP8266’s SLIP).

There is no best BL, only the most suitable one for oneself. Generally speaking, we do not design BL to be very complicated; in principle, it should be as short and concise as possible to save more ROM space for the APP area. After all, the APP is the protagonist of the product.

Unconventional BL

Bootpatcher

Let me ask a question: “Does the Bootloader have to sit before the APP area in ROM?” Clearly not; AVR is the best example. But what if we restrict ourselves to STM32? There it seems so: with the default boot configuration, execution after a power-on reset starts from 0x08000000, so the BL must run before the APP.

In some special cases the APP must be placed at 0x08000000. Can serial programming through a BL still be implemented? Note that while the APP is running, it cannot use IAP to rewrite its own program memory (it cannot erase the code it is executing in order to program new firmware). As the figure shows, the BL placed after the APP is called a Bootpatcher.

When the APP wants to reprogram itself, it can directly jump to the subsequent BL; once the BL runs, it starts receiving firmware files, temporarily stores them, verifies them, and then writes the firmware to the previous APP area. It then jumps to 0X08000000 or directly restarts. Thus, the new APP will run.

This BL located after the APP is called a Bootpatcher (a “boot patch”). However, the method is risky: if programming the APP area fails, the code at 0x08000000 is broken, the Bootpatcher can no longer be reached after a reset, and the product is bricked. For this reason, the method is rarely used.

APP Reprogramming BL

So far the BL has programmed the APP; but what if the BL itself needs an upgrade? J-Link is one option, but there is a more direct method, as shown: let the APP program the BL area.

This is a reverse approach: the APP receives the firmware file, stores it temporarily, and then programs it into the BL area. Like the Bootpatcher, it carries some risk (if writing the BL area fails, the device can no longer boot), but in practice it is usually not a problem.

Conclusion

OK, this series of articles provides a thorough analysis of BL, aiming to be both insightful and comprehensive, covering basic principles, practical implementations, and some knowledge extensions. I hope it will inspire everyone to apply this knowledge in their actual work.

Author/Source: Learn Embedded Together
