Methods to Improve the Anti-Interference Capability of Embedded Systems

Follow us: Learn Embedded Systems Together, learn and grow together

Alongside hardware measures for improving anti-interference capability, software anti-interference is receiving increasing attention because it is flexible to design, saves hardware resources, and offers good reliability.

Below, we study software anti-interference methods, using a microcontroller (8051-family) system as an example.

1. Research on Software Anti-Interference Methods

In engineering practice, the content of software anti-interference research mainly includes:

Eliminating noise from analog input signals (for example, by digital filtering);
Bringing a runaway program back on track.

This article proposes several effective software anti-interference methods for the latter case.

1. Instruction Redundancy

The CPU fetches an instruction by first reading its opcode and then its operand(s). When interference corrupts the program counter (PC), the program leaves its normal path. If the PC lands on a two-byte instruction so that the fetch hits the operand byte, the CPU may interpret the operand as an opcode, and the program goes wrong. The probability of this error is higher still for three-byte instructions, which have two operand bytes.

At critical points, it is common to deliberately insert single-byte instructions, or to repeat useful single-byte instructions, as instruction redundancy. Typically, two or more NOPs are inserted after two-byte and three-byte instructions.

This way, even if a runaway program lands on an operand byte, the NOP instructions prevent the following instruction from being swallowed as an operand, so the program falls back into step with the correct instruction sequence.
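The resynchronization effect can be illustrated on a host PC with a toy decoder. This is only a sketch under stated assumptions: it knows the lengths of a handful of real 8051 opcodes (the helper names `op_len` and `lands_on` are hypothetical), it does not follow where a misdecoded jump would actually go, and it simply checks whether decoding from a wrong entry point falls back onto a real instruction boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of a handful of 8051 opcodes.
 * Returns 0 for opcodes not in this deliberately tiny table. */
static int op_len(uint8_t op)
{
    if (op >= 0xF8) {
        return 1;              /* MOV Rn,A (F8H-FFH) */
    }
    switch (op) {
    case 0x00: return 1;       /* NOP */
    case 0x22: return 1;       /* RET */
    case 0x32: return 1;       /* RETI */
    case 0xE4: return 1;       /* CLR A */
    case 0x74: return 2;       /* MOV A,#data */
    case 0x02: return 3;       /* LJMP addr16 */
    case 0x12: return 3;       /* LCALL addr16 */
    case 0x75: return 3;       /* MOV direct,#data */
    case 0x90: return 3;       /* MOV DPTR,#data16 */
    default:   return 0;       /* not modeled in this sketch */
    }
}

/* Group bytes into instructions starting at 'pc' and report whether the
 * grouping lands exactly on 'target', i.e. falls back into step with the
 * real instruction stream. Branches are not followed: only the byte
 * grouping is modeled. */
static bool lands_on(const uint8_t *code, size_t len, size_t pc, size_t target)
{
    while (pc < len && pc < target) {
        int n = op_len(code[pc]);
        if (n == 0) {
            return false;      /* byte not covered by the toy table */
        }
        pc += (size_t)n;
    }
    return pc == target;
}
```

With `75 90 FF 02 00 00` (MOV P1,#0FFH followed directly by LJMP 0000H), entering at the operand byte 90H decodes it as the three-byte MOV DPTR,#data16, which swallows the 02H opcode, so `lands_on(code, 6, 1, 3)` is false. With two NOPs inserted (`75 90 FF 00 00 02 00 00`), the same entry point lands on the LJMP at offset 5. Because no 8051 instruction is longer than three bytes, two NOPs are always enough to bring the byte grouping back into step, as long as the misdecoded bytes do not themselves form a jump; the misdecoded instruction still executes, with whatever side effects it has.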

Likewise, inserting two NOPs before instructions that are critical to program flow, such as RET, RETI, LCALL, LJMP, and JC, helps a runaway program fall back into step and ensures that these important instructions are executed.

2. Interception Technology

Interception means steering a runaway program to a designated location for error handling. Software traps are the usual tool for this, so the traps must be designed sensibly and placed appropriately.

a. Design of Software Traps

When a runaway program enters a non-program area, instruction redundancy no longer helps. A software trap intercepts the runaway program and directs it to a designated location, where error handling is then performed.

A software trap is a short instruction sequence that sends a captured runaway program to the reset entry address 0000H. Typically, the following instructions are written into the non-program area of the EPROM as a software trap:

    NOP
    NOP
    LJMP 0000H

Its machine code is 00 00 02 00 00 (hexadecimal).

b. Arrangement of Traps

Usually, all unused EPROM space is filled with repetitions of 00 00 02 00 00, and the last instruction in the area should be 02 00 00 (LJMP 0000H), so that a runaway program falling into this area is sent back to the reset entry and can return to the correct track. Traps can also be placed in the unused gaps between modules in the user program area.
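Filling the gaps by hand is error-prone, so the pattern can be written into the ROM image by a build-time tool. A minimal host-side sketch, assuming the gap ranges are already known (for example from the linker map); the function name `fill_traps` is hypothetical. It aligns the 00 00 02 00 00 pattern from the end of each gap so that the area always finishes with 02 00 00.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fill img[start .. start+len-1] with the trap pattern 00 00 02 00 00
 * (NOP, NOP, LJMP 0000H), aligned so the gap ends with 02 00 00.
 * Every byte is either 00H (NOP) or the 02H opcode of an LJMP 0000H,
 * so execution entering at any byte except the final two (the address
 * operand of the last LJMP) reaches an LJMP 0000H inside the gap.
 * Returns false if the gap is too short to hold one LJMP. */
static bool fill_traps(uint8_t *img, size_t start, size_t len)
{
    if (len < 3) {
        return false;
    }
    for (size_t i = 0; i < len; i++) {
        size_t from_end = len - i;            /* 1 for the last byte */
        img[start + i] = (from_end % 5 == 3) ? 0x02 : 0x00;
    }
    return true;
}
```

A 10-byte gap becomes `00 00 02 00 00 00 00 02 00 00`. A program entering at one of the last two bytes executes them as NOPs and falls through into whatever follows the gap, which this pattern cannot prevent.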

When interference enables an interrupt that the system does not use, a software trap placed in the corresponding interrupt service routine catches the spurious interrupt promptly. For instance, if an application does not use external interrupt 1, the interrupt service routine for external interrupt 1 can be written as:

    NOP
    NOP
    RETI

The return instruction can be either RETI or LJMP 0000H. If the fault-diagnosis and self-recovery programs are designed reliably and thoroughly, using LJMP 0000H as the return instruction passes control directly to fault diagnosis, so the fault is dealt with and normal operation restored as early as possible.

Taking the size of program memory into account, 2-3 software traps per 1 KB of space are generally enough to intercept a runaway program effectively.

3. Software ‘Watchdog’ Technology

If a runaway program gets stuck in a "dead loop" (an endless loop), watchdog technology is usually used to free it. The program's loop execution time is monitored continuously; if a loop runs longer than the maximum allowed time, the system is judged to be in a dead loop and error handling is started.

Watchdog technology can be implemented in hardware or in software. In industrial applications, severe interference can sometimes corrupt the interrupt control word and disable interrupts. The system then cannot "feed the dog" on time, and the hardware watchdog scheme fails. A software watchdog can solve this kind of problem effectively.
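A software watchdog is commonly built from a periodic timer interrupt and a counter: the timer interrupt counts ticks, the main loop clears the counter on each pass, and a count that reaches a limit signals a dead loop. A minimal sketch of that logic in portable C, with hypothetical names (`swdt_t`, `swdt_feed`, `swdt_tick`); the timer setup and the actual recovery action are target-specific and left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software watchdog state. An 8-bit counter is used so that
 * each read or write is a single byte access on an 8-bit MCU. */
typedef struct {
    volatile uint8_t count;  /* timer ticks since the last feed */
    uint8_t limit;           /* ticks without a feed treated as a dead loop */
} swdt_t;

static void swdt_init(swdt_t *w, uint8_t limit)
{
    w->count = 0;
    w->limit = limit;
}

/* Main loop: call once per pass to show the loop is still cycling. */
static void swdt_feed(swdt_t *w)
{
    w->count = 0;
}

/* Timer ISR: call on every tick. Returns true once the main loop has gone
 * 'limit' ticks without feeding; the caller then starts error handling,
 * e.g. by forcing execution back to the reset entry 0000H. */
static bool swdt_tick(swdt_t *w)
{
    if (w->count < w->limit) {
        w->count++;
    }
    return w->count >= w->limit;
}
```

The limit should be set above the longest legitimate main-loop pass, measured in timer ticks; otherwise normal operation triggers false resets.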

2. System Fault Handling and Self-Recovery Program Design

When a microcontroller system is reset abnormally by interference or a power failure, it needs to diagnose the fault and automatically restore the state it was in before the abnormal reset.

1. Identification of Abnormal Resets

After any reset, program execution starts from 0000H. There are four situations in which execution starts from 0000H:

System power-on reset;
Software fault reset;
Hardware reset due to watchdog timeout;
Power loss during task execution followed by power-on reset.

Except for the first scenario, the other three are considered abnormal resets and need to be identified.

Identification of Hardware and Software Resets:

Here, "hardware reset" means a power-on reset or a watchdog reset, both of which reinitialize the registers: after such a reset, PC = 0000H, SP = 07H, PSW = 00H, and so on. A software reset (a jump to 0000H), by contrast, does not change SP or PSW.

Therefore, in a microcomputer measurement and control system, set SP to an address above 07H during normal operation, or set bit 5 of PSW (the user flag F0) to 1. When the system restarts at 0000H, checking PSW.5 or the SP value then shows whether a hardware reset occurred: if SP is 07H and PSW.5 is 0, the registers were reinitialized by hardware.
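The decision itself is simple once SP and PSW have been captured. A sketch with a hypothetical function name, assuming normal operation keeps SP above 07H and sets F0 (PSW.5) as described above. On a real 8051 the registers must be read by the code at 0000H before any start-up code reinitializes SP, so in practice this test usually lives in a few lines of assembly; the C version only shows the logic.

```c
#include <stdbool.h>
#include <stdint.h>

#define SP_RESET_VALUE 0x07u   /* 8051 SP after a hardware reset */
#define PSW_F0         0x20u   /* PSW.5, user flag F0 */

/* Classify a reset from SP and PSW as captured at 0000H, before any
 * start-up code modifies them. */
static bool is_hardware_reset(uint8_t sp, uint8_t psw)
{
    /* A hardware reset reloads SP = 07H and PSW = 00H, so F0 is clear.
     * A jump to 0000H leaves the running values (SP > 07H, F0 = 1). */
    return sp == SP_RESET_VALUE && (psw & PSW_F0) == 0;
}
```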

Because the contents of internal RAM are random after power-up but are retained through a software reset, one or two internal RAM locations can be chosen as power-on flags.

Suppose location 40H is used as the power-on flag, with flag value 78H. If 40H does not contain 78H after a reset, the reset is treated as a hardware reset; otherwise it is a software reset, and error handling follows. Using two locations as power-on flags makes this identification more reliable, since random power-up contents are then far less likely to match the flags by chance.
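A sketch of the two-location flag check, with internal RAM modeled as an array. Location 40H and value 78H come from the text; the second location (41H), its value (87H), and the function names are hypothetical choices. On the target, make sure the C start-up code does not zero these locations, or every reset will look like a hardware reset.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG1_ADDR  0x40u
#define FLAG1_VALUE 0x78u
#define FLAG2_ADDR  0x41u   /* hypothetical second flag location */
#define FLAG2_VALUE 0x87u   /* hypothetical value (complement of 78H) */

/* ram models the 128 bytes of 8051 internal data RAM. */
static bool flags_intact(const uint8_t ram[128])
{
    /* Both flags must match; random power-up contents rarely do. */
    return ram[FLAG1_ADDR] == FLAG1_VALUE && ram[FLAG2_ADDR] == FLAG2_VALUE;
}

/* Call once initialization after a hardware reset is complete. */
static void set_flags(uint8_t ram[128])
{
    ram[FLAG1_ADDR] = FLAG1_VALUE;
    ram[FLAG2_ADDR] = FLAG2_VALUE;
}
```

At start-up: if `flags_intact()` is true, treat the reset as a software reset and enter error handling; otherwise perform full initialization and then call `set_flags()`.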

Identification of Power-On Reset and Watchdog Fault Reset:

Power-on reset and watchdog fault reset are both hardware resets, so telling them apart generally requires non-volatile RAM or EEPROM.

During normal operation, maintain an observation location in power-fail-protected memory. In the interrupt service routine that periodically feeds the watchdog, keep this location at its normal value (AAH), and clear it in the main program. Because the location survives power loss, checking at power-on whether it holds the normal value indicates whether the reset was a watchdog reset.

Identification of Normal Power-On Reset and Abnormal Power-On Reset:

Identifying power-on resets caused by unexpected situations such as power loss is particularly important for process control systems.

For example, in a time-controlled measurement and control system, one control task may take an hour to complete. If the supply voltage becomes abnormal and the system resets after 50 minutes of control, restarting the task from scratch would waste the 50 minutes already spent.

Thus, a monitoring location can record the current operating state and system time, with the control process divided into steps or time periods. After each step is completed, or after a given time period has elapsed, the monitoring location is set to a "shutdown permitted" value; different tasks or task stages use different values. While the system is executing a control task or is inside a time period, the location holds an "abnormal shutdown" value. After a reset, the system reads this location to determine its previous operating state, and error handling then restores that state.
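One possible encoding of the monitoring location, sketched in C. The bit layout, the stage numbering, and the function names are all assumptions for illustration; the text only requires that "stage finished" and "stage running" values be distinguishable after a reset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the monitoring location (an assumption):
 *   0x00       idle, shutdown permitted, nothing started yet
 *   0x40 | k   stage k finished, shutdown permitted
 *   0x80 | k   stage k running, abnormal if found after a reset
 * The location itself must live in power-fail-protected memory. */
#define MON_IDLE   0x00u
#define MON_DONE   0x40u
#define MON_RUN    0x80u
#define STAGE_MASK 0x3Fu

static void stage_begin(volatile uint8_t *mon, uint8_t k)
{
    *mon = (uint8_t)(MON_RUN | (k & STAGE_MASK));
}

static void stage_end(volatile uint8_t *mon, uint8_t k)
{
    *mon = (uint8_t)(MON_DONE | (k & STAGE_MASK));
}

/* After a reset, decide which stage to (re)start and whether the
 * previous shutdown was abnormal. */
static uint8_t resume_stage(uint8_t mon, bool *abnormal)
{
    if (mon & MON_RUN) {
        *abnormal = true;                          /* reset hit mid-stage */
        return (uint8_t)(mon & STAGE_MASK);        /* redo that stage */
    }
    *abnormal = false;
    if (mon & MON_DONE) {
        return (uint8_t)((mon & STAGE_MASK) + 1);  /* continue with next */
    }
    return 0;                                      /* start from the beginning */
}
```

A finer-grained version could also store the elapsed time within the stage, as the text suggests, so that a long stage resumes near where it stopped rather than restarting.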

2. Program Design for Self-Recovery After Abnormal Resets

For process control systems with strict sequence requirements, it is generally required to restore operation from the module or task where the system experienced an abnormal reset. Therefore, the measurement and control system should back up important data units and parameters, such as system operation state, process values, current input/output values, current clock values, observation unit values, etc. These data should be backed up regularly, and any modifications should be backed up immediately.

When an abnormal reset has been identified, basic system initialization comes first, such as initializing the display module and any external expansion chips. Next, restore the system state and operating parameters of the measurement and control system, including the display interface. Finally, restore the tasks, parameters, and running time from before the reset, then enter normal operation.

It should be noted that accurately restoring the system’s operation state requires meticulous backup of important data and data reliability checks to ensure the reliability of the restored data. Furthermore, for multi-task and multi-process measurement and control systems, the order of data recovery must be considered.
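Backup and the reliability check can be combined by keeping two copies of each backup record, each sealed with a checksum. A minimal sketch, assuming a hypothetical 8-byte record (7 data bytes plus a checksum byte) stored in non-volatile memory and a hypothetical checksum seed of A5H (chosen so that all-zero memory does not pass the check); all names are illustrative. An 8-bit sum is a weak check (a random corruption passes it roughly 1 time in 256), so a CRC is a common stronger choice.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_LEN    8u     /* hypothetical: 7 data bytes + 1 checksum byte */
#define CHECK_SEED 0xA5u  /* hypothetical: all-zero memory does not pass */

/* Byte sum of the first n bytes, modulo 256. */
static uint8_t sum8(const uint8_t *p, size_t n)
{
    uint8_t s = 0;
    while (n--) {
        s = (uint8_t)(s + *p++);
    }
    return s;
}

/* Set the last byte so the whole record sums to CHECK_SEED. */
static void seal(uint8_t rec[REC_LEN])
{
    rec[REC_LEN - 1] = (uint8_t)(CHECK_SEED - sum8(rec, REC_LEN - 1));
}

static bool valid(const uint8_t rec[REC_LEN])
{
    return sum8(rec, REC_LEN) == CHECK_SEED;
}

/* Back up one changed data byte immediately. Copy A is fully written
 * before copy B, so a reset part-way through leaves at least one copy
 * that checks out (assuming both were good beforehand). */
static bool backup_field(uint8_t a[REC_LEN], uint8_t b[REC_LEN],
                         size_t field, uint8_t value)
{
    uint8_t tmp[REC_LEN];
    if (field >= REC_LEN - 1) {
        return false;                 /* last byte is the checksum */
    }
    if (valid(a)) {
        memcpy(tmp, a, REC_LEN);
    } else if (valid(b)) {
        memcpy(tmp, b, REC_LEN);
    } else {
        memset(tmp, 0, REC_LEN);      /* no good copy: start from zero */
    }
    tmp[field] = value;
    seal(tmp);
    memcpy(a, tmp, REC_LEN);
    memcpy(b, tmp, REC_LEN);
    return true;
}

/* After an abnormal reset: take copy A if it checks out, else copy B. */
static bool restore(const uint8_t a[REC_LEN], const uint8_t b[REC_LEN],
                    uint8_t out[REC_LEN])
{
    if (valid(a)) {
        memcpy(out, a, REC_LEN);
        return true;
    }
    if (valid(b)) {
        memcpy(out, b, REC_LEN);
        return true;
    }
    return false;                     /* neither copy is trustworthy */
}
```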

Basic system initialization means initializing the chips, the display, the input/output configuration, and so on; take care that I/O initialization does not trigger erroneous actions. Restoring the tasks that were running before the reset means restoring their execution state and running time.

Other common software anti-interference methods include digital filtering, RAM data protection, and error correction. In engineering practice, several methods are usually combined so that they complement one another and give better results.
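Digital filtering is named but not detailed here. As one common example (a swapped-in illustration, not a method from this article), a median-of-three filter rejects a single impulse spike in an analog input; the function name `median3` is hypothetical.

```c
#include <stdint.h>

/* Median of three consecutive samples: one noise spike is discarded. */
static uint16_t median3(uint16_t a, uint16_t b, uint16_t c)
{
    if (a > b) {                      /* order the first two: a <= b */
        uint16_t t = a;
        a = b;
        b = t;
    }
    if (b > c) {
        b = c;                        /* b = min(max(a, b), c) */
    }
    return (a > b) ? a : b;           /* max of that and min(a, b) */
}
```

For samples 10, 500, 12, the spike 500 is discarded and 12 is returned.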

Fundamentally, hardware anti-interference is proactive, while software anti-interference is reactive. By thoroughly analyzing interference sources, combining hardware and software anti-interference measures, and refining the system's monitoring programs, it is entirely feasible to design a stable and reliable microcontroller system.

Source: Internet, copyright belongs to the original author. If there is any infringement, please contact to delete.
