Microcontroller Programming: The Soul-Searching Question of Feeding the Watchdog…

[Guide] You have written plenty of microcontroller programs and you see the watchdog every day, but are you raising your dog correctly? Just keep feeding it, and as long as it doesn't bark, everything is fine, right? Is it really that simple? In fact, it may not be as straightforward as you think...

What is a Watchdog?

A watchdog, also known as a watchdog timer, is essentially a timing circuit or software timer mechanism.

Working Principle:

The hardware basis of a watchdog is a counter that is set to a certain initial timing value and then decrements to zero. The software is responsible for frequently resetting the counter to its initial timing value to ensure that the count never reaches zero. If it does reach zero, it indicates that some fault has occurred, and corresponding measures must be taken, such as restarting or entering a fail-safe state, depending on the system’s design.

During normal operation, the microcontroller, processor, or thread periodically resets the watchdog timer to its initial value, while the timer keeps counting in the background. If the timing period expires without the dog being fed again, the dog barks, indicating that something unusual has happened! At this point, the watchdog triggers a corrective action. What that action is depends on the actual system design. A typical watchdog chip sends a reset signal to the microcontroller or processor, while for software timers the specific actions vary widely, depending on the safety strategy employed.

In simple terms, resetting the timer is called feeding the dog, and the timing value is the dog food. The dog eats continuously, and if it is not fed before the food runs out, it barks to raise the alarm. Conversely, in a system that operates normally, the watchdog is always well fed and never barks from hunger.
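To make the counting concrete, here is a minimal sketch in C of the down-counter described above. It is only a host-side model, not a driver for any real watchdog peripheral; the type Watchdog and the functions wdt_init, wdt_feed, and wdt_tick are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a watchdog down-counter. */
typedef struct {
    uint32_t reload;  /* initial timing value (the "dog food") */
    uint32_t count;   /* current counter value */
    bool     barked;  /* latched once the counter reaches zero */
} Watchdog;

void wdt_init(Watchdog *w, uint32_t reload)
{
    w->reload = reload;
    w->count  = reload;
    w->barked = false;
}

/* Feeding: reset the counter to its initial timing value. */
void wdt_feed(Watchdog *w)
{
    w->count = w->reload;
}

/* One timer tick: decrement, and bark if the counter hits zero.
 * Returns true once the watchdog has expired. */
bool wdt_tick(Watchdog *w)
{
    if (w->count > 0u) {
        w->count--;
    }
    if (w->count == 0u) {
        w->barked = true;  /* real hardware would reset the MCU here */
    }
    return w->barked;
}
```

With a reload value of 3, feeding after two ticks keeps the dog quiet, while three ticks in a row without feeding make it bark.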

Note: I have seen articles refer to resetting the watchdog timer as kicking the dog. Well, that’s not very nice; we should treat the dog better and call it feeding instead~~~

The watchdog mechanism plays a crucial role in electronic systems. For example, if a Mars rover's program hung and there were no watchdog circuit, the rover would effectively be lost. Imagine the scenario: unable to communicate or be woken up, it would quickly become space junk~~~

What Errors Can It Monitor?

  • Stack or heap overflow, causing the program to crash
  • A certain segment of the program fails to return or enters an infinite loop
  • Strong electromagnetic interference corrupting data and causing system anomalies; if this sounds far-fetched, think of the many electronic systems in military or aerospace fields that routinely operate in environments with strong electromagnetic interference
  • System crashes caused by bugs
  • Deadlocks in multitasking systems
  • ……

There are countless possible causes, but don't panic! You have a good watchdog to clean up the mess. In a complex embedded system, it is impossible to guarantee that there are no bugs, but a properly used watchdog can keep any single bug from hanging the system indefinitely.

What to Do When the Dog Barks?

What are the common handling strategies?

  • System Reset: Most people have experienced this; when the system hangs, what do you do? Restart. It reminds me of Liu Huan's song "Starting Over"; how nice it would be if life could be restarted, but it can't! If you're interested, give it a listen~~~
  • Fail-Safe: Often called fail-safe mode. It means that even if the device suffers a fatal failure, it must not cause a safety incident. To put it bluntly, even if it crashes, it should not harm anyone else. For example, in a descending elevator, if the watchdog detects a program anomaly, the safe approach is to stop the motor immediately; otherwise the car could fall out of control, leading to disaster. This idea is reflected in the IEC 61508 functional safety standard, as well as in medical and automotive safety standards.
  • Here is a recommended practice: after the chip resets, read the chip's reset status register to count watchdog reset events. If a watchdog reset happens three times in a row, the conservative approach is to switch the system to a safe state or display an error message, thus avoiding an endless restart loop. How to do this? Taking IAR as an example, you can declare the counter so that the startup code does not initialize it (in IAR, this is the __no_init keyword); its value then survives a reset and is lost only when power is cut. For example: __no_init int wdtResetCounter;
  • ….depends on the specific design strategy
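The reset-counting practice above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the NOINIT macro, the magic value, and the functions wdt_boot_check and wdt_mark_healthy are hypothetical, and the caller is assumed to decode the chip's reset status register into the resetWasWatchdog flag. Because uninitialized RAM holds arbitrary data after a true power-on, the sketch adds a magic word to tell a fresh power-up from a preserved counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* On IAR, __no_init keeps the startup code from touching the variable.
 * On other compilers (e.g. a host build) this sketch falls back to a
 * plain variable, which is an assumption made only for portability. */
#if defined(__IAR_SYSTEMS_ICC__)
#define NOINIT __no_init
#else
#define NOINIT
#endif

#define WDT_MAGIC                  0x5AA5C33Cu  /* hypothetical marker */
#define WDT_MAX_CONSECUTIVE_RESETS 3u

NOINIT uint32_t wdtMagic;         /* valid only if equal to WDT_MAGIC */
NOINIT uint32_t wdtResetCounter;  /* consecutive watchdog resets */

/* Call once early in startup. resetWasWatchdog is assumed to come from
 * the chip's reset status register. Returns true when the system should
 * go to its safe state instead of restarting again. */
bool wdt_boot_check(bool resetWasWatchdog)
{
    if (wdtMagic != WDT_MAGIC) {
        /* First power-up: RAM content is undefined, so start clean. */
        wdtMagic = WDT_MAGIC;
        wdtResetCounter = 0u;
    }
    if (resetWasWatchdog) {
        if (wdtResetCounter < UINT32_MAX) {
            wdtResetCounter++;
        }
    } else {
        wdtResetCounter = 0u;  /* any other reset cause breaks the streak */
    }
    return wdtResetCounter >= WDT_MAX_CONSECUTIVE_RESETS;
}

/* Call after the application has run normally for a while, so that only
 * resets "in a row" are counted. */
void wdt_mark_healthy(void)
{
    wdtResetCounter = 0u;
}
```

The threshold of three matches the text; when to call wdt_mark_healthy (for example after a few minutes of normal operation) is a design decision that depends on the system.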

If we want the system to recover quickly, the initialization after a watchdog reset should be shorter than the normal power-on initialization, which means skipping some of the device's self-checks. However, in some systems it is best to perform a comprehensive self-check anyway, because the root cause of the watchdog timeout may be a hardware anomaly that only the self-check would catch.

How to Feed the Dog Specifically?

For bare-metal programs, I recommend the following two handling strategies: fault detection feeding and enhanced fault detection feeding.

Fault Detection Feeding

For a bare-metal microcontroller program, you can check some critical runtime states at the moment you feed the watchdog, such as stack depth, buffer status, and the hardware in the critical function chain (sensors, actuators, and so on). If any of these states is abnormal, record the error and put the device into its functional safety state.
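As a rough illustration of fault detection feeding, the sketch below feeds the watchdog only when a set of health checks passes. Everything in it is an assumption for illustration: the HealthSnapshot fields, the 64-byte stack threshold, and the hw_watchdog_feed and enter_safe_state stand-ins would be replaced by the real checks and register accesses of the target system.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names and thresholds here are hypothetical. */
typedef enum {
    FAULT_NONE = 0,
    FAULT_STACK,
    FAULT_BUFFER,
    FAULT_SENSOR,
    FAULT_ACTUATOR
} FaultCode;

typedef struct {
    uint32_t stackFreeBytes;  /* remaining stack headroom */
    uint32_t bufferUsed;      /* bytes currently in the buffer */
    uint32_t bufferSize;      /* buffer capacity */
    bool     sensorOk;        /* sensor self-test result */
    bool     actuatorOk;      /* actuator self-test result */
} HealthSnapshot;

#define MIN_STACK_FREE_BYTES 64u

unsigned  feedCount;                    /* stands in for the feed register */
FaultCode recordedFault = FAULT_NONE;   /* stands in for an error log */

static void hw_watchdog_feed(void)        { feedCount++; }
static void enter_safe_state(FaultCode f) { recordedFault = f; }

/* Return the first fault found, or FAULT_NONE if everything looks sane. */
FaultCode health_check(const HealthSnapshot *h)
{
    if (h->stackFreeBytes < MIN_STACK_FREE_BYTES) return FAULT_STACK;
    if (h->bufferUsed > h->bufferSize)            return FAULT_BUFFER;
    if (!h->sensorOk)                             return FAULT_SENSOR;
    if (!h->actuatorOk)                           return FAULT_ACTUATOR;
    return FAULT_NONE;
}

/* Feed only when healthy; otherwise record the fault and go safe. */
bool feed_if_healthy(const HealthSnapshot *h)
{
    FaultCode f = health_check(h);
    if (f != FAULT_NONE) {
        enter_safe_state(f);
        return false;
    }
    hw_watchdog_feed();
    return true;
}
```

In a main loop, feed_if_healthy would be called once per iteration with a freshly collected snapshot, so a failed check stops the feeding as well as triggering the safe state.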

Enhanced Fault Detection Feeding

What is enhanced fault detection feeding? It is fault detection feeding plus sequence detection. IEC 61508 describes a paradigm called sequence check, which might sound a bit strange; just look at the diagram, and you'll understand immediately.

This involves setting a sequence marker for each key functional block of the main function. If a block runs out of order, perform safe fault handling; if the order is correct, continue with the next block. When it is time to feed the watchdog, check whether the whole sequence was correct; if it was, feed the dog; otherwise, perform error handling, or simply let the dog bark.
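The sequence check can be sketched along these lines. The step names, seq_mark, and seq_feed are hypothetical, and the sketch chooses the "let the dog bark" option: once a sequence fault is seen, it is latched and feeding stops for good.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key blocks of the main loop, in their required order. */
enum {
    STEP_READ_INPUTS = 0,
    STEP_CONTROL,
    STEP_WRITE_OUTPUTS,
    STEP_COUNT
};

uint8_t  expectedStep;   /* marker of the block that must run next */
bool     sequenceFault;  /* latched when a block runs out of order */
unsigned feedCount;      /* stands in for the real feed register */

static void hw_watchdog_feed(void) { feedCount++; }

/* Each key block calls this with its own marker when it runs. */
void seq_mark(uint8_t step)
{
    if (step != expectedStep) {
        sequenceFault = true;  /* wrong order or skipped block */
    } else {
        expectedStep++;
    }
}

/* Call at the end of each main-loop pass. Feeds only if every block
 * ran exactly once and in order; returns whether the dog was fed. */
bool seq_feed(void)
{
    bool ok = !sequenceFault && (expectedStep == STEP_COUNT);
    expectedStep = 0u;  /* start the next pass from the first block */
    if (ok) {
        hw_watchdog_feed();
    }
    return ok;
}
```

A main loop would call seq_mark(STEP_READ_INPUTS), seq_mark(STEP_CONTROL), and seq_mark(STEP_WRITE_OUTPUTS) inside the respective blocks and then seq_feed(). If a system should go to a safe state instead of waiting for the reset, the false return value is the place to hook that in.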

For multitasking real-time systems, there are some different requirements:

  • Detect whether the operating system is running correctly
  • Detect whether any task is stuck in an infinite loop
  • Detect deadlocks involving two or more tasks
  • Detect if certain low-priority tasks cannot run due to high-priority tasks occupying the CPU
  • ….

Mother Dog with Puppies Feeding Method

This name sounds a bit crude, haha. To make it easier to understand, let’s call it that; let’s look at a diagram first:

Implementation Strategy Description:

The watchdogTask can be seen as a doghouse that holds a pack of dogs: the hardware watchdog is the mother dog, and the software watchdogs of the sub-tasks are the puppies. Each sub-task must feed its own puppy once per loop cycle (in a real implementation, the task can of course combine this with fault detection feeding). On each pass of its loop, watchdogTask decrements every software watchdog; if any of them expires, that soft dog barks and exception handling is required (reset or enter fail-safe mode). Only if none of the software dogs has expired is the hardware watchdog fed (it may be built into the microcontroller or be an external chip).

In actual implementation, attention must be paid to:

  • watchdogTask should be given the highest priority
  • Each pass of its loop should call os_delay for a certain time to yield the CPU to other tasks. The delay must be shorter than the hardware watchdog's maximum timeout.
  • Task priorities should be arranged sensibly
  • Feeding the hardware watchdog from interrupt handlers or any other place is strictly prohibited, because an interrupt can keep firing while the tasks themselves are hung.
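The mother-dog-with-puppies scheme can be sketched without tying it to a particular RTOS. In the sketch below, watchdog_task_step represents one pass of the watchdogTask loop (in a real system it would be followed by the os_delay call mentioned above); the SoftWatchdog type, the reload values, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_TASKS 3u

/* One puppy per monitored task. Reload values are counted in passes of
 * the watchdogTask loop and are made-up numbers for illustration. */
typedef struct {
    uint16_t reload;
    uint16_t remaining;
} SoftWatchdog;

SoftWatchdog softDogs[NUM_TASKS] = {
    { 5u,  5u  },
    { 10u, 10u },
    { 50u, 50u }
};

unsigned hwFeedCount;  /* stands in for the hardware feed register */

static void hw_watchdog_feed(void) { hwFeedCount++; }

/* Called by task `id` once per loop cycle to feed its own puppy. */
void soft_wdt_feed(unsigned id)
{
    if (id < NUM_TASKS) {
        softDogs[id].remaining = softDogs[id].reload;
    }
}

/* One pass of watchdogTask: decrement every puppy, and feed the mother
 * dog only if none has expired. Returns the index of the first expired
 * software watchdog, or -1 if all are still alive. */
int watchdog_task_step(void)
{
    int expired = -1;
    for (unsigned i = 0u; i < NUM_TASKS; i++) {
        if (softDogs[i].remaining > 0u) {
            softDogs[i].remaining--;
        }
        if (softDogs[i].remaining == 0u && expired < 0) {
            expired = (int)i;
        }
    }
    if (expired < 0) {
        hw_watchdog_feed();
    }
    return expired;  /* caller resets or enters fail-safe on >= 0 */
}
```

Because the hardware watchdog is fed in exactly one place, a single hung or starved task is enough to stop the feeding. On a preemptive RTOS, access to softDogs from several tasks may need protection (for example a critical section), which this sketch leaves out.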

How Long Should the Dog Bark?

The Pain of Being Too Short

If the watchdog timeout is set too short, the system can easily misjudge a healthy program as hung, leading to frequent resets or needless entry into fail-safe mode. The quality of any safety chain depends on its weakest link. The firmware's loop time is not constant; it fluctuates significantly, especially when there are many external asynchronous events or nested interrupts. So the worst case must be considered: how long can one pass of the loop take at most?

The Harm of Being Too Long

One method is to choose a timeout interval of several seconds. If you only want to reset a truly hung system and do not wish to study the system's timing in detail, this strategy works and is robust. However, a long timeout also means that faults are detected and handled slowly, which is a problem for systems that must recover quickly, especially in high-safety-demand applications such as nuclear power systems, automotive electronics, and medical devices.

Therefore, in actual design, it is necessary to consider the worst-case scenario and try to choose a relatively short timeout duration, seeking a balance between the two.
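One possible way to make "consider the worst case" concrete is to measure the longest observed loop time and add a margin on top. The sketch below is only an illustration: the function names are hypothetical, the time source is assumed to be a microsecond timer on the target, and measured maxima only approximate the true worst case, so the margin percentage is a judgment call.

```c
#include <stdint.h>

uint32_t worstLoopUs;  /* longest main-loop pass seen so far */

/* Call once per main-loop pass with the measured duration. */
void loop_time_record(uint32_t loopUs)
{
    if (loopUs > worstLoopUs) {
        worstLoopUs = loopUs;
    }
}

/* timeout = worst case * (100 + margin) / 100, rounded up and
 * saturated at UINT32_MAX. */
uint32_t wdt_timeout_us(uint32_t worstUs, uint16_t marginPercent)
{
    uint64_t t = (uint64_t)worstUs * (100u + (uint64_t)marginPercent);
    t = (t + 99u) / 100u;
    return (t > UINT32_MAX) ? UINT32_MAX : (uint32_t)t;
}
```

For example, with a worst observed loop time of 1200 us and a 50% margin, the computed timeout is 1800 us, which would then be rounded up to the nearest value the watchdog hardware supports.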

In Summary

The watchdog strategy is used far beyond microcontroller programming; it is also widely applied in embedded Linux and in databases. Using the watchdog sensibly is a very important part of designing a robust electronic system.
