Dead Loop Hidden in Assembly Code


1 Introduction
In my previous article, I discussed the importance of mastering assembly knowledge, which can save you from difficult situations at critical times.
In this article, I will introduce another method for troubleshooting using assembly knowledge, hoping to help everyone.
2 Problem Description
The problem is as follows: some time ago, our project team discovered during a self-test that our code seemed to hang: the command line could not accept input, but no reset information was output.
At that time, one of our team members said, “It seems like our system is hung?” After I learned about this phenomenon, based on my previous troubleshooting experience, I immediately concluded, “Maybe our code has run into a dead loop, let’s check it carefully!”
So we started debugging the code and adding debug output, and eventually found that a checksum calculation function was being called but never returned. The function itself is very simple; it looks like this:
uint16_t checksum(uint8_t *data, uint8_t len)
{
    uint8_t i;
    uint16_t sum = 0, res;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    res = sum;
    return res;
}
I imagine when you see this function, you must be thinking: “Wow, isn’t this just calculating the cumulative checksum? How could it possibly hang?”
Indeed, that was the scene of our debate at that time!
3 Simple Analysis
This checksum function really is very simple: simple parameters, a simple implementation, and a simple return value. There is nothing difficult about it.
Let’s analyze it step by step:
Since the code did not crash, it proves that the data pointer must be non-NULL, so there should be no problem;
However, this len is somewhat suspicious. The type of len is uint8_t, which is unsigned, and its range is 0-255; but what if a value of -1 is passed in from outside?
If -1 is passed in, it is forcibly converted to uint8_t and becomes 255, so the for loop below would run at most 255 times and must exit, right?
Is it possible that during the execution of the for loop, the stack value was modified, causing both i and len to change, thus altering the number of iterations of the for loop?
So we started printing the values of i and len, and found that both values were changing normally, not as we had just thought.
This is very strange!!!
If this for loop is to “infinitely” loop, causing a “dead loop”, the condition that must be met is that len is very large, but isn’t len of type uint8_t? The maximum is only 255, right?
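To make the 255 ceiling concrete, here is a minimal sketch in C. The helper names narrow_to_u8 and count_iterations are made up for illustration and are not part of the project; the second one has the same loop shape as checksum but only counts how often the body runs.

```c
#include <stdint.h>

/* Hypothetical helper: what a value becomes once it really is narrowed
 * to uint8_t (conversion to an unsigned type works modulo 256). */
static uint8_t narrow_to_u8(int value)
{
    return (uint8_t)value;
}

/* Hypothetical helper: same loop shape as checksum(), but it only
 * counts how many times the loop body runs. */
static unsigned count_iterations(uint8_t len)
{
    unsigned count = 0;
    uint8_t i;

    for (i = 0; i < len; i++) {
        count++;
    }
    return count;
}
```

If len truly arrived as a uint8_t, even the worst case of -1 would become 255 and the loop would stop after 255 passes, which is why the observed behavior looked impossible at this stage.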
Using printf again, the result was beyond our expectations, see:
Log output:
[12-21 19:45:38]checksum 128 len: 4294967295
[12-21 19:45:38]0 4294967295
[12-21 19:45:38]1 4294967295
[12-21 19:45:38]2 4294967295
[12-21 19:45:38]3 4294967295
[12-21 19:45:38]4 4294967295
[12-21 19:45:38]5 4294967295
[12-21 19:45:38]6 4294967295
[12-21 19:45:38]7 4294967295
[12-21 19:45:38]8 4294967295
[12-21 19:45:38]9 4294967295
[12-21 19:45:38]10 4294967295
    ... (many lines omitted)
[12-21 19:45:38]250 4294967295
[12-21 19:45:38]251 4294967295
[12-21 19:45:38]252 4294967295
[12-21 19:45:38]253 4294967295
[12-21 19:45:38]254 4294967295
[12-21 19:45:38]255 4294967295
[12-21 19:45:38]256 4294967295
[12-21 19:45:38]257 4294967295
[12-21 19:45:38]258 4294967295
[12-21 19:45:38]259 4294967295
[12-21 19:45:38]260 4294967295
... still printing continuously
Seeing this, does it seem a bit clearer? The value of len is 4294967295?
Isn't that exactly 0xFFFFFFFF?
We also printed len using %d and found the value was -1.
Looking back at the place where checksum was called:
uint16_t res = checksum(&data[0], len - 1);
It seems the truth is out; when len is 0, the value passed in is -1, right?
It seems to be the case, but when -1 goes in, it is of type uint8_t, at most it should be 255, right? How did it become 4294967295?
At the same time, we discovered the key point: this is not a true dead loop. The for loop simply runs for so many iterations that it takes a very long time to finish. After all, our CPU clock is only 160 MHz, and counting from 0 up to 0xFFFFFFFF takes quite a while!
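A rough back-of-the-envelope estimate shows why the loop looked dead. The sketch below is only an approximation: the function name estimate_loop_seconds is made up, and the cycles-per-iteration figure is an assumption that depends on the core and on memory timing.

```c
#include <stdint.h>

/* Rough wall-clock estimate for a counted loop.
 * cycles_per_iteration is an assumption supplied by the caller; the
 * real figure depends on the core and on memory wait states. */
static double estimate_loop_seconds(uint64_t iterations,
                                    uint32_t cycles_per_iteration,
                                    uint32_t clock_hz)
{
    return (double)iterations * (double)cycles_per_iteration
           / (double)clock_hz;
}
```

Calling estimate_loop_seconds(4294967295ULL, 10, 160000000U), that is 0xFFFFFFFF iterations at an assumed 10 cycles each on a 160 MHz clock, gives roughly 268 seconds, about four and a half minutes. With a printf inside the loop body the real run would take far longer.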
4 Scenario Recreation
To fully illustrate this problem, I will try to restore the code scenario we had at that time:
/* A struct definition. Don't rush to criticize it: this code is open source,
 * so any blame belongs with its original author. Also, don't bother suspecting
 * byte alignment; I suspected it once, and the truth proved me wrong! */
typedef struct _data_t {
    /* result, final result */
    uint8_t len;
    uint8_t flag;
    uint8_t passwd_len;
    uint8_t *passwd;
    uint8_t ssid_len;
    uint8_t *ssid;
    uint8_t token_len;
    uint8_t *token;
    uint8_t bssid_type_len;
    uint8_t *bssid;
    uint8_t ssid_is_gbk;
    uint8_t ssid_auto_complete_disable;
    uint8_t data[127];
    uint8_t checksum;
} data_t;
/* 1.c: the C file that calls checksum */

/* Define global data */
static data_t g_data;

/* Set global data */
void set_global_data(void)
{
    g_data.len = 0;
}

void handle_global_data(void)
{
    uint16_t res = checksum(&g_data.data[0], g_data.len - 1);  // sometimes checksum never returns
}

void test_func_entry(void)
{
    set_global_data();
    handle_global_data();
}
/* 2.c: the utility file that defines the checksum function */
uint16_t checksum(uint8_t *data, uint8_t len)
{
    uint8_t i;
    uint16_t sum = 0, res;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    res = sum;
    return res;
}
My first reading was still the len = -1 = 255 case. g_data.data has only 127 bytes, so a loop of 255 iterations would read well past its end, which is already an illegal access. After careful thought, though, we concluded that this cannot cause a dead loop, because it cannot change the value of len: checksum only reads through the data pointer and never writes. Even when it runs out of bounds, it merely reads someone else's data and raises no exception (at least on our processor platform).
This problem was truly perplexing to us, and to mitigate this problem, we made a check when calling checksum, and simply skipped the call when len was 0, thus bypassing the issue.
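The workaround can be sketched as a small wrapper. The name checksum_of_payload and the choice to return 0 for an empty buffer are assumptions made for this illustration; our real code simply skipped the call.

```c
#include <stdint.h>

uint16_t checksum(uint8_t *data, uint8_t len)
{
    uint8_t i;
    uint16_t sum = 0;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}

/* Hypothetical wrapper: never lets "total_len - 1" go below zero.
 * Returning 0 for an empty buffer is an assumption of this sketch. */
static uint16_t checksum_of_payload(uint8_t *data, uint8_t total_len)
{
    if (total_len == 0) {
        return 0;   /* skip the call instead of passing -1 */
    }
    return checksum(data, (uint8_t)(total_len - 1));
}
```

This hides the symptom but does not explain it, which is why the question stayed open.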
However, as developers who like to dig into the underlying logic, we should not gloss over such details; perhaps there are other risks we have not discovered yet.
The problem kept bothering me, and whenever I had free time I wondered what else could produce this behavior.
5 A Ray of Hope
One day, while browsing an article about compiler code optimization on Zhihu, I came across an important clue.
Suddenly, a question flashed through my mind: "Could the runaway for loop in checksum come down to the caller never declaring checksum, that is, never including the corresponding header file, so that the compiler fell back on default assumptions?"
We all know that when using the gcc compiler to compile C code, if a function is called without declaration, it will issue a warning: “warning: implicit declaration of function ‘checksum’ [-Wimplicit-function-declaration]!”
Moreover, especially when the compiler does not know the prototype of the called function, it can only rely on your calling code combined with some default values to make assumptions:
For example, our calling code is:
uint16_t res = checksum(&g_data.data[0], g_data.len - 1);
Here, I speculate that the compiler reasons as follows: there is a function named checksum, but I cannot find its prototype, so I will assume that it returns int and pass each argument as it stands after the default promotions. That means the second argument, g_data.len - 1, goes in as a plain int rather than being narrowed to uint8_t.
Why would gcc pass that argument as an int? For now this is my working hypothesis, and we will verify below whether it holds.
With this hypothesis in mind, we return to the ARM assembly regarding function call parameters. At this point, R0 should equal &g_data.data[0], and R1 should equal -1.
Since R0/R1 are both 32-bit registers, there is no distinction between signed and unsigned when storing data, and in this issue, R0 has no problem; we only discuss R1.
At this point, the value of the R1 register should be -1, i.e. 0xFFFFFFFF. This assumption is crucial: if it holds, the for loop really will keep running for an enormous number of iterations, and the rest of the analysis follows.
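The hypothesis can be modeled in plain C. This is only a sketch of the idea, not compiler output: the function names are made up, and each function returns the 32-bit pattern that would sit in R1 under the stated assumption.

```c
#include <stdint.h>

/* Models R1 when the caller sees NO prototype: g_data.len is promoted
 * to int, 1 is subtracted, and the int result is passed as it is. */
static uint32_t r1_without_prototype(uint8_t len)
{
    int promoted = len - 1;        /* integer promotion: computed as int */
    return (uint32_t)promoted;     /* raw 32-bit register contents */
}

/* Models R1 when the prototype says the parameter is uint8_t: the int
 * result is converted to uint8_t before the call. */
static uint32_t r1_with_prototype(uint8_t len)
{
    uint8_t narrowed = (uint8_t)(len - 1);
    return narrowed;               /* upper 24 bits are zero */
}
```

For len equal to 0 the first model yields 0xFFFFFFFF and the second yields 0x000000FF; for any non-zero len the two agree, which would explain why the hang only shows up when len is 0.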
6 Finding Evidence
Since we have discovered some clues above, we should further find related evidence to prove our idea. Also, if the root of this problem lies in the include header file, then when we add the header file, this problem should not recur. Let’s see if this is the case:
6.1 Is it really a warning?
Our code has far too many warnings (it is the "0 errors, N warnings" kind of build), so finding one particular warning often takes real effort!
After a thorough search, sure enough, there was indeed a warning: “warning: implicit declaration of function ‘checksum’ [-Wimplicit-function-declaration]!”
6.2 Getting to the Bottom
To understand the behavior of the compiler, we must look at the corresponding assembly file. Here, there are two places we need to look: one is the assembly of the checksum function, and the other is the assembly near the call to the checksum function.
Let’s take a look:
/* Assembly code for the checksum function */
    .section    .text.checksum,"ax",%progbits
    .align    1
    .global    checksum
    .code    16
    .thumb_func
    .type    checksum, %function
checksum:
.LFB4:
    .loc 1 125 0
    .cfi_startproc
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
.LVL27:
    push    {r4, r5, r6, lr}
    .cfi_def_cfa_offset 16
    .cfi_offset 4, -16
    .cfi_offset 5, -12
    .cfi_offset 6, -8
    .cfi_offset 14, -4
    .loc 1 125 0
    movs    r4, r0
    movs    r5, r1  // r1 -> r5 , i.e., the value of len is stored in r5
    .loc 1 129 0
    movs    r2, r1
    ldr    r0, .L29
.LVL28:
    bl    printf  // Print the value of len
.LVL29:
    movs    r3, r4
    .loc 1 127 0
    movs    r0, #0
    adds    r5, r4, r5
.LVL30:
.L26:
    .loc 1 130 0 discriminator 1
    cmp    r3, r5  // The key judgment in the for loop, i.e., i < len
    beq    .L28 // Exit the for loop
    .loc 1 131 0 discriminator 3 // Below is the execution body of the for loop
    ldrb    r2, [r3]
    adds    r3, r3, #1
.LVL31:
    adds    r0, r0, r2
.LVL32:
    lsls    r0, r0, #16
    lsrs    r0, r0, #16
.LVL33:
    b    .L26
.LVL34:
.L28:
    .loc 1 136 0
    @ sp needed
.LVL35:
    pop    {r4, r5, r6, pc}
.L30:
    .align    2
.L29:
    .word    .LC12
    .cfi_endproc
.LFE4:
    .size    checksum, .-checksum
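From its assembly code, it can be seen that the number of loop iterations depends on register r5, which is derived from len: r5 first receives len and is then turned into the end address data + len, against which the loop pointer r3 is compared.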
Note that in the assembly code here, it is not visible whether r5 is uint8_t or uint32_t; it is merely a 32-bit register.
    .section    .text.verify_checksum,"ax",%progbits
    .align    1
    .global    verify_checksum
    .code    16
    .thumb_func
    .type    verify_checksum, %function
verify_checksum:
.LFB5:
    .loc 1 81 0
    .cfi_startproc
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
.LVL17:
    push    {r4, lr}
    .cfi_def_cfa_offset 8
    .cfi_offset 4, -8
    .cfi_offset 14, -4
    .loc 1 83 0
    ldr    r4, .L20
    .loc 1 91 0
    @ sp needed
    .loc 1 83 0
    movs    r0, r4   // r0 stores the address of struct g_data
    ldrb    r1, [r4] // Assigns the first byte of g_data, i.e., g_data.len to r1
    adds    r0, r0, #34 // r0's address offset by 34 bytes, i.e., offset to g_data.data;
    subs    r1, r1, #1  // A key step: r1 = r1 - 1 Since we reproduced the problem, g_data.len is 0, so at this point, the value of r1 is 0xFFFFFFFF
    bl    checksum    // Call the checksum function, the first two parameters are r0 and r1
.LVL18:
    .loc 1 84 0
    adds    r4, r4, #160
    .loc 1 89 0
    ldrb    r3, [r4]
    lsls    r0, r0, #24
.LVL19:
    lsrs    r0, r0, #24
    subs    r0, r0, r3
    .loc 1 91 0
    pop    {r4, pc}
.L21:
    .align    2
.L20:
    .word    .LANCHOR4
    .cfi_endproc
.LFE5:
    .size    verify_checksum, .-verify_checksum
From the assembly code, combined with the assembly code of the checksum function, it should be clear that my previous assumption holds: the value of len passed to the checksum function is indeed 0xFFFFFFFF, and when printed using %u, it is 4294967295.
At this point, the culprit has actually been found. Rather than saying it was an arbitrary optimization by the compiler, it is more about the programmer’s lack of rigor in writing code, failing to properly handle this compilation warning.
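The exit test in the checksum listing can be modeled in C to show why the uint8_t type of i and len no longer protects us. In the compiled loop the counter i is gone: r3 walks from the data address and the loop stops only when r3 equals r5 = data + len. The sketch below uses a made-up function name and treats addresses as 32-bit numbers that wrap around, as they do on our 32-bit target.

```c
#include <stdint.h>

/* Models the compiled loop: r3 starts at data_addr, r5 = data_addr +
 * len_reg ("adds r5, r4, r5"), and the loop ends only when r3 == r5
 * ("cmp r3, r5" / "beq"). All arithmetic wraps modulo 2^32. */
static uint32_t modeled_iterations(uint32_t data_addr, uint32_t len_reg)
{
    uint32_t end = data_addr + len_reg;    /* end address, may wrap */
    return (uint32_t)(end - data_addr);    /* steps until r3 reaches r5 */
}
```

With a len register of 0x000000FF the model gives 255 iterations from any start address; with 0xFFFFFFFF it gives 4294967295, because the end address lands one byte before the start and r3 has to wrap almost all the way around the address space to reach it.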
6.3 Risk Mitigation
Now that we have identified the root cause of the problem, let’s try to mitigate this risk.
The method is actually quite simple: we just need to include the header file where the checksum function is located in the 1.c file calling the checksum function.
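As a sketch of that fix, the three files can be shown in one listing. The file name 2.h follows this article; the include-guard name, the payload array, and the function call_with_prototype are made up for the illustration.

```c
#include <stdint.h>

/* --- contents of 2.h (include-guard name is hypothetical) --- */
#ifndef CHECKSUM_2_H
#define CHECKSUM_2_H
uint16_t checksum(uint8_t *data, uint8_t len);
#endif

/* --- 1.c, which would begin with: #include "2.h" --- */
static uint8_t payload[256] = { 1, 2, 3 };   /* stand-in for g_data.data */

uint16_t call_with_prototype(uint8_t len)
{
    /* The prototype is visible here, so the int value "len - 1" is
     * converted to uint8_t before the call. */
    return checksum(payload, len - 1);
}

/* --- 2.c --- */
uint16_t checksum(uint8_t *data, uint8_t len)
{
    uint8_t i;
    uint16_t sum = 0;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

With the prototype in view, a len of 0 reaches checksum as 255 rather than 0xFFFFFFFF. The stand-in array is deliberately 256 bytes so that this sketch stays in bounds; the real 127-byte buffer would still be over-read, so the zero-length check remains worthwhile.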
After adding it, let’s see what changes occur. It is clear that the assembly code of the checksum function should remain unchanged; it should not change at all.
However, the assembly code for calling checksum has undergone some changes, and at the same time, the compilation warnings have all disappeared.
/* Assembly code after adding the header file */
    .section    .text.verify_checksum,"ax",%progbits
    .align    1
    .global    verify_checksum
    .code    16
    .thumb_func
    .type    verify_checksum, %function
verify_checksum:
.LFB5:
    .loc 1 81 0
    .cfi_startproc
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 0, uses_anonymous_args = 0
.LVL17:
    push    {r4, lr}
    .cfi_def_cfa_offset 8
    .cfi_offset 4, -8
    .cfi_offset 14, -4
    .loc 1 83 0
    ldr    r4, .L20
    .loc 1 91 0
    @ sp needed
    .loc 1 83 0
    movs    r0, r4
    ldrb    r1, [r4]
    adds    r0, r0, #34
    subs    r1, r1, #1   // Same as before: r1 = r1 - 1
    lsls    r1, r1, #24  // Key change!!! r1 = r1 << 24, a logical left shift by 24 bits
    lsrs    r1, r1, #24  // Key change!!! r1 = r1 >> 24, a logical right shift by 24 bits
    bl    checksum
.LVL18:
    .loc 1 84 0
    adds    r4, r4, #160
    .loc 1 89 0
    ldrb    r3, [r4]
    lsls    r0, r0, #24
.LVL19:
    lsrs    r0, r0, #24
    subs    r0, r0, r3
    .loc 1 91 0
    pop    {r4, pc}
.L21:
    .align    2
.L20:
    .word    .LANCHOR4
    .cfi_endproc
.LFE5:
    .size    verify_checksum, .-verify_checksum
For easy comparison, I ran the two listings through a diff tool: the only difference is a pair of instructions added just before the call to checksum.
I checked the two additional instructions: lsls and lsrs, see here.
One is a logical left shift by 24 bits and the other is a logical right shift by 24 bits. Together they clear the high 24 bits, so the -1 passed to checksum becomes 0x000000FF instead of 0xFFFFFFFF.
This brings uint8_t len back to normal logic, and naturally, the previous for loop will not run indefinitely.
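The effect of the instruction pair can be reproduced in C on a 32-bit unsigned value. This is a minimal sketch with a made-up function name; it mirrors the two shifts rather than anything the compiler emits for C source.

```c
#include <stdint.h>

/* Mirrors "lsls r1, r1, #24" followed by "lsrs r1, r1, #24" on a
 * 32-bit register. Both shifts are logical, so zeros are shifted in. */
static uint32_t shift_pair_24(uint32_t reg)
{
    reg <<= 24;   /* lsls: the original bits 31..8 fall off the top */
    reg >>= 24;   /* lsrs: zeros fill bits 31..8 */
    return reg;
}
```

The result is the same as masking with 0xFF: 0xFFFFFFFF becomes 0x000000FF, and a value that already fits in eight bits passes through unchanged.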
7 Extensions
The scenario described above is on the ARM platform. Our code is cross-platform, though, and also supports RISC-V, x86, and other architectures.
7.1 RISC-V Architecture
So let's compare the situation under the RISC-V architecture.
Seen from this angle, the RISC-V handling is quite blunt: a single andi instruction clears the high 24 bits!
7.2 x86 Architecture
I pushed a simple project code to GitHub to reproduce this problem; if interested, you can check it out here.
Unfortunately, this problem did not recur on x86.
The core difference in the code is whether 1.c includes 2.h.
The two builds do produce different assembly code, yet the results they produce are the same.
A few open questions as to why the problem did not recur:
Was it due to incorrect compilation options?
Was the x86 compiler smarter? Did it know how to compile the code properly?
Or are there compiler features at work that we do not yet understand?
7.3 Other Architectures
If interested, you can verify if similar problems exist on other platforms; discussions are welcome.
8 Experience Summary
Please enhance the rigor of your code compilation. If using the gcc compiler, -Wall -Werror -Os is the minimum requirement;
Before discussing code optimization, please resolve your code compilation exceptions, achieving 0 error 0 warning first;
Please pay attention to the compilation warning: implicit declaration of function;
If you use the gcc compiler and it does not prompt any compilation warnings or errors, it does not mean the compiler hasn’t informed you; perhaps you compiled with the -w option, and you are merely deceiving yourself;
Honestly declare your functions before calling them or include their corresponding header files; sometimes the compiler’s default behavior may not be reliable;
Code details are important; indeed, details determine success or failure;
Do not overlook any possibility; as a developer, this spirit of research should always be in mind;
Dare to hypothesize, be cautious in verification; this is an unchanging methodology.
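As one way to act on the points about warnings, gcc can be told to treat this particular warning as an error, either with -Werror=implicit-function-declaration on the command line or with a pragma inside a source file. The sketch below shows the pragma form; the caller name checksum_of is made up, and the pragma is gcc-specific.

```c
#include <stdint.h>

/* GCC-specific pragma: from here on, calling an undeclared function is
 * a hard error in this file, whatever the command-line flags say. */
#pragma GCC diagnostic error "-Wimplicit-function-declaration"

/* Normally provided by including the header (2.h in this article). */
uint16_t checksum(uint8_t *data, uint8_t len);

/* Hypothetical caller: it compiles only because the prototype above
 * is visible at the point of the call. */
uint16_t checksum_of(uint8_t *data, uint8_t len)
{
    return checksum(data, len);
}

uint16_t checksum(uint8_t *data, uint8_t len)
{
    uint8_t i;
    uint16_t sum = 0;

    for (i = 0; i < len; i++) {
        sum += data[i];
    }
    return sum;
}
```

Removing the prototype line should then stop the build at the call site instead of letting the mismatch slip through as one warning among many.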
Author: recan
Original link:
https://club.rt-thread.org/ask/article/4e41fb1de844e925.html
Author’s message:

As the author of this article, let me briefly discuss a few points with friends in the comments section.

1. Regarding the recommendation of 0 warning 0 error, I strongly support it. However, considering the overall acceptance of the team and the trade-off between development and debugging efficiency, we have given up some compilation warnings, which is our mistake. Additionally, it was only after encountering this problem and resolving it that I realized that not declaring functions could lead to such a fatal error.

2. One cannot simply dismiss gcc as a low-end tool. As mentioned in point 1, gcc can indeed be configured to treat calls to undeclared functions as errors and stop the build; this is not a privilege unique to IAR. On the contrary, I feel that gcc, as an open-source, non-commercial compiler, retains plenty of openness and flexibility. Once you understand gcc in depth, you can customize it extensively, which I find excellent. gcc also has good cross-platform chip support, so when we evaluate a new chip platform, whether it supports the gcc compiler is one of our first considerations.

The above views only represent my personal logic, and I welcome everyone to express their opinions. Thank you for your attention to this topic. Thank you.
