Classic! Must-Know Stack Concepts in Embedded Programming

Word Count: 9000 Content Quality: ⭐⭐⭐⭐⭐

1. Prerequisite Knowledge—Memory Allocation in Programs

A program compiled by C/C++ occupies memory divided into the following parts: 1. Stack Area (stack)—automatically allocated and released by the compiler, used for storing function parameters and local variable values. Its operation is similar to a stack in data structures. 2. Heap Area (heap)—generally allocated and released by the programmer; if the programmer does not release it, it may be reclaimed by the OS when the program ends. Note that this is different from the heap in data structures; its allocation method is similar to that of linked lists. 3. Global Area (Static Area)—global and static variables are stored together; initialized global and static variables are in one area, while uninitialized global and static variables are in an adjacent area. The system releases them after the program ends. 4. Constant String Area—constant strings are stored here. The system releases them after the program ends. 5. Code Area—stores the binary code of function bodies.

2. Example Program

This is a very detailed example written by a predecessor.

//main.cpp
int a = 0; //Global Initialization Area
int a = 0; //Global Initialization Area
char *p1; //Global Uninitialized Area
main() {
    int b; //Stack
    char s[] = "abc"; //Stack
    char *p2; //Stack
    char *p3 = "123456"; //123456\0 is in Constant Area, p3 is on the stack.
    static int c = 0; //Global (Static) Initialization Area
    p1 = (char *)malloc(10);
    p2 = (char *)malloc(20);
    //The 10 and 20 bytes allocated are in the heap area.
    strcpy(p1, "123456"); //123456\0 is in Constant Area; the compiler may optimize it to the same location as p3 points to "123456".
}

3. Theoretical Knowledge of Heap and Stack

3.1 Allocation Method

Stack: Automatically allocated by the system. For example, declaring a local variable in a function like int b; the system automatically allocates space for b on the stack. Heap: Requires the programmer to allocate and specify the size. In C, the malloc function is used like p1 = (char *)malloc(10);; in C++, the new operator is used like p2 = (char *)malloc(10);. However, note that p1 and p2 themselves are on the stack.

3.2 System Response After Allocation

Stack: As long as the remaining space in the stack is greater than the requested space, the system will provide memory to the program; otherwise, it will report an exception indicating stack overflow. Heap: First, it should be known that the operating system maintains a linked list of free memory addresses. When the system receives a program’s request, it traverses this list to find the first heap node with space greater than the requested size, removes this node from the free list, and allocates this space to the program. Additionally, for most systems, the size of the allocated space is recorded at the beginning of this memory space, so that the delete statement in the code can correctly release this memory space. Moreover, since the size of the found heap node may not exactly equal the requested size, the system will automatically place the excess part back into the free list.

3.3 Size Limitations for Allocation

Stack: In Windows, the stack is a data structure that expands towards lower addresses and is a contiguous memory area. This means that the address of the stack top and the maximum capacity of the stack are predetermined by the system. In Windows, the stack size is 2M (some say 1M; in any case, it’s a constant determined at compile time). If the requested space exceeds the remaining space in the stack, it will prompt overflow. Therefore, the space obtained from the stack is relatively small. Heap: The heap is a data structure that expands towards higher addresses and is a non-contiguous memory area. This is because the system uses a linked list to store free memory addresses, which naturally is non-contiguous, and the direction of linked list traversal is from low addresses to high addresses. The size of the heap is limited by the effective virtual memory in the computer system. Hence, the space obtained from the heap is more flexible and larger.

3.4 Comparison of Allocation Efficiency

The stack is automatically allocated by the system, making it faster. However, the programmer cannot control it. The heap is memory allocated by new, which is generally slower and prone to memory fragmentation, but it is the most convenient to use. Additionally, in Windows, the best way is to use VirtualAlloc to allocate memory; this is neither in the heap nor the stack, but directly reserves a block of memory in the process’s address space, though it is the least convenient to use. However, it is fast and flexible.

3.5 Contents Stored in Heap and Stack

Stack: During a function call, the first thing pushed onto the stack is the address of the next executable statement after the function call statement; then the function’s parameters are pushed onto the stack. In most C compilers, parameters are pushed onto the stack from right to left, followed by local variables in the function. Note that static variables are not pushed onto the stack. When the current function call ends, local variables are popped from the stack first, followed by parameters, and finally, the stack pointer points to the address where it was initially stored, which is the next executable statement in the main function, and the program continues running from that point. Heap: Generally, a byte is stored at the head of the heap to record the size of the heap. The specific contents in the heap are arranged by the programmer.

3.6 Comparison of Access Efficiency

char s1[] = "aaaaaaaaaaaaaaa";char *s2 = "bbbbbbbbbbbbbbbbb"; aaaaaaaaaaa is assigned at runtime; while bbbbbbbbbbb is determined at compile time; however, in future accesses, arrays on the stack are faster than pointers to strings (e.g., heap). For example:

#include
void main() {
    char a = 1;
    char c[] = "1234567890";
    char *p = "1234567890";
    a = c[1];
    a = p[1];
    return;
}

Corresponding assembly code:

10: a = c[1];
00401067 8A 4D F1 mov cl,byte ptr [ebp-0Fh]
0040106A 88 4D FC mov byte ptr [ebp-4],cl
11: a = p[1];
0040106D 8B 55 EC mov edx,dword ptr [ebp-14h]
00401070 8A 42 01 mov al,byte ptr [edx+1]
00401073 88 45 FC mov byte ptr [ebp-4],al

The first method directly reads the element from the string into the register cl, while the second method first reads the pointer value into edx and then reads the character based on edx, which is evidently slower.

3.7 Summary:

The difference between heap and stack can be illustrated as follows: Using the stack is like dining at a restaurant, where you just order (make a request), pay, and eat (use), and when you are full, you leave without worrying about prep work like washing vegetables or cleaning dishes afterward. Its advantage is speed, but with limited freedom. Using the heap is like cooking your favorite dishes yourself; it is more troublesome but more tailored to your taste and offers greater freedom.

4. Memory Structure in Windows Processes

Before reading this article, if you don’t even know what a stack is, please read the foundational knowledge at the end of the article first.

Anyone who has been exposed to programming knows that high-level languages can access data in memory through variable names. So how are these variables stored in memory? How does the program use these variables? The following will delve into this topic. The C language code below, unless otherwise stated, defaults to the release version compiled by VC.

First, let’s understand how C language variables are allocated in memory. C language has global variables (Global), local variables (Local), static variables (Static), and register variables (Register). Each type of variable has different allocation methods. Let’s look at the following code:

#include <stdio.h>
int g1=0, g2=0, g3=0;
int main() {
    static int s1=0, s2=0, s3=0;
    int v1=0, v2=0, v3=0;
    // Print the memory addresses of each variable
    printf("0x%08x\n",&v1); // Print local variable memory addresses
    printf("0x%08x\n",&v2);
    printf("0x%08x\n\n",&v3);
    printf("0x%08x\n",&g1); // Print global variable memory addresses
    printf("0x%08x\n",&g2);
    printf("0x%08x\n\n",&g3);
    printf("0x%08x\n",&s1); // Print static variable memory addresses
    printf("0x%08x\n",&s2);
    printf("0x%08x\n\n",&s3);
    return 0;
}

The execution result after compilation is:

0x0012ff78
0x0012ff7c
0x0012ff80

0x004068d0
0x004068d4
0x004068d8

0x004068dc
0x004068e0
0x004068e4

The output results are the memory addresses of the variables. Among them, v1, v2, and v3 are local variables, g1, g2, and g3 are global variables, and s1, s2, and s3 are static variables. You can see that these variables are continuously distributed in memory, but the memory addresses allocated for local and global variables differ significantly, while the memory for global and static variables is contiguous. This is due to the fact that local variables and global/static variables are allocated in different types of memory areas. For a process’s memory space, it can logically be divided into three parts: code area, static data area, and dynamic data area. The dynamic data area is generally referred to as the “heap stack”. The “stack” and “heap” are two different dynamic data areas; the stack is a linear structure, while the heap is a chain structure. Each thread of a process has its own private “stack”; thus, even though the code is the same, the data of local variables are independent of each other. A stack can be described by its “base address” and “stack top” address. Global and static variables are allocated in the static data area, while local variables are allocated in the dynamic data area, i.e., the heap stack. Programs access local variables through the base address and offset of the stack.

├———————┤Low-end memory area
│ …… │
├———————┤
│ Dynamic Data Area │
├———————┤
│ …… │
├———————┤
│ Code Area │
├———————┤
│ Static Data Area │
├———————┤
│ …… │
├———————┤High-end memory area

The stack is a last-in-first-out data structure, with the top address always less than or equal to the base address of the stack. We can first understand the process of function calls to gain a deeper understanding of the role of the stack in programs. Different languages have different function calling conventions, which involve rules for pushing parameters and stack balancing. The calling conventions of Windows API and ANSI C differ; the former adjusts the stack by the called function, while the latter does so by the caller. The two are distinguished by the prefixes “__stdcall” and “__cdecl”. Let’s look at the following code:

#include <stdio.h>
void __stdcall func(int param1,int param2,int param3) {
    int var1=param1;
    int var2=param2;
    int var3=param3;
    printf("0x%08x\n",param1); // Print memory addresses of each variable
    printf("0x%08x\n",param2);
    printf("0x%08x\n\n",param3);
    printf("0x%08x\n",&var1);
    printf("0x%08x\n",&var2);
    printf("0x%08x\n\n",&var3);
    return;
}

int main() {
    func(1,2,3);
    return 0;
}

The execution result after compilation is:

0x0012ff78
0x0012ff7c
0x0012ff80

0x0012ff68
0x0012ff6c
0x0012ff70

├———————┤<—Stack Top During Function Execution (ESP), Low-end Memory Area
│ …… │
├———————┤
│ var 1 │
├———————┤
│ var 2 │
├———————┤
│ var 3 │
├———————┤
│ RET │
├———————┤<—Stack Top After "__cdecl" Function Returns (ESP)
│ parameter 1 │
├———————┤
│ parameter 2 │
├———————┤
│ parameter 3 │
├———————┤<—Stack Top After "__stdcall" Function Returns (ESP)
│ …… │
├———————┤<—Stack Bottom (Base Address EBP), High-end Memory Area

The above diagram illustrates the state of the stack during a function call. First, the three parameters are pushed onto the stack in right-to-left order, starting with “param3”, followed by “param2”, and finally “param1”; then the return address (RET) is pushed, followed by a jump to the function address to execute. (It should be noted that many articles introducing buffer overflow principles in UNIX mention that after pushing RET, the current EBP is also pushed, and the current ESP replaces EBP. However, an article on function calls in Windows mentions that this step also occurs in Windows function calls, but my actual debugging has not revealed this step; this can also be inferred from the 4-byte gap between param3 and var1.) The third step involves decreasing the stack top (ESP) by a certain number to allocate memory for local variables; in the above example, this is a reduction of 12 bytes (ESP=ESP-3*4, as each int variable occupies 4 bytes); then the memory space for local variables is initialized. Since the “__stdcall” call is adjusted by the called function, before returning from the function, the stack must be restored by first reclaiming the memory occupied by local variables (ESP=ESP+3*4), then retrieving the return address to fill the EIP register, and finally reclaiming the memory occupied by parameters (ESP=ESP+3*4) to continue executing the caller’s code. Refer to the assembly code below:

;--------------Assembly Code of func Function-------------------
:00401000 83EC0C sub esp, 0000000C // Allocate Memory for Local Variables
:00401003 8B442410 mov eax, dword ptr [esp+10]
:00401007 8B4C2414 mov ecx, dword ptr [esp+14]
:0040100B 8B542418 mov edx, dword ptr [esp+18]
:0040100F 89442400 mov dword ptr [esp], eax
:00401013 8D442410 lea eax, dword ptr [esp+10]
:00401017 894C2404 mov dword ptr [esp+04], ecx

……………………（省略若干代码）
:00401075 83C43C add esp, 0000003C ; Restore Stack, Reclaim Memory Occupied by Local Variables
:00401078 C3 ret 000C ; Function Return, Restore Memory Occupied by Parameters
; If it were "__cdecl", it would be "ret", and the stack would be restored by the caller
;-------------------Function End-------------------------
;--------------Code for Calling func Function--------------
:00401080 6A03 push 00000003 // Push Parameter param3
:00401082 6A02 push 00000002 // Push Parameter param2
:00401084 6A01 push 00000001 // Push Parameter param1
:00401086 E875FFFFFF call 00401000 // Call func Function
; If it were "__cdecl", the stack would be restored here, "add esp, 0000000C"

Smart readers here probably understand the principle of buffer overflow. Let’s look at the following code:

#include <stdio.h>
#include <string.h>

void __stdcall func() {
    char lpBuff[8]="\0";
    strcat(lpBuff,"AAAAAAAAAAA");
    return;
}

int main() {
    func();
    return 0;
}

What happens when you compile and execute this? Ha, the instruction referencing “0x00414141” points to “0x00000000” memory. This memory cannot be “read”. An “illegal operation” occurs! “41” is the hexadecimal ASCII code for “A”, which clearly indicates that the issue lies with the strcat statement. The size of “lpBuff” is only 8 bytes, counting the terminating \0, meaning strcat can only write 7 “A” characters, but the program actually writes 11 “A” characters plus 1 \0. Looking at the diagram above, the extra 4 bytes precisely overwrite the memory space where RET is located, causing the function to return to an incorrect memory address and execute erroneous instructions. If this string can be constructed carefully, dividing it into three parts: the first part being meaningless filler data to achieve overflow, followed by data that overwrites RET, and finally a segment of shellcode, then as long as the RET address can point to the first instruction of this shellcode, the function will execute the shellcode upon return. However, constructing this RET is extremely difficult due to the different versions of software and runtime environments that can affect the position of this shellcode in memory. Generally, a large number of NOP instructions are filled between the RET and shellcode to increase the exploit’s universality.

├———————┤<—Low-end Memory Area
│ …… │
├———————┤<—Data Filled by Exploit Starts
│ │
│ buffer │<—Filled with Useless Data
│ │
├———————┤
│ RET │<—Points to Shellcode or Range of NOP Instructions
├———————┤
│ NOP │
│ …… <—Filled NOP Instructions, Range that RET Can Point To
│ NOP │
├———————┤
│ │
│ shellcode │
│ │
├———————┤<—Data Filled by Exploit Ends
│ …… │
├———————┤<—High-end Memory Area

In Windows, dynamic data can be stored not only in the stack but also in the heap. Friends familiar with C++ know that C++ can use the new keyword to dynamically allocate memory. Let’s look at the following C++ code:

#include <stdio.h>
#include <iostream.h>
#include <windows.h>

void func() {
    char *buffer=new char[128];
    char bufflocal[128];
    static char buffstatic[128];
    printf("0x%08x\n",buffer); // Print memory addresses of heap variables
    printf("0x%08x\n",bufflocal); // Print memory addresses of local variables
    printf("0x%08x\n",buffstatic); // Print memory addresses of static variables
}

void main() {
    func();
    return;
}

The execution result is:

0x004107d0
0x0012ff04
0x004068c0

It can be observed that memory allocated using the new keyword is neither in the stack nor in the static data area. The VC compiler implements the memory dynamic allocation of the new keyword through the “heap” in Windows. Before discussing the “heap”, let’s first understand several API functions related to the “heap”:

- HeapAlloc Allocates memory space in the heap
- HeapCreate Creates a new heap object
- HeapDestroy Destroys a heap object
- HeapFree Frees allocated memory
- HeapWalk Enumerates all memory blocks of a heap object
- GetProcessHeap Retrieves the default heap object of the process
- GetProcessHeaps Retrieves all heap objects of the process
- LocalAlloc
- GlobalAlloc

When a process initializes, the system automatically creates a default heap for the process, which occupies 1M of memory by default. Heap objects are managed by the system and exist in memory in a chain structure. The following code can dynamically allocate memory space through the heap:

HANDLE hHeap=GetProcessHeap();
char *buff=HeapAlloc(hHeap,0,8);

Here, hHeap is the handle of the heap object, and buff points to the address of the allocated memory space. So what exactly is hHeap? Does its value have any significance? Let’s look at the following code:

#pragma comment(linker,"/entry:main") // Define the entry point of the program
#include <windows.h>

_CRTIMP int (__cdecl *printf)(const char *, ...); // Define STL function printf
/*---------------------------------------------------------------------------
 Here we will review the knowledge discussed earlier:
 (*Note) The printf function is a standard library function in C language, and the VC standard library is implemented by the msvcrt.dll module.
 From the function definition, it can be seen that the number of parameters for printf is variable, and the function cannot know in advance how many parameters the caller has pushed. The function can only analyze the format of the first parameter string to obtain information about the pushed parameters. Since the number of parameters is dynamic, the caller must balance the stack, which is why the __cdecl calling convention is used. By the way, Windows API functions are generally in __stdcall calling form, with only one API exception, which is wsprintf, which uses __cdecl calling rules, just like the printf function, due to its variable number of parameters.
 ---------------------------------------------------------------------------*/
void main() {
    HANDLE hHeap=GetProcessHeap();
    char *buff=HeapAlloc(hHeap,0,0x10);
    char *buff2=HeapAlloc(hHeap,0,0x10);
    HMODULE hMsvcrt=LoadLibrary("msvcrt.dll");
    printf=(void *)GetProcAddress(hMsvcrt,"printf");
    printf("0x%08x\n",hHeap);
    printf("0x%08x\n",buff);
    printf("0x%08x\n\n",buff2);
}

The execution result is:

0x00130000
0x00133100
0x00133118

Why is the value of hHeap so close to that of buff? In fact, the hHeap handle points to the address of the HEAP header. In the user area of the process, there is a structure called PEB (Process Environment Block), which stores important information about the process. Among this, at offset 0x18 of the PEB base address is stored the address of the default process heap, while at offset 0x90 is stored a pointer to the list of addresses of all heaps in the process. Many APIs in Windows use the default heap to store dynamic data; for example, all ANSI version functions under Windows 2000 allocate memory in the default heap to convert ANSI strings to Unicode strings. Access to a heap is sequential; only one thread can access data in the heap at a time. When multiple threads have access requests simultaneously, they can only queue and wait, which may degrade the execution efficiency of the program.

Finally, let’s talk about data alignment in memory. Data alignment means that the memory address where the data resides must be an integer multiple of the length of that data. The starting address of DWORD data must be divisible by 4, and the starting address of WORD data must be divisible by 2. The x86 CPU can directly access aligned data; when it attempts to access unaligned data, it performs a series of adjustments internally, which are transparent to the program but can reduce running speed. Therefore, compilers try to ensure data alignment when compiling programs. Let’s look at the execution results of the same code compiled by three different compilers:

#include <stdio.h>

int main() {
    int a;
    char b;
    int c;
    printf("0x%08x\n",&a);
    printf("0x%08x\n",&b);
    printf("0x%08x\n",&c);
    return 0;
}

This is the execution result compiled with VC:

0x0012ff7c
0x0012ff7b
0x0012ff80

The order of variables in memory: b(1 byte)-a(4 bytes)-c(4 bytes).

This is the execution result compiled with Dev-C++:

0x0022ff7c
0x0022ff7b
0x0022ff74

The order of variables in memory: c(4 bytes)-3 bytes in between-b(1 byte)-a(4 bytes).

This is the execution result compiled with lcc:

0x0012ff6c
0x0012ff6b
0x0012ff64

The order of variables in memory: same as above.

All three compilers achieve data alignment, but the latter two compilers are clearly less efficient than VC, wasting memory by making a char occupy 4 bytes.

Basic Knowledge:
The stack is a simple data structure that only allows insertion or deletion at one end of a linear list. The end that allows insertion and deletion is called the stack top, while the other end is called the stack bottom. A set of CPU instructions can implement stack access in the process's memory. Among them, the POP instruction implements the out-stack operation, and the PUSH instruction implements the in-stack operation. The CPU's ESP register stores the current thread's stack top pointer, while the EBP register holds the current thread's stack bottom pointer. The CPU's EIP register stores the memory address of the next CPU instruction; when the CPU finishes executing the current instruction, it reads the memory address of the next instruction from the EIP register and continues execution.

Disclaimer: The materials for this article are sourced from the internet, and the copyright belongs to the original author. If there are any copyright issues with the works, please contact me for removal.