Source: https://www.nowcoder.com/discuss/433017686663135232
1. Self-introduction
2. Discuss your understanding of multithreading and multiprocessing used in your projects?
Here I will talk about multithreading and multiprocessing.

Multithreading:
- Concept: Multithreading means multiple threads running within the same process. Each thread has its own execution flow but shares the process's address space and resources. Threads are lightweight, and the overhead of context switching between them is relatively small.
- Implementation: In C++, multithreading is usually implemented with the <thread> header, for example std::thread. Threads coordinate using mechanisms such as mutexes and condition variables.
- Applicable scenarios: Suitable when tasks share data and need efficient data exchange and communication. Multithreading fits concurrent computation within one process, but thread safety must be ensured and race conditions avoided.
Multiprocessing:
- Concept: Multiprocessing means multiple independent processes running simultaneously under the operating system. Each process has its own address space and resources and does not share data with the others. Inter-process communication is comparatively complex and usually requires IPC (Inter-Process Communication) mechanisms.
- Implementation: In C++ on Unix-like systems, new processes are created with functions such as fork, declared in <unistd.h>. Inter-process communication can use mechanisms such as pipes, message queues, and shared memory.
- Applicable scenarios: Suitable when tasks need independent computation, data isolation, and higher fault isolation; one process crashing does not take down the others. The overhead and complexity of inter-process communication must be taken into account.
3. Communication methods between threads and processes?
Thread communication (synchronization) methods:
- Mutex:
  - Concept: A mutex is a synchronization mechanism used to protect critical sections. Only the thread that successfully acquires the lock can access the shared resource; other threads must wait for the lock to be released.
  - Usage: In C++, a thread locks and unlocks via the member functions of std::mutex, usually through RAII wrappers such as std::lock_guard.
- Condition Variable:
  - Concept: Condition variables are used for waiting and notification between threads. One thread waits for a condition to become true, while another thread signals the waiter when it does.
  - Usage: In C++, condition variables are used via std::condition_variable together with std::unique_lock.
- Semaphore:
  - Concept: A semaphore is a counter used to control access to a shared resource by multiple threads. Threads synchronize by performing P (wait) and V (signal) operations on it.
  - Usage: C++20 (not C++11) introduced std::counting_semaphore and std::binary_semaphore; before that, semaphores were typically built from a mutex and a condition variable.
- Barrier:
  - Concept: A barrier synchronizes the execution of multiple threads, blocking each one until all threads reach the barrier point, after which they all continue.
  - Usage: C++20 (not C++11) introduced std::barrier.
- Thread-safe Queue:
  - Concept: A thread-safe queue is a data structure into which one thread can safely insert data while another thread safely retrieves it.
  - Usage: Can be implemented with a mutex and a condition variable.
- Atomic Operations:
  - Concept: Atomic operations are indivisible operations that are guaranteed to complete without interference in a multithreaded environment. C++ provides std::atomic types and related atomic operation functions for thread-safe operations.
- Message Queue:
  - Concept: A message queue is an inter-thread communication method in which one thread sends messages to the queue and another receives them.
  - Usage: Can be implemented on top of a thread-safe queue.
- Read-Write Lock:
  - Concept: A read-write lock allows multiple threads to read a shared resource simultaneously but requires exclusive access for writing, which improves concurrency in read-heavy scenarios.
  - Usage: In C++, std::shared_mutex (C++17) provides read-write locking.
Inter-process communication methods:
- Pipe: The parent process creates a pipe and then creates a child process with fork. Parent and child communicate through the pipe; a single pipe is unidirectional (parent to child or child to parent), so two pipes are needed for bidirectional communication.
- Named Pipe (FIFO): A special file used for inter-process communication. Multiple processes, even unrelated ones, can communicate by opening the same FIFO file.
- Message Queue: Processes communicate by sending and receiving messages using the system calls msgget, msgsnd, and msgrcv.
- Shared Memory: Multiple processes map the same shared memory region and communicate by reading and writing it. Synchronization mechanisms such as semaphores are required.
- Signal: A process can communicate with other processes by sending signals. Signals are an asynchronous notification mechanism; common signals include SIGKILL, SIGTERM, etc.
- Socket: Sockets are used for communication between processes on different hosts; local (Unix domain) sockets can also be used between processes on the same host.
- File Lock: Processes can use file locks to coordinate access to a shared file.
4. Difference between Named Pipe and Anonymous Pipe?
- Named Pipe:
  - Naming: Named pipes have a name (a path), through which different processes can communicate. The named pipe appears as a file in the file system.
  - Inter-process communication: Named pipes can be used between unrelated processes, even different applications.
  - Persistence: Named pipes are persistent; even if the process that created one terminates, the FIFO file remains in the file system until explicitly deleted.
  - Creation and deletion: Named pipes are created and deleted with command-line tools (mkfifo, rm) or the corresponding system calls (mkfifo, unlink).
- Anonymous Pipe:
  - Naming: Anonymous pipes have no name and can only be used between related processes, typically a parent and its children.
  - Inter-process communication: Anonymous pipes are mainly used between related processes: the pipe is created in one process and inherited by the child via fork (Unix/Linux) or handle inheritance with CreateProcess (Windows).
  - Temporality: Anonymous pipes are temporary and are destroyed once all processes holding them have closed their ends.
  - Creation and deletion: Anonymous pipes are created at runtime (e.g. with pipe()) and require no filesystem-level creation or deletion.
Difference (summary):
- Persistence: Named pipes are persistent; anonymous pipes are temporary.
- Naming: Named pipes have names; anonymous pipes do not.
- Scope: Named pipes work between unrelated processes; anonymous pipes are limited to related processes.
5. How to implement condition variables?
Condition variables are usually used together with mutexes.
- Mutex: The mutex protects the shared state the condition is about, ensuring there are no race conditions while waiting on and notifying the condition variable. It provides exclusive access to the shared resource.
- Condition Variable: The condition variable passes signals between threads, notifying waiters that an event has occurred. It is used with a mutex so that checking the condition and starting to wait happen atomically.
- Shared Condition: The condition variable is paired with a shared predicate (usually a user-defined Boolean expression) that determines whether a waiting thread should continue or keep waiting.
Code example:

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

std::mutex mtx;
std::condition_variable cv;
bool ready = false;

void worker_thread() {
    std::unique_lock<std::mutex> lock(mtx);
    // Wait for the condition to become true
    cv.wait(lock, [] { return ready; });
    // Operations executed when the condition is true
    std::cout << "Worker thread is processing..." << std::endl;
}

int main() {
    std::thread worker(worker_thread);
    // Do some work...
    // Set the condition to true and notify waiting threads
    {
        std::lock_guard<std::mutex> lock(mtx);
        ready = true;
    }
    cv.notify_one();
    worker.join();
    return 0;
}
6. How is the semaphore mechanism implemented? Can this mechanism be used for inter-process communication? Are the P and V operations atomic operations in the semaphore mechanism?
A semaphore is a mechanism used for synchronization and mutual exclusion between threads or processes, usually including two main operations: P (wait) and V (signal). In C++, the implementation of semaphores is usually based on mutexes and condition variables.
Basic implementation of semaphores:

#include <mutex>
#include <condition_variable>

class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}

    void P() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (count_ <= 0) {
            // Wait for resources to be available
            condition_.wait(lock);
        }
        --count_;
    }

    void V() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++count_;
        // Notify waiting threads
        condition_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable condition_;
    int count_;
};
In the example, the P operation acquires a resource (waiting if none is available), and the V operation releases a resource and notifies a waiting thread.

Can semaphores be used for inter-process communication?
Yes. Operating-system semaphores (for example POSIX named semaphores via sem_open, or System V semaphores via semget/semop) can synchronize separate processes. The class above, however, only works within a single process, since its mutex and condition variable live in that process's address space.

Are P and V operations atomic:
Conceptually, P and V must behave atomically, because each one checks and updates the semaphore's count. A plain check-then-update is not atomic by itself, so an implementation needs an underlying synchronization mechanism: in the class above, the mutex makes P and V atomic with respect to each other; inside a kernel, atomicity is typically provided by atomic instructions or by disabling preemption/interrupts.
7. How are atomic operations implemented in nuttx (project)? How are atomic operations implemented in Linux? Discuss the specific process.
8. Discuss the interrupt process in Linux and your understanding of interrupt context.
The basic process of interrupts in Linux:
- Interrupt Trigger: An external device or other hardware generates an interrupt signal and sends an interrupt request to the CPU.
- Interrupt Response: Upon receiving the request, the CPU pauses the currently executing task and transfers control to the kernel's interrupt handling entry.
- Saving Context: The kernel saves the context of the interrupted task (register values, program counter, etc.).
- Calling the Interrupt Service Routine (ISR): The kernel looks up the ISR registered for the interrupt number and executes it.
- Handling the Interrupt: The ISR handles the interrupt, which may involve device drivers, data processing, etc.
- End of Interrupt: After the ISR finishes, the previously saved context is restored and control returns to the interrupted task.
Interrupt Context:
Interrupt context is the execution environment of the kernel while it is handling an interrupt. Unlike process context, interrupt context has the following characteristics:
- No User Space Access: It cannot access user-space memory, because no user process is associated with it.
- Non-blocking / Cannot Sleep: It cannot block or sleep, because sleeping requires a schedulable process and would delay the system's response to further interrupts.
- Cannot Schedule: The scheduler cannot be invoked from interrupt context.
- Uses Kernel Stack: It runs on a kernel (or dedicated interrupt) stack instead of a user stack.
9. Let’s talk about polymorphism in C++
- Virtual Functions:
A virtual function is declared virtual in the base class, and derived classes can override it. When a virtual function is called through a base-class pointer or reference, the derived class's implementation is invoked.
Example code:
class Shape {
public:
    virtual void draw() const {
        // Base class virtual function implementation
    }
};

class Circle : public Shape {
public:
    void draw() const override {
        // Derived class implementation
    }
};

int main() {
    Shape* shape = new Circle();
    shape->draw(); // Calls derived class implementation
    delete shape;
    return 0;
}
- Pure Virtual Functions and Abstract Classes:
If a base class declares a pure virtual function, it becomes an abstract class and cannot be instantiated. Derived classes must implement the pure virtual function to become concrete classes.
Reference code:
class AbstractShape {
public:
    virtual void draw() const = 0; // Pure virtual function makes the class abstract
};

class ConcreteCircle : public AbstractShape {
public:
    void draw() const override {
        // Implement abstract function
    }
};

int main() {
    AbstractShape* shape = new ConcreteCircle();
    shape->draw();
    delete shape;
    return 0;
}
- Runtime Polymorphism and Dynamic Binding:
When a virtual function is called through a base-class pointer or reference, the compiler cannot determine at compile time which implementation will run; the decision is made dynamically at runtime.
Reference code:
void printDrawing(const Shape& shape) {
    shape.draw();
}

int main() {
    Circle circle;
    printDrawing(circle); // Dynamic binding, calls Circle's draw()
    return 0;
}
- Virtual Destructors:
When base-class pointers manage derived-class objects, a virtual destructor in the base class ensures the derived class's destructor is called when the object is destroyed.
class Base {
public:
    virtual ~Base() {
        // Virtual destructor
    }
};

class Derived : public Base {
public:
    ~Derived() override {
        // Derived class destructor
    }
};

int main() {
    Base* ptr = new Derived();
    delete ptr; // Calls Derived's destructor, then Base's
    return 0;
}
10. What are the differences between pure virtual functions and virtual functions?
Virtual Functions:
- Implementation: Virtual functions have an implementation in the base class and can be overridden in derived classes.
- Optional Override: Because a default implementation exists, derived classes may choose whether to override it.
- Instantiable: A class whose virtual functions all have implementations can itself be instantiated.
- Keyword: Declared using the virtual keyword.
Example:

class Base {
public:
    virtual void foo() {
        // Base class virtual function implementation
    }
};

Pure Virtual Functions:
- No Implementation: Pure virtual functions are declared in the base class without an implementation and must be implemented in derived classes.
- Mandatory Override: A derived class must provide an implementation, or it too becomes an abstract class.
- Abstract Class: A class containing at least one pure virtual function is abstract and cannot be instantiated.
- Keyword: Declared using the virtual keyword with = 0.
Example:

class AbstractBase {
public:
    virtual void pureVirtualFunction() = 0;
};
11. How are virtual functions implemented? Where is the virtual function table stored?
Virtual function implementation principles:
- Virtual Function Declaration: Declaring a function with the virtual keyword in the base class tells the compiler that derived classes may override it.

class Base {
public:
    virtual void foo() {
        // Virtual function implementation
    }
};

- Virtual Function Table (vtable): For each class containing virtual functions, the compiler generates a virtual function table: an array of pointers, each pointing to one of the class's virtual function implementations. The table is built by the compiler, and each such class has a single vtable shared by all its objects.
- Virtual Function Pointer (vptr): The memory layout of each object includes a hidden pointer, usually called the vptr, which stores the address of its class's virtual function table.
- Dynamic Binding: At runtime, a virtual call through a base-class pointer or reference follows the object's vptr to the vtable, looks up the entry for the function, and calls it. This is how dynamic binding is implemented.
Storage location of the virtual function table:
Where the vtable itself is stored is implementation-defined; it is typically placed in the read-only data section of the executable. The vptr is usually stored at the front of the object's memory layout. With single inheritance there is normally one vptr per object, at the beginning of the object; with multiple inheritance an object may contain several vptrs, one for each polymorphic base class.
12. How to access private variables in a class? Is friendship one-way or two-way? Can friendship be inherited?
- Through Public Member Functions: A class can provide public member functions (getters/setters) to access private variables.
Example code:

class MyClass {
private:
    int privateVar;
public:
    void setPrivateVar(int value) {
        privateVar = value;
    }
    int getPrivateVar() const {
        return privateVar;
    }
};
- Through Friend Functions: A friend function can access the private members of a class. It is defined outside the class and declared as a friend inside it.

class MyClass {
private:
    int privateVar;
public:
    friend void friendFunction(MyClass& obj);
    // other members...
};

void friendFunction(MyClass& obj) {
    obj.privateVar = 42; // Friend function can access private variable
}
- Through Friend Classes: A friend class can access the private members of the class that grants it friendship. Note that MyClass (with its friend declaration) must be defined before FriendClass touches its private members:

class MyClass {
private:
    int privateVar;
    friend class FriendClass; // FriendClass is a friend of MyClass
    // other members...
};

class FriendClass {
public:
    void modifyPrivateVar(MyClass& obj) {
        obj.privateVar = 42; // Friend class can access private variable
    }
};
Regarding the nature of friendship:
- One-way: Friendship is one-way. If class B declares class A as a friend, class B does not automatically become a friend of class A.
- Non-inheritable: Friendship is not inherited. If class B declares class A as a friend and class C derives from B, A is not automatically a friend of C.
13. How do you understand references and pointers? Why do we need references when we have pointers?
References:
- Simpler Syntax: A reference is an alias declared with the & symbol. Compared with pointers, references have simpler syntax.

int x = 10;
int& ref = x; // Reference

- Cannot be Reseated: Once initialized, a reference always refers to the same object and cannot be rebound to another object.
- No Dereference Operator: A reference is used directly, without the dereference operator *.
- Cannot be Null: References must be initialized when declared; there is no null reference.
Pointers:
- More Complex Syntax: Pointers are declared with the * symbol, and the dereference operator * is needed to access the pointed-to value.

int x = 10;
int* ptr = &x; // Pointer

- Can be Reassigned: A pointer can be reassigned to point to other objects.
- Needs Dereference Operator: Accessing the pointed-to value requires *.
- Can be Null: A pointer can be null (nullptr), i.e. point to nothing.
Why do we need references?
- Simpler, Safer Syntax: References give concise pass-by-reference semantics without pointer syntax, and they can be neither null nor reseated, which removes whole classes of errors.
- Natural Operator Overloading: Copy constructors and overloaded operators require reference parameters to read naturally.
14. What new features of C++ do you know? Discuss your understanding of lambda expressions, and your understanding of smart pointers. What are the disadvantages of smart pointers?
New features of C++11:
- Automatic Type Deduction (auto): Allows the compiler to deduce the type of variables, making the code more concise.

auto x = 5; // x's type will be deduced as int

- Range-based for loop: Simplifies the traversal of container elements.

std::vector<int> numbers = {1, 2, 3, 4, 5};
for (const auto& num : numbers) {
    // Use num
}
- Smart Pointers: Introduced std::shared_ptr and std::unique_ptr, used to manage dynamically allocated memory and help prevent memory leaks.

std::shared_ptr<int> sharedPtr = std::make_shared<int>(42);

- Lambda Expressions: Allow defining anonymous functions inline, improving code readability and flexibility.

auto add = [](int a, int b) { return a + b; };

- nullptr: Introduced the null pointer constant nullptr, replacing the traditional NULL.

int* ptr = nullptr;
- Type Casting: The named cast operators static_cast, dynamic_cast, const_cast, and reinterpret_cast offer safer, more explicit conversions than C-style casts. (Strictly speaking, these date back to C++98, not C++11.)

double x = 3.14;
int y = static_cast<int>(x);
- Rvalue References and Move Semantics: Rvalue references enable resources to be moved out of temporary objects instead of deep-copied.

std::vector<int> getVector() {
    // Return a temporary vector
    return std::vector<int>{1, 2, 3};
}
std::vector<int> numbers = getVector(); // Move semantics (or copy elision) avoids a deep copy
- New Containers and Algorithms: Introduced new containers like std::unordered_map and std::unordered_set, as well as some new algorithms.

std::unordered_map<int, std::string> myMap = {{1, "one"}, {2, "two"}};

- Thread Support (std::thread): Provides native multithreading support, making concurrent programming more convenient.

#include <thread>

void myFunction() {
    // Code executed by the thread
}

int main() {
    std::thread t(myFunction);
    t.join(); // Wait for thread to finish
    return 0;
}
C++14:
- Generic Lambda Expressions
- std::make_unique (for creating std::unique_ptr)
C++17:
- Structured Bindings
- Initialization statements in if and switch
- std::optional type
- std::variant type
- std::any type
- Parallel Algorithms
C++20:
- Concepts
- Coroutines
- Three-way Comparison Operator <=>
- Ranges Library
Lambda Expressions:
Lambda expressions are a feature introduced in C++11 that allow defining anonymous functions inline. The basic syntax is:

[ capture_clause ] ( parameter_list ) -> return_type {
    // Function body
}

Where:
- capture_clause specifies how outer variables are captured: by value, by reference, or a mixture.
- parameter_list is the function parameter list.
- return_type is the return type (frequently omitted and deduced).
- {} contains the function body of the lambda expression.
The main advantages of lambda expressions are their simplicity and convenience, especially in scenarios where functions need to be passed as parameters, such as in STL algorithms.
Smart Pointers:
Smart pointers are tools provided by C++ for managing dynamically allocated memory. The common ones are std::unique_ptr, std::shared_ptr, and std::weak_ptr.
- std::shared_ptr:
  - Principle: std::shared_ptr uses reference counting. Each new shared_ptr pointing to an object increments the count; when the count reaches zero (no shared_ptr points to the object), the object is automatically deleted.
  - When memory is released: when the last std::shared_ptr pointing to the object is destroyed or reset.
- std::unique_ptr:
  - Principle: std::unique_ptr has exclusive ownership semantics, ensuring only one pointer owns the resource at a time. It uses no reference count; ownership can be transferred with std::move.
  - When memory is released: when the std::unique_ptr is destroyed, goes out of scope without having transferred ownership, or is reset.
- std::weak_ptr:
  - Principle: std::weak_ptr is a weak reference used to break circular references. It does not increase the reference count and only observes an object managed by std::shared_ptr; lock() yields a shared_ptr if the object is still alive.
  - When memory is released: std::weak_ptr never releases the memory itself; release is handled by the associated std::shared_ptr control block.
- std::auto_ptr (deprecated in C++11, removed in C++17):
  - Principle: std::auto_ptr had exclusive ownership like std::unique_ptr, but because the language had no move semantics yet, its "copy" operations silently transferred ownership, leaving the source pointer null. This made it unsafe in containers and easy to misuse.
  - When memory is released: when the std::auto_ptr that currently owns the object is destroyed.
Disadvantages of Smart Pointers:
- Extra Overhead: Smart pointers maintain extra state, such as shared_ptr's reference count and control block, which costs memory and some performance.
- Circular Reference Problem: With std::shared_ptr, circular references prevent the objects from ever being released, causing memory leaks (mitigated with std::weak_ptr).
- Not Suitable for All Scenarios: Smart pointers are not a one-size-fits-all solution; in some cases (non-owning observers, C APIs) raw pointers remain appropriate.
15. What do you know about the C++ compilation process and memory allocation after compilation (executable program)? Do you know about the bss segment?
The C++ build process has four stages: preprocessing, compilation, assembly, and linking.
- Preprocessing: In the first stage, the source code is processed by the preprocessor, which performs the following actions:
  - Removing comments: deletes comments from the source code (// or /* */).
  - Macro expansion: expands macros defined in the source code to their actual content.
  - Conditional compilation: includes or excludes code blocks based on directives such as #ifdef, #ifndef, #if.
  - Header inclusion: replaces #include directives with the contents of the headers.
- Compilation: Translates the preprocessed source into assembly code, performing syntax and semantic analysis and optimization.
- Assembly: Translates the assembly code into machine code, producing an object file.
- Linking: Combines object files and libraries into an executable:
  - Resolving symbol references: links function and variable references in the code with their definitions.
  - Resolving library dependencies: determines the external libraries needed by the program and links them in.
  - Generating the executable file: combines all object code and libraries into an executable file.
Memory layout of an executable program:
- Code Segment (Text Segment):
  - Stores the machine code of the program, which is usually read-only.
  - Contains the instructions of the executable program.
- Data Segment:
  - Stores global and static variables.
  - Divided into the initialized data segment (.data) and the uninitialized data segment (.bss).
- Heap:
  - Dynamic memory space managed by the programmer.
  - Allocated and released using new and delete, or malloc and free.
- Stack:
  - Stores local variables, function parameters, and function call information.
  - Automatically managed by the system.
The bss segment:
- The uninitialized data segment (BSS) contains global and static variables that are not explicitly initialized (or are initialized to zero).
- When the executable is loaded, the kernel allocates memory for these variables and zero-fills it; the variables occupy no data in the file on disk, only a recorded size.
- bss historically stands for Block Started by Symbol.
16. What is the suffix of an executable program? What is its starting memory address? What is the range of space?
The suffix of an executable program has no fixed rule; different operating systems and build environments use different conventions. It may be .exe (Windows), .out, or no suffix at all (common on Linux).

The starting address and address-space range of an executable in memory depend on the operating system and loader.
In typical cases:
- Starting Memory Address: When the executable is loaded, the operating system assigns it a load address. With position-independent executables and ASLR this address is chosen dynamically at load time; classic non-PIE 32-bit Linux executables were traditionally loaded at 0x08048000.
- Space Range: The address-space range depends on the program size and the OS memory layout. On 32-bit Linux, user space typically occupies the lower 3 GB of the address space (up to about 0xbfffffff), with the kernel mapped above it; on 64-bit systems the user address space is vastly larger.
17. Does the nuttx system project you worked on have the concept of virtual memory? How does Linux map virtual addresses to physical addresses? Discuss in detail (paging, segmentation, and segmented paging, mentioning the implementation algorithms and differences between them).
In the Linux system, the mapping of virtual addresses to physical addresses is achieved through page tables.
Paging:
Paging is a memory management mechanism where both virtual memory and physical memory are divided into fixed-size pages, and the operating system uses page tables to map virtual addresses to physical addresses. The basic steps:
- Page Division: The virtual and physical address spaces are divided into fixed-size pages (usually 4 KB).
- Establishing Page Tables: The operating system maintains a page table that maps each virtual page to a physical page (frame).
- Address Translation: When a program references a virtual address, the corresponding physical address is found through the page table.
- Page Table Entry: Each entry (PTE) records the mapping from a virtual page number to a physical frame number, plus permission bits.
Segmentation:
Segmentation is another memory management mechanism, where the address space is divided into variable-size segments, each with its own permissions and attributes. The basic steps:
- Segment Division: The address space is divided into segments such as code segment, data segment, etc.
- Establishing Segment Tables: The operating system maintains a segment table that maps each segment to physical addresses.
- Address Translation: A virtual address (segment, offset) is translated via the segment table to the corresponding physical address.
- Segment Table Entry: Each entry is a segment descriptor containing the segment's base address, limit (size), and permissions.
Segmented Paging Memory Management:
In actual systems, paging and segmentation can be combined. A virtual address is first translated through the segment table to a linear address, which is then translated through page tables. The basic steps:
- Segment and Page Division: The address space is divided into segments, and each segment into pages.
- Establishing Segment Tables and Page Tables: The operating system maintains both to map virtual addresses to physical addresses.
- Address Translation: The segment table yields the segment's base address; adding the offset gives the linear address, which the page table maps to a physical address.
- Entries: Segment-table entries are segment descriptors containing the segment's base address; page-table entries map virtual page numbers to physical frame numbers.
Differences and Choices:

- Advantages of Paging:
  - Pages have a fixed size, which simplifies memory management and eliminates external fragmentation.
  - Suitable for sparse address spaces, since physical memory need not be allocated contiguously for each segment.
- Advantages of Segmentation:
  - Suits logically structured address spaces, where each segment can have its own permissions and attributes.
  - Easier to support sharing and dynamically growing data structures.
- Choice of Segmented Paging Memory Management:
  - Combines the advantages of paging and segmentation to adapt to different memory management needs.
  - x86 systems historically combined segmentation with paging; in 64-bit mode the segments are mostly flat, and paging does virtually all of the address translation in modern operating systems.
18. Have you optimized any code you have written? What optimization methods did you use? (I answered one about using shifts to optimize multiplication and division, then the interviewer asked how to handle floating-point operations, and I was stumped by this question)
19. Have you used GDB for debugging? Talk about common commands. If I have a compilation error, how can I locate the error position? Do you know how to debug multithreading in GDB?
Common GDB Commands:

- Starting GDB: gdb executable_name
- Running the Program: run [args]
- Setting Breakpoints: break function_name or break filename:line_number
- Viewing Source Code: list [filename:line_number]
- Stepping Into a Function: step
- Stepping Over (next line, without entering functions): next
- Continuing Execution: continue
- Viewing Variable Values: print variable_name
- Viewing Stack Frames: backtrace
- Viewing Registers: info registers
Locating the Error Position:

A compilation error is reported by the compiler itself, with the file name and line number, so GDB is not needed for that. GDB is for locating runtime errors. First, compile with debugging information so that GDB can use the symbol table:

gcc -g -o executable_name source_code.c

Then start GDB:

gdb ./executable_name

Inside GDB, use the run command to start the program. If the program crashes, GDB stops at the point where the error occurred, and backtrace shows the call chain.
Multithreading Debugging:

When debugging a multithreaded program in GDB, the following commands are useful:

- Setting Breakpoints: break function_name
- Viewing the Thread List: info threads
- Switching to a Specified Thread: thread thread_id
- Running a Command on a Set of Threads: thread apply thread_id_list command (use thread apply all command to run it on every thread)
- Viewing the Current Thread's Stack Frames: backtrace
- Viewing a Specified Thread's Stack Frames: thread apply thread_id backtrace
- Keeping Other Threads Stopped While Stepping: set scheduler-locking on
20. Do you know the concept of backtrace? Talk about stack backtrace and how to debug using gdb.
Backtrace refers to printing call stack information during program execution, showing function call relationships and corresponding line numbers. Backtrace can help developers locate the position of program crashes or errors. In C/C++ programs, stack backtrace can be used to obtain information about the function call chain.
In GDB, stack backtrace is a common debugging technique used to view the function call stack when a program crashes. The basic steps are as follows:

- Start GDB:
gdb ./executable_name
- Run the program:
run [args]
- When the program crashes, view the stack backtrace:
backtrace
or use the shorthand command:
bt
- View a specific stack frame:
frame frame_number
where frame_number is the number of the stack frame as shown in the backtrace output.
Example of using stack backtrace:
Suppose there is the following simple C++ program:
#include <iostream>
#include <stdexcept> // for std::runtime_error

void func3() {
    throw std::runtime_error("Exception in func3");
}

void func2() {
    func3();
}

void func1() {
    func2();
}

int main() {
    try {
        func1();
    } catch (const std::exception& e) {
        std::cerr << "Exception caught: " << e.what() << std::endl;
    }
    return 0;
}
Using GDB for debugging:

- Add debugging information when compiling:
g++ -g -o my_program my_program.cpp
- Start GDB:
gdb ./my_program
- In GDB, run the program:
run
Note that the exception in this example is caught, so the program exits normally; to make GDB stop at the point where the exception is thrown, issue catch throw before run.
- Once GDB has stopped, view the backtrace:
backtrace
or use the shorthand command:
bt
21. Questions
That’s all for today. If there are more interview experiences to share, feel free to reach out to me. My homepage also has a QR code. Or you can add me on my work WeChat: aqzz0123
If this is helpful to everyone, remember to bookmark it and give a free three-way support~