Understanding Rust’s Ownership Mechanism

Introduction to Rust Ownership

When I first encountered ownership, it seemed similar to scope, and I wondered what the difference was. It took me a while, but I gradually came to understand that ownership is indeed different from scope. Ownership is Rust’s most distinctive feature and has a profound impact on the rest of the language: it is what allows Rust to guarantee memory safety without a garbage collector (GC). Understanding ownership and how Rust implements it is therefore very important.

What is Ownership?

Ownership is a set of rules that determines how Rust manages memory. Rust has no GC; it manages memory through ownership instead. Before going deeper, it is necessary to understand the stack and the heap.

Physically, there is no difference between the stack and the heap; both are ordinary memory. After all, nobody has ever seen a “stack” memory stick or a “heap” memory stick.

Both the stack and the heap are memory that code can use at run time, but they differ in structure and organization. The stack stores values in the order they are pushed and removes them in the reverse order (popped), which is known as last in, first out (LIFO). All data stored on the stack must have a known, fixed size; data whose size is unknown at compile time, or whose size may change, must be stored on the heap.

The heap is less organized. When you put data on the heap, you request a certain amount of space. The memory allocator finds a sufficiently large free region in the heap, marks it as in use, and returns a pointer to it. This process is known as heap allocation. The pointer has a fixed size that is known at compile time, so it can be stored on the stack.

Pushing data onto the stack is faster than allocating on the heap, because the allocator never has to search for a place to store the data; that place is always the top of the stack. Accessing data on the heap is also slower than accessing data on the stack, because a pointer must be followed to reach it. In addition, because of caching, the more a program jumps around in memory, the worse it performs.
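
The difference between the two kinds of storage can be sketched in a few lines. The example below is only an illustration: stack_and_heap is a hypothetical helper name, and Box is used here simply as the most direct way to request a heap allocation.

```rust
use std::mem::size_of;

/// Adds a value stored on the stack to a value stored on the heap.
fn stack_and_heap() -> i32 {
    // `a` lives entirely in this function's stack frame.
    let a: i32 = 40;
    // `Box::new` asks the allocator for heap space and stores 2 there;
    // `b` itself is only a pointer, and that pointer lives on the stack.
    let b: Box<i32> = Box::new(2);
    // Reading `*b` follows the pointer to the heap.
    a + *b
} // `b` goes out of scope here, and its heap allocation is freed.

fn main() {
    println!("sum = {}", stack_and_heap());
    // The pointer has a fixed size that is known at compile time.
    println!("size of Box<i32> = {} bytes", size_of::<Box<i32>>());
    println!("size of usize    = {} bytes", size_of::<usize>());
}
```

The two printed sizes are equal: however much data sits behind it, the pointer itself is a single machine word.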

In many systems programming languages you must keep track of which code is using which heap data, minimize duplicate data on the heap, and clean up unused heap data promptly so that memory does not run out. Ownership addresses all of these problems. Knowing how memory is used and managed makes it easier to understand why ownership exists and how it works.

Remember the following ownership rules:

  • Every value in Rust has a corresponding owner;

  • At any given time, a value can have only one owner;

  • When the owner goes out of scope, the value it holds will be dropped;
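
The third rule can be observed directly with a type that records when it is dropped. This is a minimal sketch, not part of the original examples: Noisy and drop_order are hypothetical names, and the shared log exists only to make the order visible.

```rust
use std::cell::RefCell;

/// A hypothetical type that writes its name into a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        // Runs automatically when the owner goes out of scope.
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the names in the order in which the values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
        } // `_inner` goes out of scope here and is dropped first.
    } // `_outer` goes out of scope here and is dropped second.
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

Each value is dropped exactly once, at the closing brace of the scope that owns it.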

Example

Let’s illustrate with practical examples.

First example, variable scope. Scope is the range in which an object is valid in the program.

fn main() {
    let s = "hello";
    println!("s = {s}");
}

Here s is bound to the string literal “hello”. A string literal is immutable, its size is known at compile time, and its text is hardcoded directly into the compiled program, which makes it efficient. At run time, s is a reference (a pointer and a length) stored on the stack, while the text itself lives in the program’s read-only data. When s goes out of scope it becomes invalid; the literal is part of the program itself and needs no cleanup.

Second example, String type.

When the program wants to get user input, the size is uncertain. Rust provides the String type, which allocates storage space on the heap, allowing it to handle text of unknown size at compile time.

fn main() {
    let s = String::from("hello");
    println!("s = {s}");
}

The owner of “hello” is s. At run time, the text “hello” is stored on the heap. When s goes out of scope it becomes invalid, and “hello” is dropped, which frees the heap memory.

String::from requests the memory it needs. When s goes out of scope, Rust automatically calls the drop function, which is where the author of the String type places the code that releases the memory. In the example above this mechanism looks simple, but in more complex situations the behavior of the code can be surprising, especially when multiple variables refer to the same heap memory.

Moving

In Rust, multiple variables can interact with the same data in a unique way. For example:

fn main() {
    let s1 = String::from("hello");
    let s2 = s1;
    println!("s1 = {s1}, s2 = {s2}");
}

This code fails to compile. In many other languages the equivalent code would work, but not in Rust. The compiler reports an error:

[Figure: compiler error output for the code above]

The first highlighted note in the output says that the move occurs because s1 has type String, which does not implement the Copy trait. The second note marks the place where the value was moved, and the last line helpfully offers a suggestion. In other words, when s2 = s1 executes, ownership of “hello” moves from s1 to s2 (this is called a move). As the rules above state, a value can have only one owner at any given time, so s1 is now invalid, and trying to print s1 in the println! macro is an error.

String consists of 3 parts:

  • A pointer ptr that points to the stored string content;

  • Length len;

  • Capacity capacity;

These 3 parts are stored on the stack, while the actual string content is stored on the heap. When s2 = s1 executes, only the three stack values are copied, so s2 and s1 both point to the same heap location holding “hello”.
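
The three parts can be inspected from ordinary Rust code. The sketch below is an illustration under one assumption: move_keeps_buffer is a hypothetical helper written for this article, while as_ptr, len and capacity are standard String methods.

```rust
/// Moves a String and reports whether the heap pointer stayed the same.
fn move_keeps_buffer() -> bool {
    let s1 = String::from("hello");
    // Raw address of the heap buffer; a raw pointer does not borrow `s1`.
    let ptr_before = s1.as_ptr();
    // Move: only ptr, len and capacity are copied; the heap data is untouched.
    let s2 = s1;
    ptr_before == s2.as_ptr()
}

fn main() {
    let s = String::from("hello");
    println!(
        "ptr = {:p}, len = {}, capacity = {}",
        s.as_ptr(),
        s.len(),
        s.capacity()
    );
    println!("same heap buffer after move: {}", move_keeps_buffer());
}
```

The pointer comparison shows that a move does not allocate or copy any heap data; only the owner changes.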

This raises a problem: if both s2 and s1 were still valid when they went out of scope, drop would be called twice and the same memory would be freed twice. A double free can corrupt data that is still in use and is a serious safety hazard. To prevent this, Rust treats s1 as invalid once s2 = s1 has executed. Because Rust checks this strictly at compile time, the attempt to print s1 in the println! macro is rejected and the program does not compile.

To keep s1 valid, use the clone method, which copies the data on the heap in addition to the data on the stack. Example code:

fn main() {
    let s1 = String::from("hello");
    let s2 = s1.clone();
    println!("s1 = {s1}, s2 = {s2}");
}
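
The effect of clone can also be checked from safe Rust, without a disassembler. In this small sketch, clone_check is a hypothetical helper name; it compares the contents and the heap addresses of the two strings.

```rust
/// Clones a String and reports (same contents?, same heap buffer?).
fn clone_check() -> (bool, bool) {
    let s1 = String::from("hello");
    let s2 = s1.clone(); // allocates a new buffer and copies the bytes
    (s1 == s2, s1.as_ptr() == s2.as_ptr())
}

fn main() {
    let (same_contents, same_buffer) = clone_check();
    println!("same contents: {same_contents}, same buffer: {same_buffer}");
}
```

The contents are equal, but each String owns its own heap buffer, which is why both can safely be dropped.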

The key points of the disassembly are as follows.

First, __rust_alloc is called to request a block of heap memory, and the bytes of “hello” (68 65 6C 6C 6F in ASCII) are written into it.

Then the length 5, the heap memory address (in rax, address 0x1F7FB46EFA0), and the capacity 5 are stored on the stack. Each member occupies 8 bytes.

Next comes the clone operation. Inside clone, a new block of heap memory is allocated first, and then memcpy is called to copy the contents.

Finally, the new block is returned; this is the s2 variable (0x2373D9F5C0).

This is how cloning a String works internally. No move has occurred, so both s1 and s2 remain valid. Now let’s look at another example:

fn main() {
    let x = 5;
    let y = x;
    println!("x = {x}, y = {y}");
}

This code compiles without error. There is no clone call, yet x is still valid after being assigned to y, and no move occurs. Why? Because 5 is an integer: its size is known at compile time and it is stored entirely on the stack. Copying such values is always very fast, so the problems described above do not arise.

So how does a type avoid being moved? Rust provides a special trait called Copy, which can be used to mark types that are stored entirely on the stack, such as integers. Once a type implements Copy, assigning a variable of that type to another variable copies the value instead of moving it, so both the old and the new variable remain usable.

However, if a type, or any of its members, implements the Drop trait, Rust does not allow it to implement Copy; Drop and Copy cannot coexist. Adding a Copy annotation to a type that needs special code to run when it goes out of scope results in a compilation error. The following types in Rust implement Copy:

  • All integer types, such as u32;

  • The boolean type bool, which has only two values (true and false);

  • All floating-point types, such as f64;

  • The character type char;

  • If all fields in a tuple implement Copy, then that tuple also implements Copy;
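
User-defined types can opt in to the same behavior. The sketch below assumes a hypothetical Point struct whose fields are all Copy, so it can derive Copy (Clone must be derived as well, because Copy requires it); copy_point and copy_tuple are illustrative helper names.

```rust
/// Hypothetical type whose fields are all Copy, so it may derive Copy.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Assigns a Point to a second variable and returns both.
fn copy_point() -> (Point, Point) {
    let p1 = Point { x: 1, y: 2 };
    let p2 = p1; // copy, not move: `p1` stays valid
    (p1, p2)
}

/// A tuple whose fields are all Copy is itself Copy.
fn copy_tuple() -> ((i32, bool), (i32, bool)) {
    let t1 = (5, true);
    let t2 = t1; // copy
    (t1, t2)
}

fn main() {
    let (p1, p2) = copy_point();
    println!("p1 = ({}, {}), p2 = {:?}", p1.x, p1.y, p2);
    println!("{:?}", copy_tuple());
}
```

If Point contained a String field, the derive would be rejected, because String does not implement Copy.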

Functions

Passing a variable to a function triggers a move or a copy, just as assignment does. For example:

fn main() {
    let s = String::from("hello");
    takes_ownership(s); // s is moved into the function

    let x = 5;
    makes_copy(x); // x is copied, because i32 implements Copy

    println!("s = {s}"); // Error
    println!("x = {x}"); // No error
}

fn takes_ownership(some_string: String) {
    println!("{some_string}");
}

fn makes_copy(some_integer: i32) {
    println!("{some_integer}");
}

Calling takes_ownership moves s, so s is invalid in main afterwards and trying to print it is an error. x is fine: it is an integer, which implements Copy, so it can still be used after the call. The functions above have no return values, but returning a value also transfers ownership.

fn main() {
    let s1 = gives_ownership();
    let s2 = String::from("hello");
    let s3 = takes_and_gives_back(s2);

    println!("s1 = {s1}"); // No error
    println!("s2 = {s2}"); // Error
    println!("s3 = {s3}"); // No error
}

fn gives_ownership() -> String {
    let some_string = String::from("hello");
    some_string
}

fn takes_and_gives_back(a_string: String) -> String {
    a_string
}

Ownership of “hello” moves from s2 into the function and then out to s3, which leaves s2 invalid, so trying to print s2 is an error. One way to use a value without losing it is to hand it back as a return value, but this is cumbersome. Let’s look at another example:

fn main() {
    let s1 = String::from("hello");
    let (s2, len) = calculate_length(s1);

    println!("The length of '{s2}' is {len}.");
}

fn calculate_length(s: String) -> (String, usize) {
    let length = s.len();
    (s, length)
}

The function above needs to use the String, and the caller wants to keep using it afterwards, so the function returns it together with the result, and ownership passes to s2. To avoid this unnecessary hassle, we can use references.

References

The previous example, modified to take a reference:

fn main() {
    let s1 = String::from("hello");
    let len = calculate_length(&s1);

    println!("The length of '{s1}' is {len}.");
}

fn calculate_length(s: &String) -> usize {
    s.len()
}

In the calculate_length function, s is a reference. When called in the main function, ownership does not transfer. When the function call ends, s goes out of scope but does not destroy the value it references because s does not own that value. “hello” will not be dropped.

When a function uses a reference instead of the value itself as a parameter, there is no need to return a value to return ownership, because in this case, we have not taken ownership at all.

This method is also known as borrowing. In real life, if a person owns something, you can borrow it from them, but when you are done using it, you must return it because you do not own it.

Attempting to modify the referenced value:

fn main() {
    let s = String::from("hello");
    change(&s);
}

fn change(s: &String) {
    s.push_str(", world");
}

This code will not compile: references are immutable by default, and Rust does not allow a value to be modified through an immutable reference.

Changing an immutable reference to a mutable reference:

fn main() {
    let mut s = String::from("hello");
    change(&mut s);
}

fn change(s: &mut String) {
    s.push_str(", world");
}

The above code compiles. Mutable references come with a restriction: while you hold a mutable reference to a value, you cannot hold any other reference to that value. The following code will not compile:

fn main() {
    let mut s = String::from("hello");
    let r1 = &mut s;
    let r2 = &mut s;

    println!("{r1}, {r2}");
}

This restriction helps avoid data races. Data races are very similar to race conditions and occur when the following three conditions are met:

  • Two or more pointers access the same space simultaneously;

  • At least one pointer writes data to the space;

  • There is no mechanism for synchronizing data access;

Data races lead to undefined behavior, and Rust does not allow them to occur. By introducing a new scope with braces, we can create multiple mutable references, just not at the same time:

fn main() {
    let mut s = String::from("hello");
    {
        let r1 = &mut s;
        println!("{r1}");
    }

    let r2 = &mut s;
    println!("{r2}");
}

Now let’s look at another scenario:

fn main() {
    let mut s = String::from("hello");
    let r1 = &s; // No error
    let r2 = &s; // No error
    let r3 = &mut s; // Error

    println!("{r1}, {r2} and {r3}");
}

The error occurs because you cannot create a mutable reference to a value while an immutable reference to the same value is still in use.

Additionally, remember that the scope of a reference starts from the point of its creation and lasts until the last time it is used.

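fn main() {
    let mut s = String::from("hello");
    let r1 = &s;
    let r2 = &s;
    println!("{r1} and {r2}");
    // Variables r1 and r2 will no longer be used

    let r3 = &mut s;
    println!("{r3}");
}

The above code compiles, because r1 and r2 are used for the last time before r3 is created, so their scopes do not overlap with that of r3.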

Dangling References

A dangling reference is like a dangling pointer: it points to a memory address that once held a value but has since been freed or reused. In Rust, the compiler ensures that references never end up in such a dangling state.

Example:

fn main() {
    let reference_to_nothing = dangle();
}

fn dangle() -> &String {
    let s = String::from("hello");
    &s
}

The above code will not compile, because it attempts to create a dangling reference. The variable s is created inside the function and is destroyed when the function ends, so the returned reference would point to a String that no longer exists. Rust does not allow this.
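
One way to make this example compile is to return the String itself rather than a reference to it, so that ownership moves out to the caller. This is a minimal sketch of that fix; no_dangle is an illustrative name.

```rust
/// Returns the String by value, transferring ownership to the caller.
fn no_dangle() -> String {
    let s = String::from("hello");
    s // ownership moves out of the function; nothing is dropped here
}

fn main() {
    let s = no_dangle();
    println!("{s}");
}
```

Because the value is moved rather than borrowed, there is nothing left behind for a reference to dangle on.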

Remember the following two points about references:

  • At any given time, you can either have one mutable reference or any number of immutable references;

  • References are always valid;

Slices

Slices let us reference a contiguous sequence of elements within a collection rather than the entire collection. Because a slice is a kind of reference, it does not own the value.

fn main() {
    let s = String::from("hello world");
    let hello = &s[0..5];
    let world = &s[6..11];
    println!("{hello}, {world}");
}

Internally, a slice ([starting_index..ending_index]) stores a reference to the starting position and a length, which equals ending_index - starting_index. In the above example, world is a slice that points to the 7th byte of s (index 6) and has a length of 5.

Slices have a little syntactic sugar: to start from the first element, you can omit starting_index. If you omit ending_index, the slice runs through the last byte of the String. If you omit both, the slice covers all the bytes of the string.
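
The three shorthand forms can be sketched as follows. The helper name slices is hypothetical, and the fixed indices assume ASCII input of at least 6 bytes, such as the “hello world” string used earlier; other input could panic at run time.

```rust
/// Returns the first five bytes, everything from byte 6 on, and the whole string.
/// Assumes ASCII input of at least 6 bytes; shorter input would panic.
fn slices(s: &str) -> (&str, &str, &str) {
    let head = &s[..5]; // same as &s[0..5]
    let tail = &s[6..]; // same as &s[6..s.len()]
    let whole = &s[..]; // same as &s[0..s.len()]
    (head, tail, whole)
}

fn main() {
    let s = String::from("hello world");
    let (head, tail, whole) = slices(&s);
    println!("{head} | {tail} | {whole}");
}
```

All three results borrow from s, so s must stay alive for as long as they are used.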
