Speed Comparison Between Rust and C

Author | Kornel
Translator | Sambodhi
Editor | Zhao Yuying

This article was originally published on the author’s personal blog and is shared here with the authorization of the original author Kornel, translated by InfoQ Chinese site.

Programs written in Rust should have run-time speed and memory usage comparable to programs written in C. However, because the two languages differ in overall programming style, it is hard to generalize about their speed. This article summarizes where Rust and C are similar, under which conditions C is faster, and under which conditions Rust is faster.

Disclaimer: This article is not an objective benchmark, and it does not claim to reveal indisputable facts about these languages. There are significant differences between what these two languages can achieve in theory and how they are used in practice. This comparison is based on my own subjective experience, which includes deadlines, bugs, and laziness. I have used Rust as my primary programming language for over 4 years, and before that I used C for about 10 years. I compare Rust specifically with C because a comparison with C++ would involve many more “ifs” and “buts,” and I do not wish to go into that.

In short:

  • The abstractions in Rust are a double-edged sword. They can hide bad code, but they also make it easier to improve algorithms and utilize highly optimized libraries.

  • I have never worried that using Rust would lead to a performance dead end. There is always an unsafe escape hatch for very low-level optimizations (which are often unnecessary).

  • Fearless concurrency truly exists. The occasional “clumsiness” of the borrow checker makes parallel programming practical.

My overall feeling is that if I could spend endless time and energy, my C programs would be as fast as Rust, if not faster, because theoretically, there is nothing that C can do that Rust cannot. However, in practice, C has less abstraction, a more primitive standard library, and terrible dependencies, and I really do not have the time to “reinvent the wheel” every time.

Similarities and Differences Between Rust and C
Both Are “Portable Assemblers”

Both Rust and C provide control over data structure layout, integer sizes, stack versus heap memory allocation, and pointer indirection. In general, the compiler inserts very little “magic,” so both translate into understandable machine code. Rust even dares to acknowledge that bytes have 8 bits and that signed integers can overflow!

While Rust has higher-level constructs like iterators, traits, and smart pointers, they are designed to optimize down to predictable machine code (that is, “zero-cost abstractions”). The memory layout of Rust types is straightforward; for example, growable strings and vectors are exactly {pointer, capacity, length}. Rust has no concepts like move or copy constructors, so passing objects by value is never more complex than passing a pointer or doing a memcpy.
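The {pointer, capacity, length} claim can be checked directly; a minimal sketch (the field order is an internal detail, but the total size is three machine words):

```rust
use std::mem::size_of;

fn main() {
    // A growable String is just {pointer, capacity, length}: three machine words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    // Same for a vector, regardless of its element type.
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
    // Passing by value moves those three words; no copy constructor runs.
    let s = String::from("hello");
    let t = s; // a plain bitwise move, like a small memcpy in C
    assert_eq!(t.len(), 5);
}
```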

The borrow checker is purely compile-time static analysis. It has no effect on the generated code, and lifetime information is completely stripped before code generation. There are no clever runtime tricks like autoboxing.

One example of Rust not being a “dumb” code generator is unwinding. Rust does not use exceptions for normal error handling, but a panic (an unhandled fatal error) can optionally behave like a C++ exception. This can be disabled at compile time (panic = abort), but even then, Rust does not like to mix with C++ exceptions or longjmp.
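A panic's unwinding can be observed from safe code; a small sketch using the standard library's catch_unwind:

```rust
use std::panic;

fn main() {
    // A panic unwinds the stack like a C++ exception (unless the program
    // is compiled with panic = "abort", in which case it terminates).
    let result = panic::catch_unwind(|| {
        let v: Vec<i32> = Vec::new();
        v[0] // out-of-bounds access: panics
    });
    assert!(result.is_err()); // the unwind was caught at this boundary
}
```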

Same Old LLVM Backend

Rust integrates very well with LLVM. It supports link-time optimization (LTO), including ThinLTO, even inlining across C/C++/Rust language boundaries, as well as profile-guided optimization. Although rustc generates much more verbose LLVM IR than clang does, the optimizer still handles it well.

Some of my C code is faster when compiled with GCC rather than LLVM, but GCC has no Rust front end, so Rust cannot take advantage of GCC’s optimizations.

In theory, Rust’s stricter immutability and aliasing rules allow better optimizations than C, but in practice this has not happened yet. Work on exploiting these guarantees in LLVM, beyond what C allows, is ongoing, so Rust has not yet reached its full potential.

Both Allow Manual Tuning with Few Exceptions

Rust code is low-level and predictable enough that I can hand-tune what assembly it optimizes into. Rust supports SIMD and offers good control over inlining, calling conventions, and so on. Rust is so similar to C that C profilers usually work with Rust (for example, I can use Xcode’s profiling tools on a Rust-C-Swift sandwich program).
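The inlining and calling-convention control mentioned above looks like this in practice (a sketch; the function names and the hash constant are illustrative, not from the original article):

```rust
// Hints comparable to C's `static inline` and `__attribute__((cold))`.
#[inline(always)]
fn hash_step(x: u32) -> u32 {
    x.wrapping_mul(2654435761) // Knuth's multiplicative hash constant
}

// Keep the rare error path out of the hot instruction stream.
#[cold]
#[inline(never)]
fn report_overflow() -> u32 {
    0
}

// The C calling convention, usable by C profilers and FFI callers.
pub extern "C" fn hash_or_zero(x: u32) -> u32 {
    match x.checked_mul(2) {
        Some(doubled) => hash_step(doubled),
        None => report_overflow(),
    }
}

fn main() {
    assert_eq!(hash_or_zero(1), 1013904226);
    assert_eq!(hash_or_zero(u32::MAX), 0); // overflow takes the cold path
}
```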

Generally speaking, there is not much difference between optimizing Rust and C when performance is absolutely critical and needs to be hand-tuned to the last detail.

However, there are some low-level features for which Rust does not have suitable alternatives:

  • Computed goto. “Boring” uses of goto can be replaced by other constructs in Rust, such as loop { break }. Many uses of goto in C are for cleanup, which Rust does not need thanks to RAII/destructors. However, there is a non-standard goto *addr extension that is useful for interpreters. Rust cannot express it directly (you can write a match and hope it optimizes well), but on the other hand, if I needed a fast interpreter, I would probably try a JIT such as Cranelift instead.

  • alloca and C99 variable-length arrays. These are controversial even in C, so Rust stays away from them.
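The match-based dispatch mentioned above could look like this toy bytecode interpreter (a hypothetical instruction set for illustration); LLVM usually lowers such a match to a jump table, which is close to, though not identical to, computed goto:

```rust
// A toy stack-machine interpreter dispatched with `match` instead of `goto *addr`.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

fn run(program: &[Op]) -> i64 {
    let mut stack = Vec::new();
    let mut pc = 0;
    loop {
        match program[pc] {
            // LLVM typically compiles this dispatch into a jump table.
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a * b);
            }
            Op::Halt => return stack.pop().unwrap(),
        }
        pc += 1;
    }
}

fn main() {
    use Op::*;
    // (2 + 3) * 4 = 20
    assert_eq!(run(&[Push(2), Push(3), Add, Push(4), Mul, Halt]), 20);
}
```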

Rust’s Minor Overhead

However, if Rust does not perform manual tuning, some inefficiencies can arise:

  • Rust lacks implicit conversions between integer types and requires usize for indexing, which encourages users to use usize alone, even where smaller types would suffice. This is in stark contrast to C, where the 32-bit int is the most popular choice. Indexing with usize is easier to optimize on 64-bit platforms without relying on undefined behavior, but the extra bits may put more pressure on registers and memory.

  • Idiomatic Rust always passes both a pointer and a size to strings and slices. Before porting several C libraries to Rust, I had not realized how many C functions take only a pointer to memory, without any size, and just hope for the best (the size may be known indirectly from context, or merely assumed to be big enough for the task).

  • Not all bounds checks are optimized out. For for item in arr or arr.iter().for_each(…), performance is as good as it can be, but for the indexed form for i in 0..len { arr[i] }, performance depends on whether the LLVM optimizer can prove that the length matches. Sometimes it cannot, and bounds checks then inhibit autovectorization. There are various workarounds, of course, some safe and some unsafe.

  • “Clever” use of memory is not popular in Rust. In C, anything goes. For example, in C I might try to reuse a buffer allocated for one purpose for a different purpose later (a technique known as HEARTBLEED). For variable-sized data with a bounded maximum (like PATH_MAX), it is convenient to use a fixed-size buffer and avoid (re)allocating a constantly growing one. Idiomatic Rust still gives significant control over memory allocation and can do the basics like memory pools, combining multiple allocations into one, and preallocating space, but overall it guides users toward “boring” memory usage.

  • When the borrow checker’s rules make things difficult, the simple workaround is to make an extra copy or use reference counting. Over time I have learned many borrow checker tricks and adjusted my coding style to fit it better, so this has become rare. It is never a major problem, because there is always the fallback of “raw” pointers when truly necessary.
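The bounds-check point above depends on the shape of the loop; a sketch of three forms (function names are illustrative):

```rust
// Iterator form: LLVM can elide bounds checks entirely.
fn sum_iter(arr: &[u32]) -> u32 {
    arr.iter().sum() // no per-element bounds check
}

// Indexed form: LLVM must prove `i < arr.len()` to remove the check.
// Here it can, because the range is derived from arr.len() itself.
fn sum_indexed(arr: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..arr.len() {
        total += arr[i]; // check usually optimized out in this shape
    }
    total
}

// A common safe workaround: re-slice once so the optimizer sees the length.
fn sum_first_n(arr: &[u32], n: usize) -> u32 {
    let arr = &arr[..n]; // one bounds check here...
    arr.iter().sum()     // ...and none inside the loop
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum_iter(&data), 10);
    assert_eq!(sum_indexed(&data), 10);
    assert_eq!(sum_first_n(&data, 2), 3);
}
```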

The borrow checker notoriously dislikes doubly linked lists, but fortunately linked lists are slow on current hardware anyway (poor cache locality, no vectorization). Rust’s standard library provides a linked list regardless, along with faster containers that get along better with the borrow checker.

There are two things the borrow checker cannot tolerate: memory-mapped files (data magically changing from outside the process conflicts with the immutability-xor-exclusivity semantics of references) and self-referential structures (passing a structure by value would leave its internal pointers dangling). These can be handled with raw pointers, which are as safe as any pointer in C, or by building safe abstractions around them.

Single-threaded programs simply do not exist as a concept in Rust. For performance, individual data structures may opt out of thread safety, but anything that can be shared between threads (including global variables) must be synchronized or marked as unsafe.

Rust’s string handling includes some cheap in-place operations, such as make_ascii_lowercase() (a fairly direct equivalent of what C does), while .to_lowercase() is Unicode-aware and requires a copy. Speaking of strings, UTF-8 encoding is less troublesome than it seems, because strings provide an .as_bytes() view, so they can be processed in a Unicode-ignorant way when needed.
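The difference between the two case-mapping operations, and the bytes view, in a few lines:

```rust
fn main() {
    // Cheap, in-place, ASCII-only: close to what a C tolower loop does.
    let mut s = String::from("HeLLo, WORLD");
    s.make_ascii_lowercase();
    assert_eq!(s, "hello, world");

    // Unicode-aware case mapping can change the byte length, so it allocates
    // and returns a new String.
    assert_eq!("ΑΒΓ".to_lowercase(), "αβγ");

    // When Unicode does not matter, process the raw UTF-8 bytes directly.
    let bytes = "hello".as_bytes();
    assert_eq!(bytes[0], b'h');
    assert_eq!(bytes.len(), 5);
}
```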

libc does its best to make stdio and putc relatively fast. Rust’s libstd is not as magical, so output is unbuffered unless it is wrapped in a BufWriter. People occasionally complain that Rust is slower than Python, and it turns out the program spends 99% of its time flushing its result byte by byte, which is exactly this issue.
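The fix is a single wrapper; a minimal sketch (the helper function is illustrative):

```rust
use std::io::{self, BufWriter, Write};

// Writing through BufWriter batches many small writes into large flushes.
fn write_lines<W: Write>(dest: W, n: u32) -> io::Result<()> {
    let mut out = BufWriter::new(dest);
    for i in 0..n {
        writeln!(out, "line {}", i)?; // buffered: no syscall per line
    }
    out.flush() // one large write instead of n tiny ones
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    // Locking once also avoids re-locking stdout on every write.
    write_lines(stdout.lock(), 5)
}
```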

Executable File Size

Every operating system ships with standard C libraries, amounting to about 30MB of code that C executables get “for free.” A tiny “Hello World” C executable cannot actually print anything by itself; it merely calls the printf shipped with the OS. Rust cannot expect the OS to have Rust’s standard library built in, so Rust executables bundle their own copy of it (300KB or more). Fortunately, this is a one-time overhead, and it can be reduced. For embedded development, the standard library can be turned off, and Rust will generate “bare” code.

Per function, Rust code compiles to roughly the same size as C, but there is the issue of “generics bloat.” Generic functions get an optimized version for each type they are used with, so it is possible for the same function to end up with eight copies; the cargo-bloat tool helps identify them.

Using dependencies in Rust is very easy. Similar to JS/npm, there is a culture of small, single-purpose libraries, and they add up. In the end, all my executables include Unicode normalization tables, 7 different random number generators, and an HTTP/2 client supporting Brotli. The cargo-tree tool is very useful for deduplicating and pruning dependencies.

Rust’s Small Wins

I have mentioned plenty of overhead above, but Rust also has areas where it ends up leaner and faster:

  • To hide implementation details, C libraries often return pointers to opaque data structures and ensure each instance has only a single copy. This costs a heap allocation and a pointer indirection. Rust’s built-in privacy, single-ownership rules, and coding conventions let libraries expose their objects without such indirection, so callers decide whether to put them on the heap or the stack. Stack objects can then be optimized aggressively, or even optimized away entirely.

  • By default, Rust can inline functions from the standard library, dependencies, and other compilation units. For C, I sometimes hesitate to split files or use libraries because it affects inlining and requires fine-grained management of header and symbol visibility.

  • Reordering struct fields reduces padding (Rust does this automatically unless #[repr(C)] is requested). Compiling C with -Wpadded shows how often I forget about this detail.

  • The size of a string is carried in its “fat” pointer. This makes length checks fast, avoids accidental O(n²) string loops, and allows substrings to be taken in place (for example, splitting a string into tokens) without copying memory or mutating it to add a terminator.

  • Similar to C++ templates, Rust generates a copy of generic code for each type it is used with, so functions like sort() and containers like hash tables are always optimized for their element types. For C, I have to choose between hacking with macros and the less efficient route of void* pointers and element sizes passed at run time.

  • Rust iterators can be chained together and optimized as a single unit. So instead of calling buy(it); use(it); break(it); change(it); mail(upgrade(it)); as a series of operations that may rewrite the same buffer several times, I prefer calling it.buy().use().break().change().upgrade().mail(), which compiles as if a single buy_use_break_change_upgrade_mail() performed all of the operations in one combined pass. (0..1000).map(|x| x*2).sum() compiles to return 999000.

  • There are also Read and Write interfaces that let functions stream unbuffered data. They compose well, so I can write data to a stream that computes a CRC of the data on the fly, adds framing/escaping if needed, compresses it, and writes it to the network, all in one call. And I can pass such a combined stream as the output stream to my HTML template engine, so every HTML tag then knows how to send itself compressed. The underlying mechanism is just a pyramid of plain next_stream.write(bytes) calls, so technically nothing stops me from doing the same in C, but C’s lack of traits and generics makes it impractical, and the alternative of setting up callbacks at run time is less efficient.

  • In C, it is entirely reasonable to overuse linear search and linked lists, because who wants to maintain yet another half-baked hash table implementation? With no containers built in, and dependencies being a pain, I cut corners to get the job done. I would not write a B-tree implementation unless absolutely necessary; I would use qsort plus bisection and call it a day. In Rust, on the other hand, using any of a range of high-quality containers takes only one or two lines of code. This means my Rust programs get to use appropriate, well-optimized data structures every time.

  • These days it seems everything needs JSON. Rust’s serde powers one of the fastest JSON parsers in the world, and it parses directly into Rust structures, so using the parsed data is very fast and efficient.
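The fused-iterator point above is easy to demonstrate: every adapter in a chain is lazy, so the whole pipeline compiles into one loop with no intermediate buffers (a small sketch):

```rust
fn main() {
    // The entire chain fuses into a single pass; for this constant range,
    // LLVM folds it all the way down to `return 999000`.
    let total: u32 = (0..1000).map(|x| x * 2).sum();
    assert_eq!(total, 999_000);

    // A chain over real data: filter + transform + collect in one pass,
    // with no temporary collections between the stages.
    let evens_squared: Vec<u32> = (1..=10)
        .filter(|x| x % 2 == 0)
        .map(|x| x * x)
        .collect();
    assert_eq!(evens_squared, [4, 16, 36, 64, 100]);
}
```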

Rust’s Major Wins

Rust enforces thread safety of all code and data, even in third-party libraries, and even if the authors of that code never paid attention to thread safety. Everything either upholds specific thread-safety guarantees or is not allowed to be used across threads. When I write code that is not thread-safe, the compiler points out exactly which parts are unsafe.

This is in stark contrast to the situation in C. In general, unless a library function’s documentation explicitly says otherwise, it cannot be trusted to be thread-safe. The programmer must ensure all of the code is correct, and the compiler is usually unable to help. Multithreaded C code carries extra responsibility and risk, so it is very tempting to pretend that multi-core CPUs are a fad and to imagine that users have something better to do with their remaining 7 to 15 cores.

Rust guarantees freedom from data races and memory unsafety (such as use-after-free bugs), even across threads. Not just some of the races that can be found at run time with heuristics or tools: all data races are prevented. This is a lifesaver, because data races are the worst kind of parallel bug; they happen on my users’ machines, not in my debugger. There are other kinds of concurrency errors, such as misuse of locking primitives leading to higher-level logical race conditions or deadlocks, which Rust cannot eliminate, but those are usually easier to reproduce and fix.

In C, I dare not use threads beyond sprinkling OpenMP on simple for loops. I have tried to be more ambitious with tasks and threads, but the results have always been disappointing.

Rust already has many libraries for data parallelism, thread pools, queues, tasks, lock-free data structures, and so on. With such components, plus the strong safety net of the type system, parallelizing Rust programs is easy. In some cases it is simply a matter of replacing iter() with par_iter(), and as long as it compiles, it works! The speedup is not always linear (Amdahl’s law is cruel), but a relatively small amount of work often yields a 2-3x speedup.
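A minimal sketch of compiler-checked parallelism using only the standard library’s scoped threads (the par_iter() mentioned above comes from the third-party rayon crate, with which this whole function would shrink to input.par_iter().map(|&x| x * x).sum()):

```rust
use std::thread;

// Parallel sum of squares: split the slice and sum the halves on two threads.
fn sum_of_squares(input: &[i64]) -> i64 {
    let mid = input.len() / 2;
    let (left, right) = input.split_at(mid);
    thread::scope(|s| {
        // Borrowing `left` across a thread boundary is verified at compile
        // time; a data race here simply would not compile.
        let handle = s.spawn(|| left.iter().map(|&x| x * x).sum::<i64>());
        let right_sum: i64 = right.iter().map(|&x| x * x).sum();
        handle.join().unwrap() + right_sum
    })
}

fn main() {
    let data: Vec<i64> = (1..=1000).collect();
    // 1000 * 1001 * 2001 / 6 = 333833500
    assert_eq!(sum_of_squares(&data), 333_833_500);
}
```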

Translator’s note: Amdahl’s Law, a rule of thumb in computer science named after Gene Amdahl, gives the theoretical limit on the speedup obtainable from parallelization when only part of a program can be parallelized.

There is an interesting difference between Rust and C in how thread safety is documented. Rust has a vocabulary for specific aspects of thread safety, such as Send and Sync, guards, and cells. For C libraries there is no such vocabulary: “You can allocate it on one thread and free it on another, but you cannot use it from two threads at the same time.” In Rust, thread safety is described in terms of data types, which generalizes to all functions that use them. In C, thread safety concerns only individual functions and configuration flags. Rust’s guarantees are usually provided at compile time, or at the very least unconditionally. In C, it is common to say something like, “This is thread-safe only if the turboblub option is set to 7.”
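The Send and Sync vocabulary is enforced by the compiler, not just documented; a small sketch:

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc<Mutex<T>> is Send + Sync, so the compiler lets it cross threads.
    let shared = Arc::new(Mutex::new(0));
    let s2 = Arc::clone(&shared);
    thread::spawn(move || {
        *s2.lock().unwrap() += 1;
    })
    .join()
    .unwrap();
    assert_eq!(*shared.lock().unwrap(), 1);

    // Rc<RefCell<T>> is neither Send nor Sync. This version does not compile:
    //   let local = Rc::new(RefCell::new(0));
    //   thread::spawn(move || { *local.borrow_mut() += 1; });
    //   error[E0277]: `Rc<RefCell<i32>>` cannot be sent between threads safely
    let local = Rc::new(RefCell::new(0));
    *local.borrow_mut() += 1; // fine on a single thread
    assert_eq!(*local.borrow(), 1);
}
```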

Conclusion

Rust is low-level enough that, if necessary, it can be optimized for maximum performance just like C. Its higher level of abstraction, easier memory management, and richer selection of libraries tend to make Rust programs do more, and left unchecked this can cause bloat. However, Rust programs also optimize quite well, sometimes better than C. C is suited to writing minimal code at the level of individual bytes and pointers, while Rust has powerful facilities for combining multiple functions, or even whole libraries, into efficient machine code.

But the greatest potential lies in being able to fearlessly parallelize most Rust code, even where the equivalent C code would be too risky to parallelize. In this regard, Rust is a far more mature language than C.

Author Bio:
Kornel, programmer, specializes in image compression. Enjoys chatting. Blog writer.
Original Link:
https://kornel.ski/rust-c-speed