Comparison of Speed Between Rust and C

Author: Kornel | Translator: Sambodhi | Planner: Zhao Yuying

This article was originally published on the author’s personal blog and is translated and shared by InfoQ Chinese site with the authorization of the original author Kornel.

Programs written in Rust should have runtime speeds and memory usage comparable to those written in C. However, due to the overall programming styles of these languages being different, it is difficult to generalize their speeds. This article summarizes the similarities between Rust and C, as well as the scenarios where C is faster and where Rust is faster.

Disclaimer: This article is not an objective benchmark. There are significant differences between what these two languages can achieve in theory and how they are used in practice, and this particular comparison is based on my personal, subjective experience, including delivery deadlines, bugs, and laziness. I have been using Rust as my primary programming language for over 4 years, and I used C for nearly 10 years before that. I compare Rust specifically with C because a comparison with C++ would involve many more “ifs” and “buts”, and I do not wish to delve into that.

In short:

  • The abstractions in Rust are a double-edged sword. They can hide bad code but also make it easier to improve algorithms and utilize highly optimized libraries.

  • I have never worried about getting stuck in a performance dead end with Rust. There is always an unsafe escape hatch for very low-level optimizations (which are often unnecessary).

  • Fearless concurrency does exist. The occasional “clumsiness” of the borrow checker makes parallel programming practical.

My overall feeling is that if I could spend endless time and energy, my C programs would be as fast as Rust, or even faster, because theoretically, there is nothing that C can do that Rust cannot. However, in practice, C has less abstraction, a more primitive standard library, and a terrible dependency situation, and I really do not have the time to “reinvent the wheel” every time.

Similarities and Differences Between Rust and C

Both Are “Portable Assemblers”

Both Rust and C provide control over data structure layout, integer sizes, heap versus stack allocation, and pointer indirection, and both generally translate into understandable machine code without the compiler inserting much “magic” in between. Rust even dares to guarantee that bytes are 8 bits and that signed integer overflow has defined behavior!

While Rust has higher-level constructs like iterators, traits, and smart pointers, these are designed to optimize down to predictable machine code (i.e., “zero-cost abstractions”). The memory layout of Rust types is simple; for example, growable strings and vectors are just {pointer, capacity, length}. Rust has no concepts like move or copy constructors, so passing objects is no more complex than passing a pointer or doing a memcpy.
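The simple-layout claim can be checked directly with std::mem::size_of; a small sketch (the values hold on typical targets, since usize matches the pointer width):

```rust
use std::mem::size_of;

fn main() {
    // String and Vec<u8> are just {pointer, capacity, length}: three words.
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());
    assert_eq!(size_of::<Vec<u8>>(), 3 * size_of::<usize>());
    // A Box of a sized type is a bare pointer, like malloc's return value.
    assert_eq!(size_of::<Box<u64>>(), size_of::<usize>());
    println!("layouts are plain words, no hidden headers");
}
```

There are no vtables, headers, or reference-count fields hiding in these types; what you declare is what is stored.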

The borrow checker is merely a form of static analysis at compile time. It adds nothing at run time; lifetime information is completely stripped before code generation. There are no clever tricks like autoboxing.

One example of Rust not being a “dumb” code generator is unwinding. Although Rust does not use exceptions for normal error handling, panics (unhandled fatal errors) can optionally behave like C++ exceptions. This can be disabled at compile time (panic = abort), but even so, Rust does not mix well with C++ exceptions or longjmp.
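The unwinding behavior can be observed with std::panic::catch_unwind, which stops a panic at a boundary much like a C++ catch(...); under panic = abort this mechanism disappears entirely. A minimal sketch:

```rust
use std::panic;

fn main() {
    // Silence the default panic message printer for a tidy demo.
    panic::set_hook(Box::new(|_| {}));

    // catch_unwind stops the unwind at this boundary.
    let result = panic::catch_unwind(|| {
        let v: Vec<i32> = Vec::new();
        v[0] // out-of-bounds index panics rather than reading garbage
    });
    assert!(result.is_err());
    println!("panic was caught as an unwind, not a crash");
}
```

This is the same table-driven unwinding C++ uses, which is why mixing it with longjmp or foreign exceptions is problematic.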

Same Old LLVM Backend

Because Rust uses LLVM, it supports Link-Time Optimization (LTO), including ThinLTO, even inlining across C/C++/Rust language boundaries, as well as profile-guided optimization. Although the LLVM IR generated by rustc is much more verbose than that generated by clang, the optimizer handles it well.

Some of my C code compiles to faster machine code with GCC than with LLVM, but GCC has no mature Rust frontend, so Rust cannot take advantage of this.

In theory, Rust’s stricter immutability and aliasing rules should allow better optimization than C, but in practice this has not happened yet. Optimizations for aliasing rules other than C’s are still a work in progress in LLVM, so Rust has not yet fully realized this potential.

Both Allow Manual Tuning with Few Exceptions

Rust code is low-level and predictable, allowing me to manually tune the optimized assembly. Rust supports SIMD and provides good control over inlining, calling conventions, etc. Rust is so similar to C that C profilers can often be used with Rust (for example, I can use Xcode’s tools on a Rust-C-Swift sandwich program).

Generally, there is not much difference between optimizing Rust and optimizing C when performance is absolutely critical and needs to be hand-tuned to the last detail.

There are some low-level features for which Rust does not have suitable alternatives:

  • Computed goto. The “boring” uses of goto can be replaced by other Rust constructs like loop { break }. Many uses of goto in C are for cleanup, which Rust does not need thanks to RAII/destructors. However, there is a non-standard goto *addr extension useful for interpreters. Rust has no direct equivalent (you can write a match and hope it compiles to a jump table); on the other hand, if I needed an interpreter, I would try the Cranelift JIT instead.

  • alloca and C99 variable-length arrays. They are even controversial in C, so Rust does not use them.
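The match-and-hope approach for interpreter dispatch can be sketched as a tiny bytecode loop (a hypothetical toy instruction set, purely for illustration); LLVM will often compile the match into a jump table, the portable cousin of computed goto:

```rust
// Toy instruction set for illustration only.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Halt,
}

fn run(program: &[Op]) -> i64 {
    let mut stack = Vec::new();
    let mut pc = 0;
    loop {
        // The match on the opcode plays the role of `goto *addr`:
        // each arm is one dispatch target.
        match program[pc] {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop().unwrap();
                let a = stack.pop().unwrap();
                stack.push(a + b);
            }
            Op::Halt => return stack.pop().unwrap(),
        }
        pc += 1;
    }
}

fn main() {
    let program = [Op::Push(2), Op::Push(40), Op::Add, Op::Halt];
    assert_eq!(run(&program), 42);
    println!("result = {}", run(&program));
}
```

Whether the match becomes a jump table depends on the optimizer, which is exactly the trade-off the bullet above describes.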

Rust’s Minor Overhead

However, if Rust is not manually tuned, some inefficiencies can arise:

  • Rust lacks implicit integer conversions, and indexing uses usize, which encourages users to use only that type even where a smaller one would suffice. Compared to C, where 32-bit int is the most popular choice, indexing with usize is easier to optimize on 64-bit platforms without relying on undefined behavior, but the extra bits can put more pressure on registers and memory.

  • Idiomatic Rust always passes both a pointer and a size for strings and slices. Before porting several C libraries to Rust, I had not realized how many C functions simply take a pointer to memory without any size and just hope for the best (the size is known indirectly from context, or simply assumed to be large enough for the task).

  • Not all bounds checks are optimized away. for item in arr or arr.iter().for_each(…) is as efficient as it can be, but if for i in 0..len { arr[i] } is required, performance depends on whether the LLVM optimizer can prove that the indices are in bounds. Sometimes it cannot, and the bounds checks can inhibit autovectorization. There are various workarounds, of course, both safe and unsafe.

  • “Clever” memory use is frowned upon in Rust. In C, anything goes. For example, in C I might try to reuse a buffer allocated for one purpose for another (a technique known as HEARTBLEED). For variable-sized data, it is convenient to use a fixed-size buffer big enough for anything (e.g., PATH_MAX) to avoid (re)allocating a constantly growing one. Idiomatic Rust still gives a lot of control over memory allocation and can do the basics, such as memory pools, merging multiple allocations into one, and preallocating space, but overall it steers users toward “boring” use of memory.

  • If the borrow checker’s rules make something difficult, a simple workaround is extra copying or reference counting. Over time, I have learned many borrow-checker tricks and adjusted my coding style to fit it better, so this situation has become rare. It is never a hard blocker, because there is always the fallback of “raw” pointers when truly necessary.
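The bounds-check point above can be made concrete: iterating with iter() needs no per-element check, while manual indexing may keep one unless LLVM can prove it away. A common safe workaround is to re-slice once up front, which hoists the check out of the loop; a sketch:

```rust
fn sum_indexed(arr: &[u32], len: usize) -> u32 {
    // One bounds check here; the loop body can then be proven
    // in-bounds, which also re-enables autovectorization.
    let arr = &arr[..len];
    let mut total = 0;
    for i in 0..len {
        total += arr[i];
    }
    total
}

fn sum_iter(arr: &[u32]) -> u32 {
    // Iterator form: no index at all, so no bounds checks to eliminate.
    arr.iter().sum()
}

fn main() {
    let data: Vec<u32> = (1..=10).collect();
    assert_eq!(sum_indexed(&data, 10), 55);
    assert_eq!(sum_iter(&data), 55);
    println!("both forms sum to 55");
}
```

Both compile to the same tight loop in release builds when the proof succeeds; the iterator form just never puts the optimizer in a position to fail.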

Rust’s borrow checker is notoriously disliked for doubly linked lists, but fortunately, linked lists run very slowly on current hardware (poor cache locality and no vectorization). Rust’s standard library provides linked lists, as well as faster and more borrow-checker-friendly containers to choose from.

There are two things the borrow checker cannot tolerate: memory-mapped files (magical changes from outside the process conflict with Rust’s rule that references are either shared and immutable or exclusive and mutable) and self-referential structures (passing a structure by value can leave its internal pointers dangling). These can be solved with raw pointers, which are exactly as safe as any pointer in C, or by wrapping those pointers in a safe abstraction through some mental gymnastics.

In Rust, single-threaded programs simply do not exist as a concept. For performance, Rust allows individual data structures to opt out of thread safety, but anything that can be shared across threads (including global variables) must be synchronized or marked as unsafe.

Rust’s string support includes some cheap in-place operations, such as make_ascii_lowercase() (a direct equivalent of what C does), while .to_lowercase() makes a Unicode-aware copy. Speaking of strings, UTF-8 encoding is not as troublesome as it seems, since strings expose an .as_bytes() view, so when needed they can be processed in a Unicode-ignorant way.
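The two lowercase operations differ exactly as described: one mutates bytes in place with no allocation, the other allocates a new, Unicode-aware copy. A small demonstration:

```rust
fn main() {
    // In-place, ASCII-only: no allocation, like a C tolower() loop.
    let mut s = String::from("HELLO, WORLD");
    s.make_ascii_lowercase();
    assert_eq!(s, "hello, world");

    // Unicode-aware: allocates a fresh String, handles non-ASCII letters.
    let t = "CAFÉ".to_lowercase();
    assert_eq!(t, "café");

    // Byte-level view for Unicode-ignorant processing.
    assert_eq!("abc".as_bytes(), b"abc");
    println!("string ops behave as advertised");
}
```

Choosing between them is the usual speed/correctness trade-off: the in-place version is free but wrong for non-ASCII text.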

libc quietly buffers stdout, making putc quite fast. Rust’s libstd is not so magical: unless wrapped in a BufWriter, its I/O is unbuffered. When people complain that Rust is slower than Python because the program spends 99% of its time flushing output byte by byte, this is exactly what they have run into.
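The fix is a one-line wrap. This sketch buffers into an in-memory Vec to stay self-contained, but BufWriter::new(std::io::stdout().lock()) works identically for real output:

```rust
use std::io::{BufWriter, Write};

fn main() {
    // Any Write can be buffered; stdout().lock() would work the same way.
    let mut out = BufWriter::new(Vec::new());
    for i in 0..5 {
        // Each writeln! lands in the buffer, not in a separate syscall.
        writeln!(out, "line {}", i).unwrap();
    }
    // into_inner flushes once and returns the underlying writer.
    let bytes = out.into_inner().unwrap();
    assert_eq!(bytes.len(), "line 0\n".len() * 5);
    println!("wrote {} bytes with a single flush", bytes.len());
}
```

Five writes become one flush; for a program printing millions of lines, this is the difference the complaint in the paragraph above is about.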

Executable File Size

Every operating system ships some standard C library, roughly 30MB of code that C executables get “for free”; a small “Hello World” C executable cannot actually print anything by itself, it merely calls the printf shipped with the operating system. Rust cannot expect the operating system to have Rust’s standard library built in, so Rust executables bundle their own (over 300KB). Fortunately, this is a one-time overhead, and it can be reduced. In embedded development, the standard library can be turned off, and Rust will generate “bare” code.

Per function, Rust’s code size is comparable to C’s, but there is a “generics bloat” problem: an optimized copy of a generic function is generated for every type it is used with, so the same function can end up with eight versions; the cargo-bloat tool helps find them.

Using dependencies in Rust is very easy. Similar to JS/npm, there is a culture of small single-purpose libraries, but they do add up. Before long, my executables included Unicode normalization tables, seven different random number generators, and an HTTP/2 client supporting Brotli. The cargo-tree tool is very useful for deduplicating and pruning dependencies.

Rust’s Small Wins

In discussing overhead, I have mentioned many things, but there are also some areas where Rust ultimately proves to be more efficient and faster:

  • To hide implementation details, C libraries often return opaque pointers to their data structures and ensure each instance has only one copy. This costs a heap allocation and a pointer indirection. Rust’s built-in privacy, single ownership rules, and coding conventions let libraries expose their objects without indirection, so callers decide whether to place them on the heap or the stack. Stack objects can then be optimized aggressively, or even optimized away entirely.

  • By default, Rust can inline functions from the standard library, dependencies, and other compilation units. For C, I sometimes hesitate to split files or use libraries because it affects inlining and requires fine management of header and symbol visibility.

  • Rust automatically reorders struct fields to reduce padding in data structures. Compiling C with -Wpadding shows how often I forget this detail.

  • The size of a string is carried in its “fat” pointer. This makes length checks fast, avoids accidental O(n²) string loops, and allows substrings to be taken in place (e.g., splitting a string into tokens) without modifying memory or copying to add a \0 terminator.

  • Similar to C++ templates, Rust generates a copy of generic code for every type it is used with, so functions like sort() and containers like hash tables are always optimized for their element types. In C, I have to choose between hacky macros and the inefficiency of functions that juggle void* and runtime-variable sizes.

  • Rust iterators can be chained together and optimized as a single unit. So instead of calling buy(it); use(it); break(it); change(it); mail(upgrade(it)); which may rewrite the same buffer multiple times, I can call it.buy().use().break().change().upgrade().mail(), which compiles into a single buy_use_break_change_upgrade_mail pass that performs all these operations in one combined traversal after optimization. (0..1000).map(|x| x*2).sum() compiles to return 999000.

  • Additionally, there are Read and Write interfaces that let functions stream unbuffered data. They compose well, so I can write data to a stream that computes a CRC of the data on the fly, adds framing/escaping if needed, compresses it, and then writes it to the network, all in a single call. And I can pass such a combined stream as an output stream to my HTML template engine, so now every HTML tag knows how to send itself compressed. The underlying mechanism is just a pyramid of ordinary next_stream.write(bytes) calls, so technically nothing stops me from doing the same in C, but C’s lack of generics and abstractions makes it impractical beyond callbacks configured at runtime, which are less efficient.

  • In C, it is perfectly normal to overuse linear searches and linked lists, because who wants to maintain yet another half-baked hash table implementation? The lack of built-in containers makes dependencies very troublesome, so I cut corners to get the job done. I would not write a complex B-tree implementation unless absolutely necessary; I would use qsort plus bsearch and call it a day. In Rust, on the other hand, high-quality implementations of all kinds of containers are only one or two lines of code away. This means my Rust programs get to use appropriate, well-optimized data structures every time.

  • Nowadays, it seems everything requires JSON. Rust’s serde is one of the fastest JSON parsers in the world, capable of directly parsing into Rust structures, making the use of parsed data very fast and efficient.
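The iterator-fusion claim from the list above is easy to verify for the arithmetic example in the text; the chain below is a single pass, and with optimizations the whole expression folds into a constant:

```rust
fn main() {
    // map + sum fuse into one traversal; in release builds LLVM
    // folds the entire chain to the constant 999000.
    let total: u32 = (0..1000).map(|x| x * 2).sum();
    assert_eq!(total, 999_000);
    println!("{}", total);
}
```

No intermediate buffer of doubled values is ever materialized; each element flows through map and into the running sum.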
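The composable Write streams described in the list above can be sketched with a toy wrapper of my own (not a real library): a writer that maintains a running checksum while forwarding bytes to any inner Write. Real crates layer CRC, compression, and framing in exactly this shape.

```rust
use std::io::{self, Write};

/// Toy wrapper: sums bytes while passing them through to `inner`.
/// A real CRC or compressor writer has the same structure.
struct ChecksumWriter<W: Write> {
    inner: W,
    sum: u64,
}

impl<W: Write> Write for ChecksumWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Only count the bytes the inner writer actually accepted.
        self.sum += buf[..n].iter().map(|&b| b as u64).sum::<u64>();
        Ok(n)
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() {
    // Compose: checksum over an in-memory sink; the sink could equally
    // be a compressor wrapping a TcpStream.
    let mut w = ChecksumWriter { inner: Vec::new(), sum: 0 };
    w.write_all(b"abc").unwrap();
    assert_eq!(w.sum, 97 + 98 + 99);
    assert_eq!(w.inner, vec![97, 98, 99]);
    println!("checksum = {}", w.sum);
}
```

Because the wrapper is generic over W, the whole stack monomorphizes and inlines into plain write calls, which is the efficiency point the bullet makes against runtime callbacks in C.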
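The "one or two lines away" point about containers holds even without third-party crates; std alone provides a hash map and an ordered B-tree map:

```rust
use std::collections::{BTreeMap, HashMap};

fn main() {
    // A real hash table in one line; no half-baked homemade version.
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for word in ["a", "b", "a"] {
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts["a"], 2);

    // An ordered map (a B-tree under the hood), also one line.
    let ranks: BTreeMap<u32, &str> = [(2, "silver"), (1, "gold")].into();
    assert_eq!(ranks.iter().next(), Some((&1, &"gold")));
    println!("std containers, zero dependency wrangling");
}
```

Both are generic, so they are monomorphized and optimized for the exact key and value types used, unlike a C void* container.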

Rust’s Major Wins

Rust enforces thread safety for all code and data, even in third-party libraries, and even if the authors of that code paid no attention to thread safety. Everything either satisfies a specific thread-safety guarantee or is not allowed to be used across threads. When the code I write is not thread-safe, the compiler points out exactly where it is unsafe.

This is in stark contrast to the situation in C. Generally, unless library functions are explicitly documented otherwise, they cannot be trusted to be thread-safe. It is the programmer’s job to ensure all the code is correct, and the compiler is generally powerless to help. Multithreaded C code carries extra responsibility and risk, so it is very tempting to pretend that multi-core CPUs are a fad and to imagine that users have better things to do with their remaining 7 to 15 cores.

Rust guarantees freedom from data races and from memory unsafety (e.g., use-after-free bugs), even across threads. Not just the races that some heuristic tool or runtime instrumentation happens to find, but all data races. This is a lifesaver, because data races are the worst kind of parallelism bug: they occur on my users’ machines, never in my debugger. There are other kinds of concurrency errors, such as misuse of locking primitives leading to higher-level logical race conditions or deadlocks; Rust cannot eliminate those, but they are usually easier to reproduce and fix.

In C, I would not dare to go beyond OpenMP pragmas on simple for loops. I have tried taking bigger risks with tasks and threads, but the results have always been disappointing.

Rust already has many libraries for data parallelism, thread pools, queues, tasks, lock-free data structures, and so on. With the help of such components, combined with the powerful safety net of the type system, I can parallelize Rust programs quite casually. In some cases it is as easy as replacing iter() with par_iter(), and as long as it compiles, it works! The speedup is not always linear (Amdahl’s law is cruel), but a relatively small amount of work often yields a 2-3x speedup.
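The par_iter() mentioned above comes from the rayon crate; to keep this sketch dependency-free, here is the same fearless-concurrency guarantee using std alone: scoped threads mutate disjoint halves of a slice, and the borrow checker would reject any version where the halves overlapped.

```rust
fn main() {
    let mut data: Vec<u64> = (1..=1000).collect();
    // split_at_mut proves to the compiler that the two halves
    // are disjoint, so both can be mutated without locks.
    let (left, right) = data.split_at_mut(500);

    std::thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x *= 2));
        s.spawn(|| right.iter_mut().for_each(|x| *x *= 2));
    }); // scope joins both threads before the borrows end

    let total: u64 = data.iter().sum();
    assert_eq!(total, 2 * (1000 * 1001 / 2)); // 1_001_000
    println!("{}", total);
}
```

A data race here is not a bug you might write and later catch; it is a program that does not compile.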

Translator’s note: Amdahl’s Law, a rule of thumb in computer science named after Gene Amdahl, gives the theoretical limit on the speedup obtainable by parallelizing part of a computation.

There is an interesting difference between Rust and C in how thread safety is documented. Rust has a vocabulary for describing specific aspects of thread safety, such as Send and Sync, guards, and cells. For C libraries, there is no such vocabulary, only prose like: “You can allocate it on one thread and free it on another, but you cannot use it from two threads simultaneously.” In Rust, thread safety is described in terms of data types, and that description generalizes to all functions using them; in C, thread safety is a property of individual functions and configuration flags. Rust’s guarantees are usually provided at compile time, or at worst are unconditional. In C, it is common to see something like: “This is thread-safe only if the turboblub option is set to 7.”
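The Send/Sync vocabulary above is machine-checked, not just documentation. A common trick (sometimes packaged as the static_assertions crate, sketched here by hand) turns thread-safety claims into compile-time checks:

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};

// These empty helpers turn thread-safety claims into compile errors
// when violated; they generate no code at all.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    assert_send::<Arc<Mutex<Vec<u8>>>>(); // sendable across threads
    assert_sync::<Arc<Mutex<Vec<u8>>>>(); // and shareable by reference
    assert_send::<Cell<u32>>(); // may move to another thread...
    // assert_sync::<Cell<u32>>();  // ...but sharing it would not compile
    // assert_send::<Rc<u32>>();    // would not compile: Rc is single-threaded
    let _single_threaded = Rc::new(0); // Rc is fine within one thread
    println!("thread-safety claims verified at compile time");
}
```

A C library can only state such properties in a comment; here the commented-out lines are claims the compiler itself refutes.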

Conclusion

Rust is low-level enough that, when necessary, it can be optimized for maximum performance just like C. Its higher-level abstractions, convenient memory management, and rich library ecosystem mean that Rust programs tend to contain more code and do more, and without care this can lead to bloat. However, Rust programs also optimize quite well, sometimes better than C. C is well suited to writing minimal code at the byte-and-pointer level, while Rust has powerful facilities for efficiently combining multiple functions, or even entire libraries, together.

However, the greatest potential lies in the ability to fearlessly parallelize most Rust code, even when the equivalent C code carries significant risks of parallelization. In this regard, Rust is a more mature language than C.

Author Bio:

Kornel, a programmer specializing in image compression. Enjoys chatting. Blog writer.

Original Link:

https://kornel.ski/rust-c-speed
