Scientists Carve Out a Subset of C for Conversion to Safe Rust

Introduction: Computer scientists from the French National Institute for Research in Computer Science and Automation (Inria) and Microsoft have designed a method to automatically convert a subset of C code into safe Rust code to meet the growing demand for memory safety.

The C programming language was born in the early 1970s and is used to build many critical systems, applications, and libraries. For example, the Linux kernel is primarily written in C.

However, like its extension C++, C is not designed for memory safety. It uses manual memory management, which is more efficient and flexible than automatic memory management (such as garbage collection), but is also more prone to memory errors.

Recent studies have shown that memory safety errors (such as out-of-bounds reads and writes and use-after-free) account for the majority of software vulnerabilities. In Google’s Android operating system, they accounted for 76% of vulnerabilities in 2019, and the company predicts that by the end of 2024 this figure will drop to 24% through the use of Rust and safe coding practices.

Rust leaves the choice to developers: code is memory-safe by default, but sections explicitly marked unsafe can bypass the compiler’s checks. C and C++ code can be made more memory-safe through diligence, static analysis, and testing, but neither language provides memory safety guarantees out of the box.
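As a minimal sketch of that distinction (the function names below are illustrative and do not come from the article or the paper): safe Rust reports an out-of-range access as a value, while an unsafe function skips the check and leaves the responsibility with the caller, much as C does.

```rust
/// Safe access: `get` returns None instead of reading out of bounds.
fn read_checked(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}

/// Unsafe access: the caller must guarantee `index < data.len()`.
/// Breaking that promise is undefined behavior, as it would be in C.
unsafe fn read_unchecked(data: &[u8], index: usize) -> u8 {
    unsafe { *data.get_unchecked(index) }
}

fn main() {
    let buf = [10u8, 20, 30];
    println!("{:?}", read_checked(&buf, 1)); // in range
    println!("{:?}", read_checked(&buf, 7)); // out of range, no memory error
    // SAFETY: 2 < buf.len(), so the unchecked read stays inside the buffer.
    let last = unsafe { read_unchecked(&buf, 2) };
    println!("{last}");
}
```

The compiler only vouches for the first function; the second compiles, but its correctness depends entirely on the caller.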

In recent years, the industry and government have been working hard to reduce the use of C and C++ code and increase the use of memory-safe programming languages such as Rust, Go, Python, and Java (though these languages may link to unsafe libraries). As stated by the Internet Security Research Group (ISRG) Prossimo project: “Using C and C++ is harmful to society, harmful to your reputation, and harmful to your customers.”

Not everyone goes this far. Many C and C++ programmers prefer to find ways to continue using their favorite tools rather than join the Rust wave, and even large companies like Google, one of Rust’s most outspoken advocates, acknowledge that C and C++ code will be around for a long time.

Thus, significant efforts have been made to develop methods that make C and C++ more memory-safe and to develop automatic code conversion mechanisms like DARPA’s TRACTOR program.

Efforts like TrapC and Fil-C aim to make C itself memory-safe, but they have their downsides. TrapC is still in development and focuses on a subset of the language. Fil-C currently carries a performance cost and lacks full application binary interface (ABI) compatibility.

In a paper titled “Compiling C to Safe Rust, Formally,” authors Aymeric Fromherz (Inria) and Jonathan Protzenko (Microsoft) describe an alternative to existing automatic C-to-Rust conversion approaches, which produce unsafe Rust. Their focus is on providing a conversion path for formally verified, industrial-grade code.

“C allows programmers to be creative with aliasing, low-level casts, memory management, and data representation,” the authors explain. “Expressing these patterns in Rust requires giving up many static guarantees to allow unchecked aliasing, casts between representations (known as ‘transmutation’ in Rust terminology), and so on, which is achieved through Rust’s unsafe features. But doing so undermines Rust’s advantages!”

Thus, Fromherz and Protzenko developed a subset of C called “Mini-C,” which avoids common C patterns and features, such as pointer arithmetic and implicit mutability, that cannot be directly converted to safe Rust.
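To illustrate why pointer arithmetic does not carry over directly, here is a hedged sketch (the function `add_halves` and the C snippet in the comment are invented for illustration and are not taken from the paper). In C, two pointers into the same buffer can be derived with `buf + n`; safe Rust will not accept two overlapping mutable views, so the buffer has to be split into provably disjoint slices first.

```rust
// C version, for comparison (hypothetical):
//   void add_halves(uint32_t *buf, size_t n) {
//       uint32_t *lo = buf, *hi = buf + n;   /* pointer arithmetic */
//       for (size_t i = 0; i < n; i++) lo[i] += hi[i];
//   }

/// Adds the upper half of `buf` into the lower half, element by element.
fn add_halves(buf: &mut [u32]) {
    let n = buf.len() / 2;
    // split_at_mut returns two non-overlapping mutable slices, so the
    // borrow checker can see that `lo` and `hi` never alias.
    let (lo, hi) = buf.split_at_mut(n);
    for (l, h) in lo.iter_mut().zip(hi.iter()) {
        // wrapping_add mirrors C's unsigned overflow behavior.
        *l = l.wrapping_add(*h);
    }
}

fn main() {
    let mut data = [1u32, 2, 10, 20];
    add_halves(&mut data);
    println!("{:?}", data);
}
```

This is one way to express the pattern in safe Rust; it is a sketch of the general idea, not a description of what the authors' tool emits.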

The researchers implemented the translation from Mini-C in the KaRaMeL compiler framework and state that it generates safe Rust code, although the C source may first need some refactoring.

They explain: “We do not automatically convert the full generality of C into unsafe Rust and then attempt to make the generated code safer; instead, we aim at a data-centric subset of C applications. Therefore, our translation process is semi-automated: users may need to make minimal adjustments to the source C program to fit within the supported language subset; once within this subset, our method automatically generates valid, safe Rust code.”

They tested the conversion process on HACL* (High-Assurance Cryptography Library), a verified cryptographic library consisting of 80,000 lines of C code, and on EverParse, a library of verified parsers and serializers consisting of 1,400 lines of C code.

The HACL* conversion required minimal code changes, while the EverParse conversion could be completed without any changes to the source code. The authors report strong results: despite the addition of fat pointers and runtime bounds checks, the Rust code exhibits the same performance characteristics as the original C code.
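For readers unfamiliar with those two terms, the following sketch shows what they mean in plain Rust (the helper names are hypothetical and the example is independent of the authors' generated code). A slice reference is a "fat" pointer because it stores a length next to the address, and indexing a slice compares the index against that length at run time.

```rust
use std::mem::size_of;

/// Number of machine words in a reference to a single byte (a thin pointer).
fn thin_ref_words() -> usize {
    size_of::<&u8>() / size_of::<usize>()
}

/// Number of machine words in a slice reference (address plus length).
fn slice_ref_words() -> usize {
    size_of::<&[u8]>() / size_of::<usize>()
}

/// Sums the first `n` elements. Each `data[i]` is checked against the
/// length stored in the fat pointer and panics if `i` is out of range.
fn sum_first(data: &[u32], n: usize) -> u32 {
    let mut total = 0u32;
    for i in 0..n {
        total = total.wrapping_add(data[i]);
    }
    total
}

fn main() {
    println!(
        "thin: {} word(s), slice: {} word(s)",
        thin_ref_words(),
        slice_ref_words()
    );
    println!("{}", sum_first(&[1, 2, 3, 4], 3));
}
```

Asking for more elements than the slice holds stops the program with a panic rather than reading adjacent memory, which is the cost the authors measured against the original C.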

This work now contributes to making various dependent applications more secure. The authors state that their Rust-compiled HACL* has been packaged in the libcrux cryptography library, parts of which have been added to Mozilla’s NSS and OpenSSH.

Author: Listening to Music Fish
