Sudoke X Rust: Insights and Innovations

The 2nd China Rust Developer Conference (Rust China Conf 2021-2022) is about to begin. Sudoke engineer Wei Xiaoya will share insights on building a high-performance privacy computing platform with Rust in the “Cryptography, Privacy, and Trusted Computing” session.

Wei Xiaoya is one of the leaders of Sudoke’s Multi-Party Secure Computing Platform, and before joining Sudoke, he worked in Silicon Valley, specializing in distributed systems and big data processing. He was responsible for elastic storage and stream computing for video streaming at Google Cloud and worked on graph databases and knowledge graphs at Airbnb.

Before the conference, we asked Xiaoya a series of quick questions about “Sudoke and Rust”: Why did we choose Rust to reinvent distributed computing for privacy computing? What problems did we encounter during practical engineering? What do we love about Rust, and where do we face challenges?

Q
Why did we choose Rust to reinvent the wheel?

X: This calls for a brief look at federated learning and multi-party secure computation. Traditional machine learning relies on training data to build predictive models; as the saying “garbage in, garbage out” suggests, a model’s quality depends largely on the quality of its training samples. In real-world scenarios, however, the data owned by a single organization is often insufficient to train a good model, which is where federated learning comes in: it combines data from multiple organizations.

For example, Company A and Company B each hold their own data. Due to legal compliance or data privacy protection reasons, they cannot exchange data, but they still need to train a model using their combined data. In this case, federated learning technology allows both parties to train models using their own data while communicating securely and exchanging intermediate model training results without revealing the original data. After completing the model training, both parties obtain parts of their respective models, which together form a complete model, achieving the same effect as training the model with combined data. This method allows for model training without the original data leaving the premises.

This requires multiple parties to collaborate, without a central node acting as a scheduler throughout the computation process. Additionally, it relies on encrypted computations, involving the use of various cryptographic techniques, resulting in a computational load far exceeding that of plaintext computations. In practical implementations, federated learning and privacy computing, like traditional machine learning, face challenges with large data volumes. In this context, we need to consider how to horizontally scale computational capacity to accommodate large data volumes.

The engineering technology for big data has developed rapidly over the past 20 years, but the same cannot be said for the field of privacy computing. When I first entered the industry, I found that the engineering implementations in privacy computing seemed to have regressed to 20 years ago, with almost no exposure to technologies like high availability and distributed computing that I had previously encountered. The industry’s engineering technology implementations are still largely at the monolithic architecture stage, with computations occurring within a single computing node.

In today’s era of massive data accumulation, Sudoke recognizes that handling large data volumes is an inevitable trend in the implementation of privacy computing projects. Thus, we began to consider implementing distributed computing in the field of privacy computing and even contemplated whether we could leverage existing distributed computing frameworks, such as Spark, for secondary development to fit privacy computing scenarios.

Ultimately, we decided to reinvent the wheel for the following reasons. First, cost: every open-source big data framework rests on the premise that computation happens inside one data center under a central scheduling node. Privacy computing, by contrast, is a heterogeneous multi-party collaboration with no central node. Modifying an existing distributed computing framework would mean overturning that assumption and making changes at a very low level; worse, some of the clever, efficiency-oriented designs these frameworks build on that premise would become obstacles in secondary development. Reinventing the wheel is therefore the more reasonable choice for our scenario.

Second, performance: the extensive encrypted computation means privacy computing’s workload far exceeds that of plaintext computation. Most existing open-source frameworks are written in JVM languages such as Java or Scala, whose computational performance and memory management fall short of Rust’s; Rust lets us write efficient code while guaranteeing memory safety. We also need to call cryptographic libraries written in C, and Rust’s C bindings are considerably better developed than Java’s. Under these circumstances, Rust is the more convenient choice.
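As a small illustration of how lightweight Rust’s C bindings can be, here is a direct binding to libc’s `strlen` (a standard C function; a real cryptographic library would be bound the same way, typically via a build script or bindgen — the wrapper name `c_strlen` is our own):

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declare the C function; the linker resolves it against libc.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

fn c_strlen(s: &str) -> usize {
    let c = CString::new(s).expect("no interior NUL bytes");
    // Sound because CString guarantees a valid, NUL-terminated pointer.
    unsafe { strlen(c.as_ptr()) }
}

fn main() {
    assert_eq!(c_strlen("privacy computing"), 17);
}
```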

Q
What do we love about Rust?

X: Excellent performance; flexibility; the ability to use the machine to its full capacity.

First of all, Rust provides excellent performance while ensuring memory safety relatively easily. I believe everyone has heard praises about Rust’s performance multiple times. I have additional thoughts on this for reference. On one hand, Rust allows developers to write efficient code using relatively simple methods: for example, using the async keyword to implement asynchronous functions and smart pointers to pass addresses.
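As a minimal sketch of the “smart pointers to pass addresses” point, `Arc` lets several threads read one buffer without copying it; the partial-sum workload below is purely illustrative:

```rust
use std::sync::Arc;
use std::thread;

// Sum a buffer on two threads; Arc shares the address instead of
// deep-copying the data into each thread.
fn parallel_sum(data: Vec<u64>) -> u64 {
    let data = Arc::new(data);
    let mid = data.len() / 2;
    let left = Arc::clone(&data);  // cheap: bumps a refcount
    let right = Arc::clone(&data);
    let lo = thread::spawn(move || left[..mid].iter().sum::<u64>());
    let hi = thread::spawn(move || right[mid..].iter().sum::<u64>());
    lo.join().unwrap() + hi.join().unwrap()
}

fn main() {
    assert_eq!(parallel_sum(vec![1, 2, 3, 4, 5, 6, 7, 8]), 36);
}
```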

On the other hand, in specific scenarios Rust lets us push performance further once we understand its internals. For instance, when implementing asynchronous logic we can implement the Future trait directly, finely controlling every step of the asynchronous execution to squeeze out performance. In C++, to stay memory-safe we generally avoid passing raw pointers; Rust’s lifetime mechanism lets us skip the extra overhead of smart pointers while still guaranteeing memory safety.

Additionally, Rust’s ownership-based memory management is novel and efficient: it spares developers overhead such as garbage collection, and it also heads off inefficient code such as redundant memory copies, improving code quality and significantly cutting runtime debugging time.
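A sketch of what “directly implementing the `Future` trait” looks like, polled here by a hand-rolled no-op-waker executor. This is illustrative only — production code would use a real runtime, and a real future would register the waker instead of being busy-polled:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A Future implemented by hand: returns Pending a few times, then Ready.
struct Countdown(u32);

impl Future for Countdown {
    type Output = &'static str;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.0 == 0 {
            Poll::Ready("done")
        } else {
            self.0 -= 1;
            Poll::Pending // a real future would register the waker here
        }
    }
}

// Minimal executor: a waker that does nothing, plus a poll loop.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker { RawWaker::new(std::ptr::null(), &VTABLE) }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn block_on<F: Future>(mut fut: F) -> F::Output {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    // Sound: fut is not moved again for the rest of this function.
    let mut fut = unsafe { Pin::new_unchecked(&mut fut) };
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    assert_eq!(block_on(Countdown(3)), "done");
}
```

Controlling `poll` by hand like this is exactly where the fine-grained tuning mentioned above becomes possible.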

Finally, Rust’s syntax is modern and unified, which makes development enjoyable, and its performance is on par with or even better than C++’s. Its dependency management and compilation toolchain are also far friendlier than C++’s. Features such as declarative macros, procedural macros, and traits keep Rust development flexible and concise, and its C FFI is relatively mature.
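As a taste of the macro flexibility mentioned above, a small `macro_rules!` macro can stamp out repetitive trait impls; the `Named` trait and the types here are made up purely for illustration:

```rust
trait Named {
    fn name(&self) -> &'static str;
}

// A declarative macro that generates one trait impl per listed type,
// removing hand-written boilerplate.
macro_rules! impl_named {
    ($($t:ty => $name:expr),*) => {
        $(impl Named for $t {
            fn name(&self) -> &'static str { $name }
        })*
    };
}

struct Cipher;
struct Plain;
impl_named!(Cipher => "cipher", Plain => "plain");

fn main() {
    assert_eq!(Cipher.name(), "cipher");
    assert_eq!(Plain.name(), "plain");
}
```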

In summary, compared to C++, Rust provides developers with a better option that balances performance, safety, development efficiency, and even enjoyment in development; in performance-critical development scenarios, my first choice will be Rust rather than C++.

Q
Where is the room for improvement in Rust?

X: Slow-moving maintenance; syntax features that still need work.

In the past couple of years of using Rust, I have indeed felt there is room for improvement. C++ has been around for a long time and has matured through long iteration; Rust is newer and maintained by the open-source community, so progress is often slower. Plenty of GitHub issues have sat unfixed for years, and many RFCs remain unfinished. The surrounding ecosystem is also incomplete: projects such as gRPC and Apache Arrow do have Rust implementations, but many are community-contributed and lag their C++ counterparts on performance. We have even heard of companies using C bindings to call the C++ gRPC library from Rust, which indirectly reflects the problem.

Rust’s generics are still not as comprehensive or powerful as C++ templates; specialization is a particular pain point. Rust’s specialization support remains unstable and applies only to limited scenarios, it has no partial specialization at all, let alone SFINAE, so we are forced to devise workarounds for things that C++ accomplishes easily.
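One workaround we mean here: give a trait a default method body as the “generic” path and override it for specific types, approximating specialization on stable Rust. The `Kernel` trait and matrix types below are invented for illustration:

```rust
// Default method = the generic path; per-type impls override it,
// standing in for template specialization on stable Rust.
trait Kernel {
    fn run(&self) -> &'static str {
        "generic path"
    }
}

struct DenseMatrix;
struct SparseMatrix;

impl Kernel for DenseMatrix {} // keeps the default body

impl Kernel for SparseMatrix {
    fn run(&self) -> &'static str {
        "sparse-optimized path" // the "specialized" case
    }
}

fn main() {
    assert_eq!(DenseMatrix.run(), "generic path");
    assert_eq!(SparseMatrix.run(), "sparse-optimized path");
}
```

The limitation is that, unlike C++ partial specialization, this cannot dispatch on one parameter of a multi-parameter generic without further trait machinery.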

Lastly, while Rust’s lifetime mechanism guarantees memory safety, it can be overly conservative, and its borrow-checker errors are not always easy to act on, so developers spend a lot of time reworking code. Rust also still lacks a polished development environment, and its learning curve is steep, which makes Rust talent hard to find; our recruitment of Rust engineers has been quite difficult.

If you are interested in Rust or privacy computing, come join Sudoke to explore cutting-edge, exciting work and use technology to shape a new field.

HR WeChat ID
Sudokeji

Email

[email protected]

