Practical Showdown: Performance Testing of Rust vs C++ High-Frequency Trading Engines Reveals Surprising Results

Introduction

In high-frequency trading, every microsecond counts. When your trading engine starts to lag and orders miss their latency budget, spreads move against you and market makers pull their quotes. That is not just a technical problem; it is a direct economic loss.

Recently, a trading team encountered such a dilemma: their meticulously crafted C++ trading engine began to frequently experience tail latency (p99 latency) spikes. After a serious incident, a senior trader sent a message saying, “Your engine just cost us a trade that could pay your annual salary.”

The team made a bold decision: rewrite the core engine in Rust and test it against the C++ version on the same hardware. The results shocked the C++ veterans: the Rust version had lower tail latency, and by a significant margin.

This is not simply a story about which language is faster. It is about how language design can push a team toward a better system architecture. Let’s walk through the whole showdown.

The Origin of the Problem: Latency Crisis

This trading engine is not a toy. It has to ingest market data from multiple exchanges simultaneously, normalize it, run a matching engine, perform risk checks, and send orders back to the exchanges. End-to-end p99 latency must stay under 1 millisecond, with no jitter even during sudden traffic spikes.

Most of the engine’s code is written in C++, meticulously optimized:

  • Custom memory allocator
  • Handwritten lock-free queues
  • Warmed-up caches

It looked like a codebase any systems engineer would be proud of.

However, when market conditions changed, problems arose:

  • More instruments
  • Deeper order books
  • More severe traffic bursts

The median latency performed normally, and the monitoring dashboard showed everything was fine. But p99 and p999 latencies began to spiral out of control. A few unfortunate orders would be delayed by hundreds of microseconds.

In high-frequency trading, these few unfortunate orders are the only ones that will be remembered.
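To see why a healthy median can hide this, here is a minimal Rust sketch (not taken from the team's code) that computes nearest-rank percentiles over a made-up sample set. The function names, the 40 µs and 900 µs values, and the 985/15 split are all assumptions chosen for illustration.

```rust
/// Nearest-rank percentile of latency samples (microseconds).
/// `p` must be in (0, 100]; returns None for an empty slice or invalid `p`.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest sample with at least p% of samples at or below it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let rank = rank.clamp(1, sorted.len());
    Some(sorted[rank - 1])
}

/// Hypothetical sample set: 1,000 orders, 985 typical ones and 15 stragglers.
fn synthetic_latencies() -> Vec<u64> {
    let mut v = vec![40u64; 985]; // typical order: 40 µs (assumed)
    v.extend(vec![900u64; 15]); // unlucky order: 900 µs (assumed)
    v
}
```

With these hypothetical samples the median is 40 µs while p99 and p999 are both 900 µs, so a dashboard that plots only the median would show nothing wrong.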

Why Consider Rust

The team was not a bunch of Rust enthusiasts. Most members grew up in C and C++ environments. The build systems, deployment processes, and monitoring tools were all designed for C++. They had rich experience debugging segmentation faults via SSH at three in the morning.

Rust initially felt more academic—suitable for technical sharing, possibly for developing auxiliary services, but not for hot paths.

Two things changed this perception:

First, the team tracked several severe tail latency spikes and found that the root cause was small memory issues and subtle sharing patterns in the C++ code. These issues did not lead to crashes but were enough to cause allocator contention, cache thrashing, and lock convoying under high load.

Second, they conducted a weekend experiment: they built a small prototype in Rust for part of the pipeline, connected to the same market data source. This prototype showed a smoother latency histogram even before tuning.

The deeper they delved into the flame graphs, the more an idea lingered:

Perhaps the language that protects you from data races and dangling pointers will also drive you to build designs that perform better under load.

Architecture Comparison Testing

The team decided to replicate the core path in Rust, feed both systems the same data, and compare them under realistic load generators.

The simplified architecture is as follows:

        +-----------------------+
        |  Market Data Source   |
        +-----------+-----------+
                    |
                    v
        +-----------+-----------+
        |  Data Normalization   |
        +-----------+-----------+
                    |
          +---------+---------+
          |                   |
          v                   v
   +------+-----+      +------+------+
   | C++ Engine |      | Rust Engine |
   +------+-----+      +------+------+
          |                   |
          +---------+---------+
                    |
                    v
        +-----------+-----------+
        |    Trading Gateway    |
        +-----------------------+

Both engines ran on the same hardware, pinned to equivalent CPU cores, with NUMA placement and interrupt handling aligned. Same network cards, same kernel tuning, same affinity rules. The team replayed recorded market data and ran synthetic burst tests that exceeded the worst peaks.

Shocking Benchmark Results

Here is a snapshot of a simplified scenario that convinced everyone:

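+-----------------------------------------------+--------------+---------------+------------------+
| Scenario                                      | C++ p99 (µs) | Rust p99 (µs) | Reduction        |
+-----------------------------------------------+--------------+---------------+------------------+
| Normal load (50,000 messages per second)      | 280          | 260           | Slight advantage |
| Stress test (120,000 messages per second)     | 910          | 540           | About 40%        |
| Extreme burst (200,000 messages in 2 seconds) | 1600         | 870           | About 45%        |
+-----------------------------------------------+--------------+---------------+------------------+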
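For readers who want to check the reduction column, the arithmetic is simply (C++ p99 - Rust p99) / C++ p99. The short sketch below recomputes it from the p99 figures in the table; the function and constant names are made up for this example.

```rust
/// Percentage by which `after_us` is lower than `before_us`.
fn reduction_pct(before_us: f64, after_us: f64) -> f64 {
    (before_us - after_us) / before_us * 100.0
}

/// p99 figures in microseconds, copied from the table: (scenario, C++, Rust).
const P99_US: [(&str, f64, f64); 3] = [
    ("Normal load", 280.0, 260.0),
    ("Stress test", 910.0, 540.0),
    ("Extreme burst", 1600.0, 870.0),
];

/// One formatted line per scenario, e.g. "Stress test: 40.7% lower p99".
fn report() -> Vec<String> {
    P99_US
        .iter()
        .map(|(name, cpp, rust)| {
            format!("{}: {:.1}% lower p99", name, reduction_pct(*cpp, *rust))
        })
        .collect()
}
```

This works out to roughly 7.1%, 40.7%, and 45.6%, in line with the rounded figures in the table.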

The median latencies were quite close. Under low to medium loads, one could say C++ and Rust were basically on par.

But tail latency was another matter. Under stress, the C++ engine exhibited long, ugly peaks. Rust was not completely flat either, but its tails were much shorter and it recovered faster.

CPU utilization told a similar story. The Rust engine ran slightly hotter on average, but individual cores spiked to 100% far less often. Those spikes were where the C++ engine stalled on allocator and lock contention.

Rust Code Example: Showcasing the Difference

Below is a highly simplified sketch of the Rust hot path. This is not production code, but it reflects the winning design:

use crossbeam_channel::Receiver;

// An incoming order
struct Order {
    symbol: u32, // Instrument ID
    qty: u64,    // Quantity
    px: u64,     // Price
}

// The engine; exactly one thread owns it and its order book
struct Engine {
    book: OrderBook,
}

impl Engine {
    // Run the engine: drain orders from the channel until all senders are dropped
    fn run(&mut self, rx: Receiver<Order>) {
        for o in rx.iter() {
            self.match_and_update(o);
        }
    }

    // Match the order and update the order book
    fn match_and_update(&mut self, o: Order) {
        self.book.apply(o);
    }
}

// Order book: compact price levels, no shared ownership
struct OrderBook {
    // Price-level storage elided
}

impl OrderBook {
    // Apply an order in place, with no heap allocation on the hot path
    fn apply(&mut self, _o: Order) {
        // In-place update elided
    }
}
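The article leaves the order book body empty. As one possible reading of "compact price levels" and "in-place update", the sketch below preallocates one slot per price tick at startup and only adds to an existing slot on the hot path. The layout, the `min_px` band, and the helper names are assumptions for illustration, not the team's design, and no matching logic is shown.

```rust
/// Order fields follow the article's example.
#[derive(Clone, Copy)]
struct Order {
    symbol: u32,
    qty: u64,
    px: u64, // price in integer ticks (assumed)
}

/// Assumed layout: one slot per tick in [min_px, min_px + levels.len()).
struct OrderBook {
    symbol: u32,
    min_px: u64,
    levels: Vec<u64>, // resting quantity per tick; allocated once in `new`
}

impl OrderBook {
    fn new(symbol: u32, min_px: u64, num_levels: usize) -> Self {
        OrderBook { symbol, min_px, levels: vec![0; num_levels] }
    }

    /// Index of the slot for `px`, or None if the price is outside the band.
    fn slot(&self, px: u64) -> Option<usize> {
        let offset = px.checked_sub(self.min_px)?;
        if offset < self.levels.len() as u64 {
            Some(offset as usize)
        } else {
            None
        }
    }

    /// Adds the order's quantity at its price level, in place.
    /// Returns false and changes nothing for a foreign symbol or out-of-band price.
    fn apply(&mut self, o: Order) -> bool {
        if o.symbol != self.symbol {
            return false;
        }
        match self.slot(o.px) {
            Some(idx) => {
                self.levels[idx] = self.levels[idx].saturating_add(o.qty);
                true
            }
            None => false,
        }
    }

    /// Resting quantity at `px` (0 if the price is outside the band).
    fn qty_at(&self, px: u64) -> u64 {
        self.slot(px).map_or(0, |idx| self.levels[idx])
    }
}
```

Because `apply` takes `&mut self` and the `Vec` never grows after construction, the update path touches no allocator and needs no lock, as long as a single thread owns the book.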

Key Design Features:

  • No shared pointers
  • No mutexes
  • One thread owns the order book
  • Communication via bounded channels

When higher throughput is needed, they shard by instrument and run multiple engines, each with its own order book and channel.
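A rough sketch of that sharding pattern follows. To stay self-contained it uses the standard library's bounded `sync_channel` instead of `crossbeam_channel`, and a running quantity total stands in for each shard's order book. The modulo routing rule, the queue capacity, and all names here are assumptions for illustration.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

struct Order {
    symbol: u32,
    qty: u64,
}

/// One shard: a bounded queue plus the worker thread that owns the shard's state.
struct Shard {
    tx: SyncSender<Order>,
    worker: JoinHandle<u64>, // returns total quantity processed (illustration only)
}

fn spawn_shards(n: usize, capacity: usize) -> Vec<Shard> {
    (0..n)
        .map(|_| {
            let (tx, rx) = sync_channel::<Order>(capacity);
            let worker = thread::spawn(move || {
                // Stand-in for an order book owned exclusively by this thread.
                let mut total_qty = 0u64;
                for o in rx {
                    total_qty += o.qty;
                }
                total_qty
            });
            Shard { tx, worker }
        })
        .collect()
}

/// Assumed routing rule: every order for a symbol goes to the same shard.
fn shard_for(symbol: u32, n: usize) -> usize {
    symbol as usize % n
}

/// Routes all orders, then shuts the shards down and returns each shard's total.
fn route_all(orders: Vec<Order>, n: usize) -> Vec<u64> {
    assert!(n > 0, "need at least one shard");
    let shards = spawn_shards(n, 1024);
    for o in orders {
        let idx = shard_for(o.symbol, n);
        // `send` blocks when the bounded queue is full, giving back-pressure.
        shards[idx].tx.send(o).expect("shard worker exited early");
    }
    shards
        .into_iter()
        .map(|s| {
            drop(s.tx); // closing the channel ends the worker's loop
            s.worker.join().expect("shard worker panicked")
        })
        .collect()
}
```

Since each symbol always maps to the same shard, no two threads ever touch the same book, which is what lets the design drop mutexes and shared pointers.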

The borrow checker enforces clear ownership. The type system makes accidental cross-thread sharing of mutable structures much harder.

In C++, the team knew these patterns and had always advocated for them. But the language would readily allow them to break the rules at 2 AM by inserting a shared pointer into a path that truly needed exclusive ownership.

Why Rust Wins on Latency

Ultimately, Rust’s victory is not because its generated code magically runs faster than C++. Excellent C++ compilers can generate code that is very close.

Rust wins because of how it constrains and guides design.

The team ultimately achieved:

  1. Less memory allocation on hot paths

    • Because ownership rules pushed them towards stack-based and arena-based data structures

  2. Less lock contention

    • Because sending messages between worker threads is more natural than sharing structures

  3. Clearer failure modes

    • Because the compiler refused to let them ignore certain error paths that previously led to strange stalls

One of the most severe tail latency peaks in the C++ engine came from a rarely hit branch during risk checks that allocated a small object and then quickly released it. Under extreme bursts, this path triggered enough allocator pressure to cause ugly pauses.

Rust’s design made it hard for an allocation like that to stay hidden. The team caught it during code review because the types in the function signature made the heap activity obvious.
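As an illustration of how a rarely hit risk-check branch can avoid the allocator entirely, here is a hedged sketch in which the rejection reason is a plain enum returned by value. The `Limits` fields, the `Reject` variants, and the checks themselves are hypothetical and not taken from the team's engine.

```rust
/// Hypothetical risk limits; names and values are illustrative only.
struct Limits {
    max_qty: u64,
    max_notional: u64,
}

/// Rejection reasons as a plain `Copy` enum. It holds only integers, so the
/// rare reject path creates no heap object.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Reject {
    QtyTooLarge { qty: u64, limit: u64 },
    NotionalTooLarge { notional: u64, limit: u64 },
    Overflow,
}

/// Returns the order's notional value if it passes both checks.
fn check_risk(limits: &Limits, qty: u64, px: u64) -> Result<u64, Reject> {
    if qty > limits.max_qty {
        return Err(Reject::QtyTooLarge { qty, limit: limits.max_qty });
    }
    // checked_mul turns arithmetic overflow into an explicit error path.
    let notional = qty.checked_mul(px).ok_or(Reject::Overflow)?;
    if notional > limits.max_notional {
        return Err(Reject::NotionalTooLarge {
            notional,
            limit: limits.max_notional,
        });
    }
    Ok(notional)
}
```

The signature carries the relevant information: there is no `Box`, `Vec`, or `String` in the return type, so a reviewer can see that the error path stays off the heap. `Result` is also marked `#[must_use]`, so a caller that silently drops the outcome gets a compiler warning, which many teams promote to an error.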

The Parts That Were Harder Than Expected

To be honest, Rust was not easy in the first few weeks. The team had to unlearn habits that had worked for years. Debugging lifetime issues was frustrating. Some exchange-specific libraries had better tooling in C++.

They had to write glue code to communicate with existing C and C++ components. People had to be trained. They had to accept a slower pace of feature development while the engine stabilized.

There was also a moment when the C++ version briefly outperformed in a synthetic micro-benchmark. That afternoon, I secretly hoped we could cancel the experiment and return to the language that felt like home.

That benchmark result favored a pattern we had never encountered in production. Once we stopped chasing synthetic scores and focused on production traces, Rust pulled ahead again.

Should You Immediately Rewrite the Engine in Rust?

No.

If you have a mature C++ trading system that already meets your latency and risk requirements, a complete rewrite may be the most dangerous thing you can do.

But our experience illustrates something different:

If you are starting a new latency-sensitive engine or building a major new component, you should seriously consider making Rust your default choice.

Not because marketing says so, but because the language structure aligns very well with the discipline that high-frequency systems claim to value.

Rust makes it harder to lie to yourself about ownership, sharing, and error handling. This pressure directly reflects in tail latency and incident reports.

The Real Results of the Experiment

After this head-to-head test, the team did not burn their C++ codebase. They still run C++ in certain tools, integrations, and low-level drivers.

But for the core real-time path, where a microsecond can determine whether a trade exists, they chose Rust.

Traders no longer send angry screenshots. The number of latency-related incidents has dropped sharply. The p99 charts have become boring in the best way.

The most surprising change is not in the charts but in how the team talks about safety and speed in the same breath without flinching.

Conclusion

This practical showdown between Rust and C++ reveals an important truth: modern language design is not just syntactic sugar; it can fundamentally influence system architecture and performance.

Key takeaways:

1. Tail latency is the key metric: Similar median latencies do not mean two systems perform comparably; p99 and p999 latencies are the lifeline of high-frequency trading.

2. Language constraints are design advantages: Rust’s ownership system and borrow checker are not obstacles but guardrails that help build better architectures.

3. Rewriting is not a panacea: Mature, stable systems should not be rewritten lightly, but new projects should seriously consider Rust.

4. Performance comes from design, not magic: Rust won not through compiler magic but because it enforces better concurrency and memory management patterns.

5. The learning curve is worth the investment: There is an adjustment period at first, but in this team’s experience the long-term benefits outweighed the short-term costs.

For developers learning Rust, this case demonstrates the real value of Rust in a production environment. It is not just a “safer C++” but a tool that can guide you to build better system architectures.

If you are building performance-sensitive systems, Rust is worth serious consideration. If you are already using Rust, this case is one more data point in favor of that choice.

References

1. Rust vs C++: We Built A Real-Time Trading Engine—C++ Lost The Latency Battle Badly: https://medium.com/@kp9810113/rust-vs-c-we-built-a-real-time-trading-engine-c-lost-the-latency-battle-badly-874d74d10668
