Optimizing rav1d: An AV1 Decoder Written in Rust

AV1 is one of the most advanced video codec standards available today. It offers high compression efficiency and excellent image quality, but it is also complex to implement. rav1d is an open-source AV1 decoder written in Rust. Its code is clearly structured and memory-safe, but performance has always been a challenge.

The goal this time is clear: to improve the decoding speed of rav1d as much as possible without sacrificing code maintainability.

Where Do Performance Bottlenecks Come From?

We did not rush to modify the code. Instead, we started with a round of performance analysis. Using profiling tools to trace function calls and hot paths, we identified several key issues:

  • Frequent memory allocation and deallocation, especially during frame buffer management and tile decoding phases;

  • Redundant calculations in certain hot functions, such as the inverse quantization process for transform coefficients;

  • Low degree of parallelization, with the potential of multi-core CPUs not being fully utilized.

These issues are like a mountain of dirty dishes in the kitchen; they seem chaotic, but once we clarify the order and tackle them one by one, the entire process can run smoothly.

Memory Management Optimization: Reducing the Number of “Brick Moving” Operations

While Rust’s memory safety mechanisms are reliable, improper use can lead to additional overhead. We noticed that during the decoding of each frame, temporary buffers were repeatedly created and destroyed. This is akin to buying groceries and washing pots anew every time you cook, which is naturally inefficient.

Thus, we introduced an object pool mechanism to reuse frame buffers. This way, memory is like pre-prepared ingredients, ready for use, saving a lot of initialization time.
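A minimal sketch of the buffer-reuse idea, assuming fixed-size byte buffers. The `BufferPool` type and its method names are hypothetical and chosen for illustration; they are not rav1d's actual API.

```rust
/// Hypothetical buffer pool; names and layout are illustrative, not rav1d's API.
pub struct BufferPool {
    /// Buffers that were handed back and are waiting to be reused.
    free: Vec<Vec<u8>>,
    /// Length of every buffer this pool hands out.
    buf_len: usize,
}

impl BufferPool {
    pub fn new(buf_len: usize) -> Self {
        Self { free: Vec::new(), buf_len }
    }

    /// Hand out a zeroed buffer, reusing a returned allocation when one exists.
    pub fn acquire(&mut self) -> Vec<u8> {
        match self.free.pop() {
            Some(mut buf) => {
                // Reset the contents but keep the existing heap allocation.
                buf.clear();
                buf.resize(self.buf_len, 0);
                buf
            }
            // Nothing to reuse yet, so allocate a fresh buffer.
            None => vec![0u8; self.buf_len],
        }
    }

    /// Give a buffer back so that a later `acquire` can reuse its allocation.
    pub fn release(&mut self, buf: Vec<u8>) {
        self.free.push(buf);
    }

    /// Number of buffers currently waiting in the pool.
    pub fn idle(&self) -> usize {
        self.free.len()
    }
}
```

A decoder loop would call `acquire` at the start of a frame and `release` once the frame is no longer needed, so that in steady state no new allocations happen. A real pool would also need a size cap and a thread-safe wrapper, both omitted here.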

Hot Function Optimization: Don’t Calculate the Same Number Twice

During the inverse quantization and inverse transformation processes, some constant parameters were repeatedly calculated, such as scaling factors and transformation matrices. These calculations could have been completed during initialization but were executed in the decoding loop each time.

We moved these redundant calculations outside the loop and cached intermediate results where appropriate. The effect was immediate: this module, which used to account for 15% of the time, now accounts for less than 5%.
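The pattern of moving a loop-invariant calculation out of the loop can be sketched as follows. The function names and the scale formula are hypothetical stand-ins, not rav1d's real dequantization code.

```rust
/// Hypothetical scale computation standing in for a per-block constant.
fn scale_for(qidx: u8, shift: u32) -> i32 {
    (i32::from(qidx) + 1) << shift
}

/// Before: the scale is recomputed for every coefficient.
pub fn dequant_naive(coeffs: &mut [i32], qidx: u8, shift: u32) {
    for c in coeffs.iter_mut() {
        *c *= scale_for(qidx, shift);
    }
}

/// After: the scale is computed once, outside the loop.
pub fn dequant_hoisted(coeffs: &mut [i32], qidx: u8, shift: u32) {
    let scale = scale_for(qidx, shift);
    for c in coeffs.iter_mut() {
        *c *= scale;
    }
}
```

For a function this small the compiler will often do the hoisting on its own. Doing it by hand matters most when the repeated work is expensive or when the compiler cannot prove that it is invariant, for example when it sits behind a function call that is not inlined.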

Parallelization Attempts: Distributing Tasks Among Multiple “Workers”

AV1 supports tile-based decoding, which theoretically allows for parallel processing. However, in rav1d, this logic was still executed serially. We attempted to use Rust’s rayon library to distribute the tiles across multiple threads.

However, parallelization is not a panacea. Scheduling overhead, lock contention, and data dependencies all need careful consideration. Ultimately, we achieved significant speedup in scenarios supporting multiple tiles while retaining a single-threaded mode to accommodate resource-constrained environments.
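A minimal sketch of tile-level parallelism with a serial fallback. To stay dependency-free it uses the standard library's scoped threads in place of rayon, and it assumes tiles are independent, equally sized slices of one frame buffer. `decode_tile` is a placeholder, not real AV1 decoding.

```rust
use std::thread;

/// Placeholder for real tile decoding: it simply inverts every byte.
fn decode_tile(tile: &mut [u8]) {
    for px in tile.iter_mut() {
        *px = 255 - *px;
    }
}

/// Single-threaded mode: decode the tiles one after another.
/// `tile_len` must be non-zero, because `chunks_mut` panics on a zero size.
pub fn decode_serial(frame: &mut [u8], tile_len: usize) {
    for tile in frame.chunks_mut(tile_len) {
        decode_tile(tile);
    }
}

/// Multi-threaded mode: decode each tile on its own scoped thread.
/// `chunks_mut` yields non-overlapping slices, so no locking is needed.
pub fn decode_parallel(frame: &mut [u8], tile_len: usize) {
    thread::scope(|s| {
        for tile in frame.chunks_mut(tile_len) {
            s.spawn(move || decode_tile(tile));
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Spawning one thread per tile keeps the sketch short, but it makes the scheduling overhead mentioned above very visible. A thread pool such as the one rayon provides amortizes that cost across many tiles and frames.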

Tools to Assist: Letting Data Speak

Throughout the optimization process, we relied on tools like perf and flamegraph to continuously validate our ideas. They act like microscopes, helping us see the true state of code execution.

Without these tools, relying solely on guesswork and experience can easily lead us astray. For instance, a function that seems “slow” might just be a frequently called minor player; meanwhile, the real bottlenecks are often those inconspicuous auxiliary operations.

Looking Ahead: How Much More Potential Can Be Explored?

The current optimizations have already brought significant performance improvements, but this is far from the end. Next, we plan to:

  • Explore SIMD instruction acceleration for core operations;

  • Introduce a finer-grained task division mechanism;

  • Evaluate performance across different platforms for targeted adaptations.

After all, optimizing a decoder’s performance is never a one-time deal. It is more like a continuous refinement process, where every bit of progress comes from a persistent pursuit of detail.

Reference link: https://www.memorysafety.org/blog/rav1d-performance-optimization/