Zero-Copy Bridges Between Python and Rust: 5 Practical PyO3 Patterns

Introduction

In mixed Python and Rust programming there is a performance bottleneck that is easy to overlook: data being copied every time it crosses the language boundary. Imagine that you have carefully written a high-performance Rust function and wrapped it with PyO3 so Python can call it, only to find that every call performs three memory allocations and copies, wiping out the performance advantage.

This article walks through five practical zero-copy techniques that let Python and Rust coordinate like a conductor and an orchestra: the conductor does not reprint the score for every measure, and Rust should not copy the data for every call.

Why Zero-Copy is So Important

For a single transfer of a 5 MB buffer, the copying overhead may be negligible. But if your streaming service copies a buffer for every batch, or your API copies one for every request, this “death by a thousand cuts” adds up until copying dominates the hot path.

The zero-copy bridge offers the following advantages:

  • Lower Latency: Eliminating memory copy operations on the hot path
  • Reduced Memory Pressure: Fewer short-lived allocations, so less churn in the memory allocator
  • Clearer Mental Model: Python owns the objects, while Rust merely borrows them

Pattern 1: Borrowing Python Byte Objects

When you pass bytes or bytearray from Python to Rust, there is no need for copying at all. PyO3 allows you to directly borrow the underlying buffer.

Architecture Diagram

+---------+  bytes   +--------+  &[u8]   +------+
| Python  +--------->+ PyO3   +--------->+ Rust |
| (bytes) | no copy  | layer  |  borrow  | fn   |
+---------+          +--------+          +------+

Python Code

# bridge_bytes.py
import bridge

# Create a 4MB byte object
payload = b"\x01\x02\x03\x04" * 1_000_000
checksum = bridge.checksum_bytes(payload)
print(checksum)

Rust + PyO3 Implementation

use pyo3::exceptions::PyTypeError;
use pyo3::prelude::*;
use pyo3::types::{PyByteArray, PyBytes};

// Written for PyO3 0.23+ (Bound API); older releases used &PyAny / &PyModule
#[pyfunction]
fn checksum_bytes(obj: &Bound<'_, PyAny>) -> PyResult<u64> {
    // Support both bytes and bytearray types
    if let Ok(bytes) = obj.downcast::<PyBytes>() {
        let slice = bytes.as_bytes(); // Returns &[u8], borrowed, no copy
        Ok(fnv1a(slice))
    } else if let Ok(arr) = obj.downcast::<PyByteArray>() {
        // Safety: we only read, and fnv1a never calls back into Python,
        // so the bytearray cannot be resized or mutated while the slice is alive
        let slice = unsafe { arr.as_bytes() }; // Returns &[u8], borrowed
        Ok(fnv1a(slice))
    } else {
        Err(PyTypeError::new_err("expected bytes or bytearray"))
    }
}

// FNV-1a hash algorithm implementation
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash = 0xcbf29ce484222325u64;
    for byte in data {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

#[pymodule]
fn bridge(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(checksum_bytes, m)?)?;
    Ok(())
}

Key Point: as_bytes() returns a &[u8] that points straight into the memory Python already owns, so no additional buffer is allocated and nothing is copied.

Notes:

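  • Converting the slice to a Vec<u8> (for example with to_vec()) reintroduces the copy.
  • If you need to modify a bytearray, pay special attention to lifetimes and aliasing: if Python resizes the bytearray while Rust still holds a slice, the buffer may be reallocated and the slice left dangling.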
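The difference between borrowing and copying is visible even without PyO3. In this plain-Rust sketch, a Vec<u8> stands in for the buffer Python owns: borrow_addr receives it as &[u8] and sees the very same address, while copy_then_addr calls to_vec() and gets a fresh allocation, which is exactly the copy the first note warns about. The helper names are hypothetical.

```rust
/// Same FNV-1a loop as fnv1a above; it only reads through the borrow.
fn checksum(data: &[u8]) -> u64 {
    let mut hash = 0xcbf29ce484222325u64;
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Borrowing: the callee reads the caller's buffer (same address, no allocation).
fn borrow_addr(data: &[u8]) -> *const u8 {
    data.as_ptr()
}

/// Copying: to_vec() allocates a new buffer and copies every byte into it.
fn copy_then_addr(data: &[u8]) -> (Vec<u8>, *const u8) {
    let owned = data.to_vec();
    let addr = owned.as_ptr();
    (owned, addr) // return the Vec too, so the copy stays alive for comparison
}
```

Borrowing costs a pointer and a length; the to_vec() path costs an allocation plus a pass over every byte, on every call.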

Pattern 2: Zero-Copy Access to NumPy Arrays

In numerical computing scenarios, NumPy is the universal currency. Using the rust-numpy package, Rust can operate directly on NumPy arrays in place.

Architecture Diagram

Python (NumPy ndarray)
│
│ buffer protocol
▼
PyO3 + rust-numpy
│
│ &[f64] or &mut [f64]
▼
Rust Loop

Python Code

# scale.py
import numpy as np
import bridge

# Create an array of 1 million random numbers
# (np.random.rand already returns float64, so no .astype copy is needed)
x = np.random.rand(1_000_000)
bridge.scale_inplace(x, 1.5)  # Modify x in place
print(x[:5])

Rust Implementation

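use numpy::PyReadwriteArray1;
use pyo3::prelude::*;

// Written for PyO3 / rust-numpy 0.23+ (Bound API)
#[pyfunction]
fn scale_inplace(mut arr: PyReadwriteArray1<'_, f64>, factor: f64) -> PyResult<()> {
    // PyReadwriteArray1 is a runtime-checked exclusive borrow of the ndarray:
    // extraction fails if the dtype is not float64, the array is read-only,
    // or it is already borrowed. as_slice_mut() returns an error instead of
    // a slice when the array is not contiguous.
    let slice = arr.as_slice_mut()?;

    // Directly operate on NumPy's buffer
    for v in slice {
        *v *= factor;
    }
    Ok(())
}

#[pymodule]
fn bridge(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(scale_inplace, m)?)?;
    Ok(())
}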

How It Works:

  1. Python allocates the ndarray buffer
  2. Rust borrows that memory through PyArray1
  3. Rust loop directly writes to NumPy’s buffer

Applicable Scenarios:

  • In-place transformations (normalization, scaling, logarithmic transformations)
  • Hot paths for machine learning preprocessing or feature engineering
  • Large-scale vector operations where memory allocation is a real cost
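Inside any of these wrappers the borrowed buffer is just a &mut [f64], so the transform itself can be plain Rust with no PyO3 types. A minimal sketch of an in-place min-max normalization kernel; min_max_normalize_inplace is a hypothetical helper, not part of rust-numpy, and it assumes the wrapper has already obtained the slice.

```rust
/// Rescales `data` into [0, 1] in place (min-max normalization).
/// Returns false and leaves `data` untouched if it is empty or constant.
fn min_max_normalize_inplace(data: &mut [f64]) -> bool {
    if data.is_empty() {
        return false;
    }
    let (mut lo, mut hi) = (f64::INFINITY, f64::NEG_INFINITY);
    for &v in data.iter() {
        lo = lo.min(v);
        hi = hi.max(v);
    }
    let range = hi - lo;
    if range == 0.0 || !range.is_finite() {
        return false;
    }
    for v in data.iter_mut() {
        *v = (*v - lo) / range; // writes straight into the caller's buffer
    }
    true
}
```

No new array is allocated: the same memory that held the raw values holds the normalized ones afterwards.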

Notes:

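  • as_slice_mut requires the array to be contiguous; for a view with non-unit strides (for example x[::2]) it returns an error instead of a slice. You can call np.ascontiguousarray on the Python side first, but it returns a copy when the input is not already contiguous, so the in-place result then lands in that copy rather than in the original array.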
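Why does a strided array get rejected? A view such as x[::2] shares x's buffer but skips every other element, so no single &mut [f64] covers exactly its items. This plain-Rust sketch models such a view with a hypothetical StridedView type (not rust-numpy's API) to show the check and a strided in-place fallback.

```rust
/// Hypothetical model of a 1-D NumPy view: `len` items starting at `offset`,
/// `stride` elements apart (NumPy counts strides in bytes; elements here).
struct StridedView<'a> {
    data: &'a mut [f64],
    offset: usize,
    stride: usize,
    len: usize,
}

impl StridedView<'_> {
    /// A flat &mut [f64] covering exactly the view's items exists only when
    /// they sit next to each other; otherwise refuse, like as_slice_mut.
    fn as_contiguous_slice(&mut self) -> Option<&mut [f64]> {
        if self.stride == 1 || self.len <= 1 {
            self.data.get_mut(self.offset..self.offset + self.len)
        } else {
            None
        }
    }

    /// Strided fallback that still works in place: touch every stride-th element.
    fn scale_strided(&mut self, factor: f64) {
        for i in 0..self.len {
            self.data[self.offset + i * self.stride] *= factor;
        }
    }
}
```

If strided input is expected, rust-numpy's as_array_mut() on a PyReadwriteArray returns an ndarray view that follows the strides for you, still without copying.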

Pattern 3: General Buffer Protocol

What if the data is neither bytes nor NumPy, but rather another object that supports the buffer protocol?

PyO3’s PyBuffer can interact with any Python object that exposes memory through the standard buffer API.

Python Code

import array
import bridge

# Create a float32 array (typecode "f" is a C float) using the array module
buf = array.array("f", [1.0, 2.0, 3.0])
bridge.scale_buffer(buf, 2.0)
print(buf)  # array('f', [2.0, 4.0, 6.0])

Rust Implementation

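use pyo3::buffer::PyBuffer;
use pyo3::exceptions::PyBufferError;
use pyo3::prelude::*;

// Written for PyO3 0.23+ (Bound API)
#[pyfunction]
fn scale_buffer(py: Python<'_>, obj: &Bound<'_, PyAny>, factor: f32) -> PyResult<()> {
    // get() checks that the buffer's item size and format match f32
    // and returns an error otherwise
    let buf: PyBuffer<f32> = PyBuffer::get(obj)?;

    // as_mut_slice() yields Some only for writable, C-contiguous buffers.
    // Elements come back as Cell<f32> because other views of the same memory
    // may exist, so they are updated via get/set rather than through &mut.
    let cells = buf
        .as_mut_slice(py)
        .ok_or_else(|| PyBufferError::new_err("expected a writable, C-contiguous buffer"))?;

    for cell in cells {
        cell.set(cell.get() * factor);
    }
    Ok(())
}

#[pymodule]
fn bridge(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(scale_buffer, m)?)?;
    Ok(())
}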

Advantages:

  • Applicable to array.array, memoryview, various image libraries, and custom types
  • Rust does not need to know the specific Python type

Notes:

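  • If the algorithm relies on contiguous memory, check that the buffer is C-contiguous (PyBuffer::is_c_contiguous()).
  • The element type must match the buffer's format: PyBuffer::<f32>::get checks the item size and format and returns an error if, for example, Python passes a float64 array. Bypassing that check by reading the memory through a raw pointer of the wrong type would be undefined behavior.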
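PyO3's PyBuffer::is_c_contiguous() performs the contiguity check for you. To show what it tests, here is the rule written out in plain Rust, following the logic CPython uses for C-order contiguity: walking from the last axis, every axis longer than 1 must have a byte stride equal to the item size times the lengths of all later axes. The function name and signature are illustrative.

```rust
/// C-contiguity test for a buffer described by its shape, byte strides and item size.
fn is_c_contiguous(shape: &[usize], strides: &[isize], itemsize: usize) -> bool {
    assert_eq!(shape.len(), strides.len(), "one stride per dimension");
    if shape.contains(&0) {
        return true; // an empty buffer is trivially contiguous
    }
    let mut expected = itemsize as isize;
    for (&dim, &stride) in shape.iter().zip(strides).rev() {
        // an axis of length 1 never steps, so its stride does not matter
        if dim > 1 && stride != expected {
            return false;
        }
        expected *= dim as isize;
    }
    true
}
```

A transposed matrix or a memoryview sliced with a step fails this test, which is exactly the case in which a flat slice would read the wrong elements.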

Pattern 4: Exposing Data Owned by Rust to Python

The previous patterns all involve Python owning the memory while Rust borrows it. Sometimes we need to reverse this: let Rust own large chunks of data (such as indices or model weights) and expose a lightweight handle to Python.

Architecture Diagram

Python
(handle)
│
│ pointer / Arc
▼
+------------------------+
| Rust struct (big data) |
+------------------------+
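The pointer / Arc arrow in the diagram means a handle costs one pointer no matter how large the table is. A plain-Rust sketch of that idea without PyO3, assuming the table sits behind an Arc; Table, Handle and the 4-element DIM are illustrative stand-ins, not the article's code. (A #[pyclass] value already lives inside its Python object and is shared through Python's own reference counting, so an explicit Arc matters mainly when other Rust code must share the data too.)

```rust
use std::sync::Arc;

// 768 in the article; 4 here so the example stays readable
const DIM: usize = 4;

/// Hypothetical stand-in for the Rust-owned table behind a Python handle.
struct Table {
    embeddings: Vec<[f32; DIM]>,
}

impl Table {
    /// Reads the table in place; only the query and one f32 move.
    fn dot(&self, idx: usize, query: &[f32; DIM]) -> Option<f32> {
        let emb = self.embeddings.get(idx)?;
        Some(emb.iter().zip(query).map(|(a, b)| a * b).sum())
    }
}

/// What Python would hold: a pointer to the shared table, not the table itself.
#[derive(Clone)]
struct Handle(Arc<Table>);

impl Handle {
    fn new(table: Table) -> Self {
        Handle(Arc::new(table))
    }
}
```

Cloning a Handle bumps a reference count; none of the embeddings are copied.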

Rust Implementation: Embedding Table

use pyo3::exceptions::{PyIndexError, PyValueError};
use pyo3::prelude::*;

// Written for PyO3 0.23+ (Bound API)
#[pyclass]
pub struct EmbeddingTable {
    embeddings: Vec<[f32; 768]>,  // Store 768-dimensional embedding vectors
}

#[pymethods]
impl EmbeddingTable {
    #[new]
    fn new() -> Self {
        // In practice, load from disk or build once
        EmbeddingTable { embeddings: Vec::new() }
    }
    
    // Add an embedding vector (the Python list is copied into Rust here, once)
    fn add(&mut self, embedding: Vec<f32>) -> PyResult<()> {
        if embedding.len() != 768 {
            return Err(PyValueError::new_err("expected 768-dim embedding"));
        }
        let mut arr = [0.0f32; 768];
        arr.copy_from_slice(&embedding);
        self.embeddings.push(arr);
        Ok(())
    }
    
    // Calculate dot product; only the query and one f32 cross the boundary
    fn dot(&self, idx: usize, query: Vec<f32>) -> PyResult<f32> {
        if query.len() != 768 {
            return Err(PyValueError::new_err("expected 768-dim query"));
        }
        let emb = self
            .embeddings
            .get(idx)
            .ok_or_else(|| PyIndexError::new_err("index out of range"))?;
        Ok(emb
            .iter()
            .zip(query.iter())
            .map(|(a, b)| a * b)
            .sum())
    }
}

#[pymodule]
fn bridge(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<EmbeddingTable>()?;
    Ok(())
}

Python Usage

from bridge import EmbeddingTable

table = EmbeddingTable()
table.add([0.1] * 768)
score = table.dot(0, [0.2] * 768)
print(score)

Where is the Zero-Copy?

  • Vec<[f32; 768]> exists entirely in Rust
  • Python calls Rust methods without copying the entire internal table
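  • Only method arguments cross the boundary: each add copies one 768-float list into Rust, and each dot copies the query in and a single float out; the table itself is never copied, however many times it is queried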

Applicable Scenarios:

  • Owning large structures (indices, routing tables, feature stores), loaded once but queried frequently
  • Providing a Python-friendly facade for a performance-critical Rust core
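“Loaded once but queried frequently” works best when each query also returns something small. A hedged sketch of such a method body in plain Rust: best_match is a hypothetical helper that scans the rows in place and hands back only the winning index and score, so a single (usize, f32) pair would cross back into Python however large the table grows.

```rust
/// Scans every stored row against `query` without copying any row and
/// returns only the best row's index and dot-product score.
fn best_match<const N: usize>(embeddings: &[[f32; N]], query: &[f32; N]) -> Option<(usize, f32)> {
    embeddings
        .iter()
        .enumerate()
        .map(|(i, row)| {
            let score: f32 = row.iter().zip(query).map(|(a, b)| a * b).sum();
            (i, score)
        })
        // total_cmp gives a total order on f32, so NaN scores cannot panic the comparison
        .max_by(|x, y| x.1.total_cmp(&y.1))
}
```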

Pattern 5: Streaming Bridges Instead of Bulk Copies

Sometimes the best zero-copy is to not share large chunks of data at all.

If you are dealing with large logs, network frames, or bulk events, you can build a streaming bridge:

  1. Rust pulls data from the source (sockets, files, Kafka, etc.)
  2. Rust exposes an iterator/generator interface to Python
  3. Each item is small, and large buffers never enter Python space

Rust Implementation: Line Stream

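use pyo3::prelude::*;
use pyo3::types::PyBytes;
use std::fs::File;
use std::io::{BufRead, BufReader};

// Written for PyO3 0.23+ (Bound API). A Python iterator object whose state
// (the open file and a reusable line buffer) lives on the Rust side.
#[pyclass]
struct LineStream {
    reader: BufReader<File>,
    buf: Vec<u8>,
}

#[pymethods]
impl LineStream {
    fn __iter__(slf: PyRef<'_, Self>) -> PyRef<'_, Self> {
        slf
    }

    // Returning None ends the loop (Python sees StopIteration)
    fn __next__<'py>(&mut self, py: Python<'py>) -> PyResult<Option<Bound<'py, PyBytes>>> {
        self.buf.clear();
        if self.reader.read_until(b'\n', &mut self.buf)? == 0 {
            return Ok(None);
        }
        // Drop the "\n" or "\r\n" terminator, as BufRead::lines would
        let mut line = &self.buf[..];
        if let Some(rest) = line.strip_suffix(b"\n") {
            line = rest.strip_suffix(b"\r").unwrap_or(rest);
        }
        // Each line is copied into a new bytes object, but the file as a whole
        // is never loaded into Python memory
        Ok(Some(PyBytes::new(py, line)))
    }
}

#[pyfunction]
fn lines(path: &str) -> PyResult<LineStream> {
    let file = File::open(path)?; // an io::Error surfaces as a Python OSError
    Ok(LineStream { reader: BufReader::new(file), buf: Vec::new() })
}

#[pymodule]
fn bridge(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_class::<LineStream>()?;
    m.add_function(wrap_pyfunction!(lines, m)?)?;
    Ok(())
}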

Python Usage

import bridge

for line in bridge.lines("huge.log"):
    # Each line arrives as bytes; the file is never fully loaded into Python memory
    process(line)  # process() is a placeholder for your own handling code

This is not strictly “zero-copy” in the buffer sense, but it is a zero-bulk design: large IO-intensive workloads remain in Rust, and Python only sees small chunks of data.

In real systems, you can combine this pattern with Pattern 1 or Pattern 2: when a streamed chunk is passed on to another Rust function, that function can borrow it instead of copying it, so moving each chunk onward adds no further copies.
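One concrete combination: an aggregate such as a checksum can be updated chunk by chunk, with each chunk passed as a borrowed &[u8] in the style of Pattern 1, so a streamed input never has to be gathered into one large buffer. This sketch is a hypothetical incremental version of Pattern 1's fnv1a (same FNV-1a constants), not a library type.

```rust
/// Incremental FNV-1a, so a stream can be hashed chunk by chunk.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(0xcbf29ce484222325) // 64-bit FNV offset basis
    }

    /// Feeds one borrowed chunk; the bytes are read in place, never copied.
    fn update(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            self.0 ^= byte as u64;
            self.0 = self.0.wrapping_mul(0x100000001b3); // 64-bit FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}
```

Because FNV-1a consumes bytes strictly in order, hashing in chunks of any size gives the same result as hashing everything at once.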

Choosing the Right Bridge: Decision Tree

  • If you have bytes / bytearray / network frames → Use Pattern 1 (PyBytes/PyByteArray)
  • If you have numerical arrays / machine learning features → Use Pattern 2 (NumPy + rust-numpy)
  • If you have “any object with a buffer” → Use Pattern 3 (PyBuffer)
  • If you have large persistent state (indices, models, tables) → Use Pattern 4 (Rust-owned #[pyclass])
  • If you are IO-intensive, reading large streams → Use Pattern 5 (streaming iterators/generators)

Designing the Python ↔ Rust interface is akin to designing the API boundaries between microservices: clearly define who owns what memory and when.

Conclusion

Python does not need to be fast; it just needs to be in the right place.

By utilizing these five PyO3 bridge patterns:

  1. Borrowing Python byte buffers
  2. Operating in place on NumPy arrays
  3. Leveraging the general buffer protocol
  4. Exposing Rust-owned state as Python objects
  5. Streaming data instead of bulk copying

you can turn Python into a high-level orchestrator and let Rust do the heavy lifting without getting bogged down by copying overhead.

If you are building data pipelines, machine learning systems, or high-throughput services, and you are currently just “passing lists” to Rust, you may be missing out on significant performance optimization opportunities.

Adjust one bridge in your codebase this week, measure the difference, and then tell me how much latency you cut.

References

  1. 5 Zero-Copy Bridges Between Python and Rust with PyO3: https://medium.com/@sparknp1/5-zero-copy-bridges-between-python-and-rust-with-pyo3-bc64961e4fca
