Previously:
- Rust is 2x faster than Python: PyO3 Edition
- Rust, Python, and You: Rust-Python Interop for Python devs
- Rust-Python Interop for Scientific Codebases
All of which are basically the same blog post for slightly different audiences.
In light of a conversation I had last week, I figured it was worthwhile to write up the main pitch for Rust in scientific programming, specifically for Python-Rust interoperation.
This rendition is aimed at domain scientists accustomed to using Python and familiar with the usual tools for optimizing scientific Python, especially in HPC settings.
Broadly, the scientific Python performance story goes as follows:
1. An idea: a domain scientist writes it down in the simplest and most expressive language at hand - often Python.
2. The idea is expanded, iterated on, tested, maybe written up in a paper. A small-to-medium team gets involved by this point.
3. It's not quite as fast as anyone would like. Throw the standard playbook at it: NumPy if the codebase wasn't already saturated with it, GPU acceleration with PyTorch, maybe some Numba or JAX.
4. If that doesn't work, it's often necessary to drop into another, more performant language. For deep-domain work, Julia may be the right choice via `juliacall`, especially for established libraries, but oftentimes it's a custom C or C++ extension. Depending on your field, FORTRAN may be on the table.
5. If addressing the hot loops via the extension isn't enough, it's necessary to drop out of Python entirely for a full rewrite: Julia, C, C++, or FORTRAN depending on exact needs.
I'm here to convince you that Rust should be your default choice for steps 4 and 5 - or if you really like it, as early as step 3.
Here we'll cover:
- What is Rust?
- Why Rust over other alternatives?
- How do I use Rust to enhance Python performance?
- What are some examples of Rust in production?
What is Rust?
Rust is a relatively young systems programming language: v1.0 was released in 2015, and it is currently on v1.95. A new version of the language releases every six weeks with bug fixes, performance improvements, and newly stabilized features, and a new 'edition' with potentially breaking changes releases every three years.
The advantages
- Safety: Rust is a statically typed, type-safe language (like C++, unlike C, FORTRAN, Julia, or Python), and is also memory-safe by default (like Julia and Python, unlike C, C++, or FORTRAN). Its type and memory safety models also combine to make it thread-safe by default, with built-in concurrency primitives that make it very clear what information can be shared across threads safely and can often catch data races at compile time.
- Performance: Rust compiles directly to statically-linked native binaries. It uses most of the same LLVM compiler backend as C++, and for equivalent code (recognizing that 'equivalent' code across languages is very difficult to measure) is broadly as performant as C or C++, and can sometimes safely make optimizations the former languages cannot. It does not require an interpreter or virtual machine, meaning it can run quickly and with little memory overhead.
- Ergonomics: Being a modern language, it was built with developer convenience in mind and comes bundled with high-quality tooling. Installing and updating the language itself is trivial with `rustup`. `cargo` wraps building, testing, documentation, and dependency management into a single tool that works on all platforms. Rust's compiler errors are widely regarded as the best in the industry: you can actually just read them!
The costs:
- Rust is nearly 11 years old, and while the dependency ecosystem has filled out quite well by this point, the coverage in machine learning and domain science isn't comparable to Python and Julia - though that is slowly changing.
- Fewer people know it, though this is also changing quite quickly. It's especially popular among young programmers, but finding senior-level Rust developers is harder than for C/C++.
- It takes much longer to compile than C, and it's generally even slower than heavily-templated C++. This is the #1 complaint from Rust developers. It's not that bad if you're writing a small extension, but large Rust codebases take an age and a half to compile - you won't get super-fast feedback loops like you would with C or Python.
- Rust's macros and metaprogramming (comparable to C++'s templates) are significantly different from the regular language and take a lot of getting used to. Compare Zig, an even younger systems programming language which has famously excellent metaprogramming.
- Rust is a complex language. The degree of complexity is sometimes exaggerated, but becoming a proficient systems-level Rust programmer requires the same kind of fundamentals knowledge as C, familiarity with the language's standard library and broader ecosystem, and Rust-specific concepts like lifetimes and aliasing-xor-mutation.
I believe that, for most purposes, the advantages dramatically outweigh the costs.
It compiles slowly, but errors which would be caught at runtime in a less strict language, including memory and concurrency errors, are caught at, or even before, compile time.
It's more complex to learn, but the teaching resources for it are excellent, and that time is spent actually understanding the language and computer fundamentals - as opposed to, say, C++, where you spend a lot of time learning a snarled tooling ecosystem that emerges from 30 years of technical debt rather than the requirements of your program.
The community and ecosystem are unified: there's one compiler and everyone uses either the newest version or the version from a couple months ago, and it's trivial to update. There's one central package repository, one standard library, one central build/test/doc/dependency tool with optional community extensions. You don't have to plan a meeting to figure out which version of the language you're going to use. It runs on pretty much everything, including embedded devices, and getting someone else's project working on your computer is trivial.
It interoperates very well with other languages - we'll focus mainly on Python, but R, C, and Julia are also quite easy to use alongside Rust, so if you need to reach out to other libraries, you can do it quite easily.
Also, I think the syntax is lovely - this is very subjective and many people disagree, but I think it's explicit in just the right way. It's very easy to tell what a given line is doing, a given element's type is obvious, and data flow is easy to read.
Rust-Python Interoperation
A complete rewrite of your existing, documented, tested, familiar codebase is a massive pain in the ass. If you've gotten to the point that you need to use a lower-level language for your hot loops, everyone would rather make that extension as easy to use as possible. Rust makes it very easy indeed.
I've written on this subject... a lot, but here's the summary of how to use Rust and Python together:
The simplest way is to use maturin and pyo3. maturin is a build system for, among other things, turning Rust binaries into Python wheels. pyo3 is a Rust library for making Rust-Python bindings (in both directions).
You can write a drop-in replacement for a Python function that calls Rust code:
```rust
// we annotate the Rust function with #[pyfunction] so that it is
// recognized as a function designed to be called from Python
#[pyfunction]
// the function can take as an argument any kind of Python data that we
// can convert to Rust data
fn foo(
    // booleans exist in both Python and Rust and can be copied cheaply
    a: bool,
    // Rust makes you specify the type and size of numbers, but they can
    // be copied very easily
    b: f64,
    // if you're willing to copy data, you can just do this - you can also
    // get more involved to use the string as an argument with less overhead
    c: String,
    // You can define Python classes in Rust and provide a way to manually
    // convert between them to use them as arguments or return types
    d: PyClass,
    // commonly-used types like numpy arrays have built-in equivalents with
    // the numpy crate in Rust
    e: PyReadonlyArray1<T>,
    // We can also tell Rust that we could potentially accept any Python type
    // as an argument and handle the conversion manually in the function
    f: &Bound<'_, PyAny>,
    // We can return from Rust to Python any type that implements IntoPy,
    // including custom classes defined entirely in Rust!
) -> ReturnType { /* do stuff */ }
```
And we can define Python classes in Rust, along with associated methods of all kinds:
```rust
#[pyclass]
struct FooClass {
    a: bool,
    b: f64,
    c: String,
    d: BarClass,
    // we can even store Python data directly using the Py<T> smart pointer
    e: Py<PyArray1<f64>>,
    f: Py<PyAny>,
}

#[pymethods]
impl FooClass {
    #[new]
    fn new(
        a: bool,
        b: f64,
        c: String,
        d: BarClass,
        e: Py<PyArray1<f64>>,
        f: Py<PyAny>,
    ) -> Self { /* build Self */ }
}
```
These functions and classes can then be exposed in a pymodule:
```rust
#[pymodule]
fn my_rust_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(foo, m)?)?;
    m.add_class::<FooClass>()?;
    Ok(())
}
```
Which can then be developed into a Python wheel using maturin:
```shell
# builds the Rust module into a native Python wheel
maturin build
# builds and installs into the local python environment
maturin develop
# use the -r flag to compile in release mode, trading longer
# compilation times for heavier optimization - equivalent to -O3 in C/C++
```
And then used from Python seamlessly, just like any other package:
```python
import my_rust_module as mr
from my_rust_module import FooClass

output = mr.foo(a, b, c, d, e, f)
fooclass = FooClass(a, b, c, d, e, f)
```
Making Python-Rust code efficient and ergonomic, as well as providing things like Python-side type hints and LSP support, is a more involved matter, but not a difficult one.
For example, as a matter of efficiency, you want to avoid crossing the Rust-Python boundary as much as possible, and prefer reading Python data immutably rather than copying and mutating it across the boundary. Storing Python data in a Rust-side struct is possible, but introduces complications that might not be necessary. Especially when building objects in Rust to be used from Python, I like to think of the Python API as a control center sending instructions to the Rust code, with as little data as possible actually being copied and preferring to send pointers to read-only views of Python data.
Why Rust Instead of _?
C
C is certainly the most traditional choice for building Python extensions; almost everyone uses CPython, after all.
But this necessarily brings in overhead: you're not writing regular C, you're writing C for Python, dealing with both ordinary C memory management and also managing Python memory from across the FFI boundary. I hope you like manually unpacking and type-checking stringly-typed arguments, because you'll be doing it a lot! I've seen functions with more lines of check() than lines of business logic.
Henceforth until the day you die, all day, every day, you will be writin' `if (np_arr) { Py_DECREF(np_arr); }`, turnin' big refcounts into little refcounts.
And you need all that checking, because one segfault in a C extension doesn't just cause the operation to fail; it brings down the interpreter and everything running on top of it. Goodbye to that Jupyter notebook!
The Rust promise is: you're never doing that again. The responsibility of watching every sparrow in its flight no longer rests on your shoulders alone; it is shared with the compiler, which can prove at compile time that safe Rust code is free of memory bugs. No buffer overflows, no dangling references, no double-frees, no null dereferences, no uninitialized reads: all rejected at compile time.
This also applies across the FFI boundary via pyo3. The mechanical work is automated, and Rust's ownership model guarantees that data from the Python side is correctly INCREF'd and DECREF'd as needed, including in a multi-threaded context with shared data (thanks to Rust's Send/Sync trait bounds).
And in the rare event Rust hits an unrecoverable error and panics, pyo3 catches it at the FFI boundary and raises a Python exception rather than crashing the interpreter. Your user's Jupyter session survives either way.
The overhead of managing build systems and tooling is also much lower. Rust's `cargo` handles pretty much everything in the course of regular development, while turning your Rust binary into a statically-linked Python wheel is handled by the maturin build system. I've been making Rust-Python packages for over a year now, and in all that time I've only ever needed three commands: `maturin new` to start a new project, `maturin build` to turn a Rust binary into a Python wheel, and `maturin develop` to build the wheel and install it directly into the local environment. No more wrestling with setuptools, no more meson build files scattered all over the shop. The tooling and building gets entirely out of your way.
You also get to rely on Rust's standard library and package ecosystem, which includes extremely reliable and highly-optimized versions of common data structures like hash maps and sets, vectors, utf-8 encoded strings that automatically handle multi-byte characters, and functional iterator methods. You don't need to define your own custom array type.
You also get a nice, modern, ergonomic static type system, and error handling is easy when errors are un-ignorable values. Fallible operations mark themselves as such, and will force you to handle them before you can use their values.
In many cases, especially for scientific work, C and Rust will produce identical or near-identical assembly with the same performance characteristics. But in many cases, Rust can actually be faster than a straightforward C implementation, because it can give LLVM more detailed information about lifetimes, aliasing, and mutation that allow it to safely make optimizations that may or may not be valid in C.
Rust also has safe, built-in concurrency primitives that make parallelism very easy to implement without crashes or subtle bugs. Sure, you can technically do all the same things manually, but when you're writing code on a deadline, you're a lot more likely to actually ship a complex parallel system if you're doing it with dedicated, safe tools rather than a pthread and a dream.
Now, most of these are also reasons to switch from C to C++, so let's get into the reasons to prefer Rust over C++ as well:
C++
If you're using C++ rather than C, it's probably because you want to take advantage of C++'s strong typing, standard library, and zero-cost abstractions. Rust is competitive in all these areas and has a strong advantage in several. Indeed, Rust was initially developed, in no small part, as a replacement for C++.
Rust's type system is stronger, more modern, and more expressive. Specifically, it's an ML-family type system, with local type inference, which brings features like Algebraic Data Types (ADTs), pattern matching, and parametric polymorphism. That's a lot of jargon, but it adds up to: Rust's type system is expressive, safe, and (for a core subset of the language) mechanically verified to be sound. The key phrase you'll hear repeated is that Rust aims to "Make illegal states unrepresentable". In particular, Rust's sum types (aka enums) are exhaustively checked, and Rust has no null value.
As a result, entire classes of C++ runtime errors like null dereference and use-after-free bugs become compile-time type errors in Rust. An entire category of critical, difficult bugs are banished before the code even runs at no cost to performance.
As an anecdote, last year a professor spent half an hour trying to show the class a computer vision model he coded up... which crashed immediately because the new version of tensorflow failed to unlock a mutex. Wouldn't have happened with Rust!
Rust's standard library has a different purpose than C++'s. Where C++ tries to be extremely comprehensive, Rust's is much smaller. Both have collections, iterators, I/O, atomics, etc. But things like RNG, serialization, networking, or advanced timing aren't in it. Instead, these features exist in the third-party package ecosystem (though many of the most commonly used packages are also maintained by members of the Rust core team). The result is that you'll often have to import at least a few external packages for most applications... but as we've already established, the Rust package ecosystem and dependency management is one of the most widely-praised parts of the language via the cargo tool, making this process trivial.
Consequently, Rust's standard library is small and focused, its naming conventions are consistent and sane, and there aren't hidden performance gotchas. Widely-used libraries are upstreamed into it only when they can pass a high bar for consistency, stability, performance, and utility... while still being easily available as third-party packages.
Also, Rust's standard library is split into distinct top-level namespaces like std::io, std::collections, std::cmp, etc, which makes finding desired functionality very easy. The Rust Project has an API guide which is not only followed consistently by the entire standard library, but is also idiomatic for the third party ecosystem: you can often just intuit the right method names for types you've never seen before in third-party modules.
Much the same applies to zero-cost abstractions, a feature which Rust shares with C++, but Rust goes a step further in a few places: Rust's iterators are patterned off of Haskell's, with lazy methods that compile down to tight loops without intermediary allocation. In addition, Rust's lifetime and mutation tracking are entirely compile-time constructs that disappear during codegen, allowing it to provide memory safety without mandatory garbage collection or reference counting.
In addition, Rust just plain brings in a lot of quality-of-life features:
- Everything is immutable by default instead of needing to spam `const` everywhere.
- Declarative macros are hygienic.
- Import/definition order doesn't matter.
- Idiomatic formatting is enforced by the default formatter.
- Keywords don't get re-used nonsensically.
- Numeric types have consistent sizes and sensible names like `i32` (always 32 bits), `f64` (always 64 bits), and `usize` (always the platform's pointer size) rather than `int`, `long int`, `long long`, or `unsigned long long int`.
- CMake or setuptools? Conan or vcpkg? GoogleTest or Catch2? Meson or Bazel? Nope. Just `cargo`. It's `cargo` every time.
- And did I mention that you'll never have to think about header files again?
Note that Rust is not an OO language. It does have traits, which are like a more powerful version of Java interfaces or Haskell type classes, but it intentionally does not include inheritance, preferring composition.
Cython
Writing Cython is a bit like writing C extensions for Python, but coming from the other side. It's specifically a tool for writing Python extensions that compile to C, using Python syntax.
It's a shifting hybrid of Python and C, with some very good conveniences for dealing with things like NumPy objects through its typed memory views.
Unfortunately, that also means you're perpetually somewhere in between Python and C. To get high performance, you need to drop to C's level and start managing memory manually, with all the same issues as a regular C extension.
You still don't get the advantage of a strong, modern type system, never mind things like ergonomic error handling, checking mutation, etc.
And since you're basically transpiling Python to C, you can, at best, get close to native C performance, but are generally bounded by it, and even that would require you to lean heavily into Cython's cdef. I'd still rather do that than manually turn big refcounts into little refcounts, but better alternatives exist.
It can be a good choice if you want Python's semantics, with the option to quickly and cheaply drop down into lower-level semantics as needed. But if your Python project is already mature, rewriting it to Cython is a significant undertaking and a dedicated Cython extension isn't that much more convenient than one in Rust.
Julia
I'm including Julia for completeness, not because I think it's generally a good choice for the kind of targeted extensions we're discussing. It's certainly possible to call Julia from Python, and in cases where there's an existing Julia library you want to use, it can be very convenient to just reach across the boundary rather than rewriting to Python. But that's where the advantages end.
Unlike the other members of this list, Julia doesn't compile to a native binary, instead having its own runtime that needs to run alongside the Python interpreter. Startup times for Julia extensions have improved a great deal in recent years, but there's still a little hiccup when starting up the Julia runtime, and you'll want to make sure to re-use that as much as possible (I've seen code that was accidentally spending a quarter of its runtime re-compiling the Julia JIT instead of re-using the one it already compiled).
I hope you'll allow me to get a bit more personal and opinionated: in college, I had several professors who waxed rhapsodic about Julia and mourned that it never quite took off and replaced Python for data science work. After experiencing the language myself, I have to ask: what were they thinking?
My experience trying to read a Julia codebase is an exercise in frustration. What type is this variable? No telling. What does this function do? You'll have to read all half a dozen functions with the exact same name that get called depending on the input types. Which you can't see ahead of time. Is that a function call, a struct initialization, or a secret third thing? Don't ask me.
It doesn't help that I've never once been able to get julials to start up without immediately crashing. The tooling ecosystem is simply not up to snuff.
Python's dynamic typing is frustrating enough, but any given Python variable is generally at least intended to be a given type or class member, even if you have to develop psychic abilities to figure out which one. Not so for Julia.
In sum, I do not recommend Julia for scientific Python extensions. If you're starting a greenfield project and you want a performant, dynamically typed language with good library support for scientific and mathematical processing, you can do worse. But even so, I'd urge you towards something else.
Oh, and if you just want to use greek letters in your codebase, you can do that in Rust too. Just turn off the linter error.
FORTRAN
If you're a FORTRAN developer and you think FORTRAN-Python is a better solution for your use case, I have neither the expertise nor inclination to tell you otherwise. Go in peace.
How do I use Rust to enhance Python performance?
First, profile your Python code. For precise and detailed profiling, samply is great, but for a first set of passes I prefer cProfile with snakeviz.
Once you have a good idea of where the bottleneck in your Python code lies, examine it closely. In particular, pay attention to the data flow: what gets copied, what gets mutated, what gets created and discarded. You can often find opportunities to speed things up without writing an external extension.
If you've optimized it as much in Python as you can and it's still not enough, then we can drop down into Rust.
We'll generally want to convert functions on a one-to-one basis, at least to begin with. Since Rust is statically typed and requires that both inputs and outputs be explicitly annotated, I like to start by writing the signature first.
This is a good place to think in terms of constraints: what data are we operating on, what dependencies exist between the data, what needs to be mutated, and what needs to be copied. Oftentimes, those constraints weren't obvious in Python, or were being covered by ubiquitous copies and reference counting.
After we've written a one-to-one conversion of the Python function in Rust, we can expect at least a small speedup, but we'll also have a better view of where the transformation is doing redundant work. I like to call one of the most common patterns here 'this array mask could have been a kernel'.
If the function performs file I/O, it's often also possible to make use of deliberate zerocopy memory access patterns to reduce allocations.
If you're operating on multiple arrays simultaneously, it's advantageous to zip those arrays into a single iterator operation. These can also be efficiently parallelized more easily than in Python. Rather than handling a ProcessPool manually, you can usually just cargo add rayon and switch out iter for par_iter, and voila, it's all parallelized.
That's generally the shape of things: figure out what your function has to do, and do absolutely no more than that - no more reads and writes than necessary, no more allocations and copies than necessary. Turn many loops into a single loop. Use cheap, safe concurrency where viable.
Some examples
For an example of a function translation that gets a big speedup from turning ~70 NumPy array operations into a single Rust iterator, see the analytical_kick_velocity example from the bottom of my blog post on vectorized array operations (soon to be adapted into a SciPy talk!).
This benefits a great deal from being able to take read-only views of NumPy arrays (something which is even easier in Cython!) and operate over them all simultaneously. In cases where the operations being performed aren't very heavy, we can even benefit from some auto-vectorization and condensation of instructions, something NumPy's eager evaluation can't match even with buffer reuse.
A more involved example comes from the problems McFACTS ran into with AstroPy units. We needed them to ensure that we weren't applying transformations on the wrong units, but even checking them, never mind applying them, has a non-trivial performance cost.
The solution, as I covered in another blog post, was to manually replicate the necessary conversion logic in Rust, extracting the unit strings and matching on them ourselves instead of using AstroPy's machinery. This would be infeasible if we actually needed all of AstroPy's features, but we were only using the unit conversions, and only for a small range of possible units. We replaced 20 layers of built-in function calls and string manipulation with a couple memory accesses and some inlined conversion logic.
You can also take a look at the McFACTS codebase itself and the mcfast Rust extension I've built for it, and see how the helper functions are used: we make drop-in Python replacement functions with the exact same input and output profile, which call Rust helpers under the hood. We can switch between implementations in the main simulation just by replacing foo with foo_optimized and vice versa when making modifications.
Rust in Production
If you've read this far, you may still be interested in seeing other examples of Rust in the wild. Where else is this being used?
In Python extensions specifically:
- James Logan's `interpn`, a high-performance replacement for SciPy's `interpn` interpolation function, achieving large speedups that grow even larger as the number of dimensions increases. Logan presented on `interpn` at RustNYC this past February.
- The `polars` dataframe library provides a faster and less memory-intensive alternative to `pandas`, along with features like strongly typed columns, enums, lazy transformations, and streaming that make it far more convenient for working with large datasets.
- Astral's beloved Python tooling options `ruff`, `uv`, and the still-experimental `ty` are all written in Rust for high performance.
In the wider ecosystem outside of Python interop:
- Rust is the only language other than C to be used in the Linux kernel.
- A Rust rewrite of the GNU `coreutils` package, `uutils`, is under development and is now running in Ubuntu.
- The Zed IDE provides a smooth, high-performance alternative to VSCode and its associated clones.
- Volvo uses Rust for its Electronic Control Units (ECUs).
- Rust is now widespread at Google, Microsoft, Amazon, Meta, and many other major tech companies.
- Cloudflare uses Rust for high-throughput web infrastructure.
- Discord rewrote performance-critical services from Go to Rust.

And various other Rust-based alternatives to popular software with growing market share:

- `deno` replaces `node` for JavaScript.
- `ripgrep` replaces `grep`.
- Tauri succeeds Electron.
Despite still being quite a young language, Rust is getting considerable traction across the software industry, and in the coming years, I hope to see it become an easy choice for writing scientific Python extensions.
But I Use R Tho
Okay, fine, R-Rust interop also exists and I also hear it's pretty good. Go ask Lukas Jung about it, tell him Nico sent you.