Why We Write AI Infrastructure in Rust (and Zig, and Go)
Language choice as research methodology — how memory-safe, deterministic-performance languages produce falsifiable systems claims.
There is no shortage of “Why we rewrote X in Rust” posts on the internet. This is not one of those. We are not here to tell you that Rust is fast, or that its type system is elegant, or that the community is friendly. All of those things may be true, but none of them are why we chose Rust — or Zig, or Go — for the systems that underpin our AI research.
This is a post about methodology. Specifically, it is about why language choice is a methodological decision when you are building infrastructure whose behaviour you intend to make falsifiable claims about.
The Problem with Benchmarking on Quicksand
Consider a common scenario in systems research. You build a prototype of a new scheduling algorithm, a caching layer, or a storage engine. You benchmark it. You report that your system achieves X throughput at Y latency under Z workload. You submit the paper.
But buried in the results are confounding variables that have nothing to do with your contribution. A garbage collector pause added 14 milliseconds to your p99 tail latency. A buffer overflow in your comparison baseline inflated its error rate. A data race in your concurrent hash map produced silently wrong results that happened to look like correct results during the evaluation window.
These are not hypothetical problems. They are the daily reality of systems research conducted in languages that leave memory management, concurrency control, and performance determinism as exercises for the programmer.
When we set out to build the infrastructure behind projects like numaperf, embedcache, and memista, we made a deliberate decision: the language itself should eliminate as many confounding variables as possible. Not because we are language partisans, but because we want our experimental claims to be about the thing we are studying — topology-aware scheduling, embedding cache eviction, vector similarity search — rather than about whether we remembered to free a buffer.
Rust: The Ownership Model as Experimental Control
Rust is our default choice for projects where we need to make precise claims about memory behaviour and performance characteristics. numaperf studies NUMA-aware scheduling for latency-critical AI workloads. embedcache investigates caching strategies for high-dimensional vector computations. memista explores SQLite-backed approximate nearest-neighbour search. All three are written in Rust.
The ownership model is the key. When we write a NUMA-aware allocator in Rust, we know at compile time that every allocation has exactly one owner, that borrows are checked statically, and that there are no use-after-free bugs lurking in our experimental harness. This is not a convenience — it is an experimental control. It means that when we measure the performance difference between NUMA-local and NUMA-remote allocations in numaperf, we can be confident that the difference reflects topology effects, not memory corruption artefacts.
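The guarantee is visible in miniature: ownership pins down the exact point at which memory is released, so an experiment's allocation behaviour is determined by the code rather than by a collector's schedule. A minimal sketch, not numaperf's actual code (the `Buffer` type and sizes are illustrative):

```rust
// A toy allocation whose lifetime is fixed entirely by ownership.
struct Buffer {
    data: Vec<u8>,
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Runs deterministically when the owner goes out of scope --
        // no collector decides when (or whether) this happens.
        println!("freed {} bytes", self.data.len());
    }
}

fn checksum(buf: Buffer) -> u64 {
    // `checksum` takes ownership: after this call the caller can no
    // longer touch `buf`, so use-after-free is a compile error, not a
    // bug waiting to corrupt a measurement.
    buf.data.iter().map(|&b| b as u64).sum()
}

fn main() {
    let buf = Buffer { data: vec![1u8; 1024] };
    let sum = checksum(buf); // ownership moves; `buf` is freed inside `checksum`
    println!("checksum = {}", sum);
    // Any further use of `buf` here would be rejected by the borrow checker.
}
```

The same structure scales up: in a measurement harness, knowing statically where every deallocation happens means the profile contains no surprise frees or leaks.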
The borrow checker also eliminates data races in concurrent code. When embedcache processes concurrent cache lookups and evictions, Rust’s type system guarantees that shared mutable state is accessed through proper synchronisation primitives. We do not need to wonder whether a race condition is inflating our hit-rate numbers.
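To make that concrete, here is a hedged sketch of the pattern, a stand-in for embedcache's real cache rather than its implementation: shared hit counts behind `Arc<Mutex<...>>`, where forgetting the lock is a compile error rather than a race discovered during evaluation.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// A stand-in for a shared cache: the type system will not let threads
// touch the map except through the Mutex, so torn reads and lost
// updates are ruled out before the benchmark ever runs.
fn concurrent_hits(keys: &[&'static str]) -> HashMap<&'static str, u64> {
    let cache = Arc::new(Mutex::new(HashMap::new()));
    let mut handles = Vec::new();
    for &key in keys {
        let cache = Arc::clone(&cache);
        handles.push(thread::spawn(move || {
            // Locking is the only way in; accessing the map without it
            // simply does not compile.
            let mut map = cache.lock().unwrap();
            *map.entry(key).or_insert(0u64) += 1;
        }));
    }
    for h in handles {
        h.join().unwrap();
    }
    // All threads joined, so this Arc is the last reference.
    Arc::try_unwrap(cache).unwrap().into_inner().unwrap()
}

fn main() {
    let hits = concurrent_hits(&["a", "b", "a", "a", "b"]);
    println!("{:?}", hits); // counts are exact: no lost increments
}
```

The point is not that mutexes are novel; it is that the compiler, not the evaluation window, is what tells you the hit-rate numbers are race-free.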
There are real costs. Rust's compile times are slow, significantly slower than those of C, Go, or Zig. The learning curve is steep, particularly around lifetime annotations and trait bounds. Some patterns that are trivial in other languages (self-referential structs, certain graph structures) require unsafe code or architectural workarounds. And the borrow checker occasionally rejects programs that are, in fact, correct. These are genuine productivity taxes.
We pay them because the alternative — spending weeks hunting a subtle memory bug that turns out to invalidate an entire evaluation — is worse.
Zig: Comptime as Zero-Cost Abstraction
zviz is our sandbox for executing untrusted AI-generated code. It implements gVisor-inspired isolation with near-zero runtime overhead. It is written in Zig, not Rust.
The reason is comptime. Zig’s compile-time execution model lets us generate specialised sandbox configurations at compile time, producing code that is as fast as hand-written assembly while remaining maintainable and safe. In Rust, achieving the same effect typically requires procedural macros: a separate, complex sublanguage that operates on token streams. In Zig, comptime is just Zig. You write ordinary functions, and the compiler evaluates them at compile time when the inputs are known.
For zviz, this matters concretely. The sandbox needs to intercept system calls, validate their arguments, and either permit or deny them. The set of permitted syscalls varies by sandbox profile. In Zig, we define the profile as a comptime parameter, and the compiler generates a specialised syscall filter with no runtime branching over the profile. The result is a syscall interception layer that adds single-digit nanoseconds of overhead.
Zig also gives us explicit control over allocators. In a sandbox, you want to know exactly where every byte of memory comes from and goes to. Zig’s allocator interface makes this the default, not an afterthought. Every function that allocates takes an allocator parameter. There is no hidden global allocator.
The downsides are real too. Zig’s ecosystem is young. The standard library is still stabilising. IDE support is improving but lags behind Rust. Documentation is sparse. We accept these costs for zviz because comptime is not a nice-to-have for the project — it is architecturally central.
Go: Concurrency Without Ceremony
route-switch, our LLM routing and prompt-tuning system, is written in Go. This might seem contradictory for a lab that just spent several paragraphs arguing for deterministic performance, but the claims being made are different. route-switch routes queries to different LLM endpoints based on cost-quality optimisation; the latency-critical path is the network call to the LLM provider, which takes tens to hundreds of milliseconds. A garbage collector pause of a few hundred microseconds is irrelevant at that timescale.
What matters for route-switch is the ability to manage many concurrent connections to different LLM providers, handle graceful failover, and implement MIPROv2-based prompt tuning with minimal boilerplate. Go’s goroutines and channels make concurrent network programming straightforward. The standard library’s HTTP server and client are production-grade. The build toolchain produces statically linked binaries with no fuss.
Go’s GC is a confounding variable we can tolerate in route-switch because the system’s performance claims are about routing decisions (which model to call, with what prompt), not about sub-millisecond latency. If we were building route-switch to study microsecond-level scheduling, we would use Rust.
This is the core of our argument: language choice should follow from what you are trying to measure. A GC is a confounding variable when you study tail latency. It is irrelevant when you study routing policy.
The Meta-Principle: Languages as Methodology
We do not have a “one language to rule them all” policy. We have a methodological principle: the language should eliminate confounding variables relevant to the claims you intend to make.
This means:
- When measuring memory access patterns (numaperf): Use a language with no GC and explicit memory control. Rust.
- When building zero-overhead compile-time abstractions (zviz): Use a language where compile-time execution is a first-class feature. Zig.
- When managing concurrent I/O at millisecond timescales (route-switch): Use a language with lightweight concurrency and a mature networking stack. Go.
- When studying cache behaviour (embedcache) or storage engine performance (memista): Use a language where you control allocation and deallocation. Rust.
This is not about which language is “best.” It is about which language produces the most trustworthy experimental results for a given research question.
What This Means for Reproducibility
There is a secondary benefit to this approach that we did not initially anticipate: reproducibility. When a system is written in Rust, another researcher can build it on a different machine and get the same performance characteristics (modulo hardware differences). There is no “well, it depends on your GC tuning” or “you need to set the JVM heap to exactly 8 GB.” The performance is determined by the code, the compiler flags, and the hardware. Full stop.
This is not a universal truth — Rust programs can still exhibit non-deterministic behaviour through thread scheduling, I/O ordering, or LLVM optimisation decisions. But the surface area for non-determinism is dramatically smaller than in garbage-collected or manually managed languages.
Conclusion
We write AI infrastructure in Rust, Zig, and Go not because we think these languages are the future of all programming, but because we think language choice is part of the scientific method. When you make a claim about system performance, the tools you used to build and measure that system are part of the claim. Choosing tools that minimise confounding variables is not engineering preference — it is experimental hygiene.
If you are building AI infrastructure and intend to make empirical claims about its behaviour, consider what your language is and is not controlling for. The answer might surprise you.