Sandboxing Untrusted Code in Zig: The zviz Architecture
How zviz uses Zig's comptime capabilities to build gVisor-inspired sandboxing with near-zero runtime cost.
AI agents are writing code and executing it. This is not a prediction — it is the present state of frontier model deployment. Coding assistants generate functions, test them, observe the output, and iterate. Research agents write data processing scripts and run them against datasets. Autonomous systems generate configuration code, deploy it, and monitor the results.
All of this code is untrusted. Not because the models are malicious, but because they are imperfect. A model that generates a shell command might accidentally rm -rf / the host filesystem. A model that writes a network client might open connections to arbitrary endpoints. A model that generates a file parser might trigger a buffer overflow that an attacker can exploit.
The standard response is containerisation. Run the untrusted code in a Docker container, a microVM, or a sandbox. The problem is overhead. Docker adds hundreds of milliseconds of startup latency and significant memory overhead per container. Firecracker microVMs are lighter but still require a full kernel boot. For an AI agent that might generate and execute hundreds of small code snippets per session, these costs compound.
zviz is our research project exploring what happens when you take the sandboxing model seriously but refuse to accept the overhead. The result is a Zig-based sandbox that provides gVisor-inspired isolation with single-digit microsecond startup and near-zero steady-state overhead.
Why gVisor’s Model, Not Docker’s
To understand zviz’s architecture, it helps to understand how existing sandboxing technologies work at a fundamental level.
Docker uses Linux namespaces and cgroups to create isolated environments. The untrusted code runs directly on the host kernel, but its view of the system is restricted. This is fast — there is no virtualisation overhead — but the attack surface is the entire Linux system call interface. A kernel vulnerability in any of the hundreds of syscalls available inside a container can be exploited to escape the sandbox.
Firecracker and similar microVMs run a full guest kernel inside a lightweight virtual machine. The attack surface is much smaller — just the VMM’s emulated device interface — but the cost is a full kernel boot on every start.
gVisor takes a middle path. It implements a user-space kernel that intercepts system calls from the sandboxed application and re-implements them in a controlled environment. The untrusted code never talks to the real kernel directly. This dramatically reduces the attack surface (the untrusted code can only do what the user-space kernel allows) while avoiding the boot cost of a full VM.
zviz follows gVisor’s model but makes a different set of engineering trade-offs. Where gVisor is a general-purpose sandbox written in Go (with the GC overhead that implies), zviz is a specialised sandbox for short-lived code execution, written in Zig for zero-overhead abstraction.
Why Zig
We chose Zig for zviz for three specific technical reasons, not out of language preference.
Comptime for Zero-Overhead Specialisation
zviz needs to intercept system calls and decide, for each call, whether to permit it, deny it, or emulate it. The set of permitted calls varies by sandbox profile. A profile for “pure computation” might allow only read, write, mmap, and exit_group. A profile for “network client” might additionally allow socket, connect, sendto, and recvfrom.
In a conventional implementation, you would store the profile in a data structure and look up each syscall at runtime. This works, but it adds a branch and a table lookup to every single system call, which matters when the code inside the sandbox makes millions of syscalls.
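That conventional approach might look like the following sketch — `RuntimeProfile` and the 512-entry bitset are illustrative shapes, not zviz's actual types:

```zig
const std = @import("std");

// Runtime lookup: a bitset of permitted syscall numbers, consulted on
// every call. `RuntimeProfile` is illustrative, not a zviz type.
const RuntimeProfile = struct {
    permitted: std.StaticBitSet(512), // one bit per syscall number

    fn isPermitted(self: *const RuntimeProfile, nr: u32) bool {
        // This branch and table lookup run on every single syscall.
        return nr < 512 and self.permitted.isSet(nr);
    }
};

pub fn main() void {
    var profile = RuntimeProfile{ .permitted = std.StaticBitSet(512).initEmpty() };
    // read, write, mmap, exit_group (x86-64 syscall numbers)
    for ([_]u32{ 0, 1, 9, 231 }) |nr| profile.permitted.set(nr);
    std.debug.print("read: {}, socket: {}\n", .{ profile.isPermitted(0), profile.isPermitted(41) });
}
```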
Zig’s comptime lets us do something better. The sandbox profile is a comptime parameter:
```zig
fn createSandbox(comptime profile: SandboxProfile) Sandbox {
    // At compile time, generate a specialised syscall handler
    // that directly permits/denies/emulates each syscall
    // with no runtime table lookup.
    return .{
        .handler = comptime generateHandler(profile),
    };
}
```
The compiler evaluates generateHandler at compile time and produces a specialised function that handles each syscall with a direct branch — no table, no indirection, no dynamic dispatch. The result is that syscall interception in zviz adds roughly 4-8 nanoseconds per call, depending on the syscall and the profile.
In Rust, achieving the same effect would require proc macros, which operate on token streams and are significantly harder to write and debug. In C, you would use preprocessor macros or code generation, both of which are fragile and difficult to maintain.
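To make the comptime claim concrete, here is a minimal, self-contained sketch of how a `generateHandler` could work. The shape of `SandboxProfile`, the `Action` enum, and the syscall numbers are assumptions for illustration, not zviz's actual definitions:

```zig
const std = @import("std");

const Action = enum { allow, deny };

// Hypothetical profile shape: the list of permitted syscall numbers.
const SandboxProfile = struct {
    permitted: []const u32,
};

// Because `profile` is comptime-known, the `inline for` is unrolled at
// compile time into a chain of direct comparisons: no table, no
// indirection, no dynamic dispatch.
fn generateHandler(comptime profile: SandboxProfile) fn (u32) Action {
    return struct {
        fn handle(nr: u32) Action {
            inline for (profile.permitted) |p| {
                if (nr == p) return .allow;
            }
            return .deny;
        }
    }.handle;
}

pub fn main() void {
    // "Pure computation" profile: read, write, mmap, exit_group (x86-64 numbers).
    const handler = comptime generateHandler(.{ .permitted = &.{ 0, 1, 9, 231 } });
    std.debug.print("read: {}\n", .{handler(0)});
    std.debug.print("socket: {}\n", .{handler(41)});
}
```

The same source-level code runs at compile time and at runtime, which is the property the paragraph above contrasts with proc macros and preprocessor tricks.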
Explicit Allocator Control
In a sandbox, you need to control memory allocation precisely. The sandbox’s own memory must be isolated from the sandboxed code’s memory. Memory limits must be enforced. Allocation failures must be handled gracefully, not with an OOM kill.
Zig’s allocator interface makes this natural. Every function that allocates takes an Allocator parameter. There is no hidden global allocator. zviz uses this to provide the sandboxed code with a capped arena allocator that enforces memory limits and is fully deallocated when the sandbox exits, while the sandbox runtime itself uses a separate allocator that is never exposed to the sandboxed code.
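As a minimal illustration of the capped-allocator idea — the cap size and the use of `std.heap.FixedBufferAllocator` here are my own simplification, not zviz's implementation:

```zig
const std = @import("std");

pub fn main() void {
    // A fixed buffer acts as a hard memory cap for the "sandboxed" side.
    var buf: [1024]u8 = undefined; // illustrative 1 KiB cap
    var fba = std.heap.FixedBufferAllocator.init(&buf);
    const sandbox_alloc = fba.allocator();

    // Within the cap, allocation succeeds as usual.
    _ = sandbox_alloc.alloc(u8, 512) catch unreachable;

    // Beyond the cap, allocation fails with a recoverable error
    // instead of an OOM kill.
    if (sandbox_alloc.alloc(u8, 4096)) |_| {
        std.debug.print("unexpected success\n", .{});
    } else |err| {
        std.debug.print("capped: {s}\n", .{@errorName(err)});
    }

    // Teardown: when `buf` goes out of scope, everything the
    // "sandboxed" side allocated is gone at once.
}
```

Because the allocator is an explicit parameter, the sandbox runtime's own allocator simply never appears in any code path reachable by the sandboxed code.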
No Hidden Control Flow
Zig has no hidden function calls. There are no constructors, destructors, operator overloads, or exception handlers running behind your back. When you read a line of Zig code, what you see is what executes. In a security-critical component like a sandbox, this property is valuable. It means that a code review of the syscall interception path can verify, line by line, that every branch is accounted for. There is no hidden cleanup code that might inadvertently leak a file descriptor or leave a resource in an inconsistent state.
Architecture
zviz consists of four layers:
1. The Syscall Interceptor
This is the innermost layer. It uses Linux’s seccomp-bpf to trap system calls from the sandboxed process and redirect them to zviz’s handler. The handler is generated at compile time from the sandbox profile, as described above.
For permitted syscalls, the handler simply allows them to pass through to the real kernel. For denied syscalls, it returns EPERM. For emulated syscalls (like filesystem access, which may need to be redirected to a virtual filesystem), the handler implements the syscall semantics itself.
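A stripped-down sketch of that three-way decision — syscall numbers are x86-64 Linux, `passThrough` and `vfsOpen` are hypothetical stand-ins, and a real handler would run inside the seccomp trap path rather than as an ordinary function:

```zig
const std = @import("std");

const EPERM: isize = 1; // errno value; the handler returns -EPERM

// Hypothetical stand-ins for the real mechanisms:
fn passThrough(nr: u32, arg0: usize) isize {
    _ = nr;
    _ = arg0; // would re-issue the syscall against the real kernel
    return 0;
}

fn vfsOpen(path_ptr: usize) isize {
    _ = path_ptr; // would resolve the path against the mount table
    return 3; // a virtual file descriptor
}

// The compile-time-specialised handler reduces to a direct switch.
fn dispatch(nr: u32, arg0: usize) isize {
    return switch (nr) {
        0, 1 => passThrough(nr, arg0), // read/write: permitted, pass through
        257 => vfsOpen(arg0),          // openat: emulated by the VFS
        else => -EPERM,                // everything else: denied
    };
}

pub fn main() void {
    std.debug.print("openat -> {d}, socket -> {d}\n", .{ dispatch(257, 0), dispatch(41, 0) });
}
```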
2. The Virtual Filesystem
Sandboxed code often needs to read files — configuration, input data, model weights. But it should not have access to the host filesystem. zviz provides a virtual filesystem that maps a controlled set of paths into the sandbox. The mapping is defined at sandbox creation time and is immutable thereafter.
The virtual filesystem supports read-only mounts (for input data), read-write mounts to tmpfs-backed directories (for scratch space), and overlay mounts that present a copy-on-write view of a host directory. All filesystem syscalls (open, read, write, stat, lseek, etc.) are intercepted and handled through the virtual filesystem layer.
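A sandbox's mapping might be declared as something like the following — the types, field names, and paths are illustrative, not zviz's public API:

```zig
const MountKind = enum { read_only, tmpfs_rw, overlay_cow };

const Mount = struct {
    guest_path: []const u8, // path as the sandboxed code sees it
    host_path: ?[]const u8, // backing host path; null for tmpfs scratch
    kind: MountKind,
};

// Defined at sandbox creation time, immutable thereafter.
const mounts = [_]Mount{
    .{ .guest_path = "/input", .host_path = "/data/job-input", .kind = .read_only },
    .{ .guest_path = "/tmp", .host_path = null, .kind = .tmpfs_rw },
    .{ .guest_path = "/work", .host_path = "/data/project", .kind = .overlay_cow },
};
```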
3. The Resource Controller
This layer enforces CPU time limits, memory limits, and I/O bandwidth limits on the sandboxed process. It uses Linux cgroups v2 for enforcement, but wraps them in an API that is configured at compile time from the sandbox profile.
The resource controller also handles graceful termination. If the sandboxed code exceeds its CPU time limit, the controller sends SIGKILL and reports the violation to the caller. If it exceeds its memory limit, the OOM killer within the cgroup terminates the process, and the controller reports it.
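Since cgroup v2 exposes its controls as plain files, the enforcement step can be sketched as a few writes. The control-file names are real cgroup v2 interfaces, but the concrete limits and the helper function are illustrative, not zviz's API:

```zig
const std = @import("std");

fn writeControl(dir: std.fs.Dir, name: []const u8, value: []const u8) !void {
    const f = try dir.createFile(name, .{});
    defer f.close();
    try f.writeAll(value);
}

// Illustrative limits for one sandbox's cgroup directory.
fn applyLimits(cgroup_dir: std.fs.Dir) !void {
    try writeControl(cgroup_dir, "memory.max", "268435456"); // 256 MiB hard cap
    try writeControl(cgroup_dir, "cpu.max", "50000 100000"); // 50 ms of CPU per 100 ms period
    try writeControl(cgroup_dir, "pids.max", "16");          // bound process creation
}
```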
4. The Lifecycle Manager
This outermost layer manages sandbox creation, execution, and teardown. It is responsible for:
- Forking and configuring the sandboxed process
- Setting up seccomp filters, cgroups, and the virtual filesystem
- Collecting the sandboxed process’s exit status and any output
- Cleaning up all resources after the sandbox exits
The lifecycle manager is designed for rapid reuse. After a sandbox exits, its resources are recycled into a pool for the next invocation. This amortises the cost of cgroup creation and seccomp filter installation across multiple sandbox invocations.
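The recycling idea can be sketched as a simple free list — `Sandbox`, the pool size, and what `reset` does are hypothetical simplifications:

```zig
const Sandbox = struct {
    id: u32,

    fn reset(self: *Sandbox) void {
        _ = self; // would clear scratch tmpfs and counters, keeping the
        // cgroup and installed seccomp filter intact for reuse
    }
};

const Pool = struct {
    slots: [8]?*Sandbox = [_]?*Sandbox{null} ** 8,

    fn release(self: *Pool, s: *Sandbox) bool {
        s.reset();
        for (&self.slots) |*slot| {
            if (slot.* == null) {
                slot.* = s;
                return true; // recycled: the next acquire skips cold setup
            }
        }
        return false; // pool full: tear the sandbox down for real
    }

    fn acquire(self: *Pool) ?*Sandbox {
        for (&self.slots) |*slot| {
            if (slot.*) |s| {
                slot.* = null;
                return s; // warm sandbox: the fast-start path
            }
        }
        return null; // caller falls back to cold creation
    }
};
```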
Performance Characteristics
We measure zviz’s performance along three axes:
Startup latency: The time from requesting a sandbox to having the sandboxed code executing. With a warm pool, this is approximately 12 microseconds. Cold start (no pool) is approximately 180 microseconds. For comparison, Docker cold start is typically 200-500 milliseconds, and Firecracker is approximately 125 milliseconds.
Steady-state overhead: The additional cost per system call due to interception. For permitted syscalls (pass-through), this is approximately 4 nanoseconds. For emulated syscalls (virtual filesystem), this varies from 200 nanoseconds to 3 microseconds depending on the operation. For comparison, gVisor’s overhead is typically 1-5 microseconds per syscall.
Memory overhead per sandbox: zviz’s runtime uses approximately 64 KB of memory per active sandbox, excluding the memory allocated to the sandboxed process itself. Docker containers typically use 10-50 MB of overhead.
These numbers make zviz practical for the use case we care about: AI agents executing hundreds or thousands of small code snippets per session, where each snippet runs for milliseconds to seconds.
Security Model and Limitations
zviz’s security model is defence in depth. The sandboxed process is isolated by:
- seccomp-bpf filters that restrict available syscalls
- Linux namespaces that provide filesystem, network, and PID isolation
- The virtual filesystem that controls file access
- cgroups that limit resource consumption
This boundary is practical rather than absolute: it relies on the correctness of the Linux kernel’s seccomp, namespace, and cgroup implementations, and a kernel vulnerability could still allow sandbox escape. This is the same trust boundary that Docker and gVisor accept.
zviz does not currently support sandboxing on non-Linux systems. The architecture is Linux-specific by design, because the sandboxing primitives (seccomp, namespaces, cgroups) are Linux-specific.
zviz also does not currently support sandboxing GPU workloads. Passing a GPU device into a sandbox while maintaining isolation is an open problem that we are actively investigating.
What We Have Learned
Building zviz has reinforced several beliefs and challenged others:
Zig’s comptime is genuinely transformative for systems programming. The ability to write ordinary code that executes at compile time, without the cognitive overhead of a macro sublanguage, made zviz’s specialised syscall handlers practical. We would not have attempted this architecture in Rust or C.
The gVisor model is underexplored for AI use cases. Most sandboxing discussion in the AI community centres on Docker containers. gVisor-style syscall interception offers a fundamentally better trade-off for the short-lived, high-frequency execution pattern that AI agents require.
Zig’s ecosystem immaturity is a real cost. We spent non-trivial time working around missing standard library features, incomplete documentation, and tooling gaps. For a less systems-focused project, this cost would likely be prohibitive.
zviz is open source, and we welcome contributions — particularly from those with experience in kernel security, Zig, or AI agent infrastructure.