gVisor: sandboxing for multi-tenant containers

Official gVisor project logo, hosted in Google's public GitHub repository. gVisor is a container runtime that implements a user-space kernel, Sentry, to intercept the guest process's system calls and shrink the attack surface exposed to the host kernel, a central piece in multi-tenant deployments where classic container isolation doesn't reach the trust level demanded by serverless functions or by customers sharing infrastructure.

Containers share the host kernel, and that property, which explains much of their lightness, also sets their ceiling in multi-tenant environments where trust between workloads is low. gVisor was born inside Google precisely to raise that ceiling: insert a kernel written in Go between the container process and the real host kernel, reduce the exposed system call surface, and offer an intermediate form of isolation between the traditional container and the lightweight virtual machine. In late 2025, with years of use in App Engine and Cloud Run, adoption by third-party serverless platforms, and a large enough technical user base to have well-formed opinions, it’s time to review what it does well, what it does badly, and when it pays off.

What gVisor is and why it was built

gVisor implements an OCI-compatible runtime called runsc that replaces runc. The decisive difference is that runsc doesn’t let the container process talk to the host kernel through the system call table. Instead, a component called Sentry, a Linux kernel reimplemented in Go running in user space, intercepts those calls and responds to most of them itself, only talking to the host when there’s no other option and always through a narrow perimeter. The result is that a kernel exploit that would normally escalate from container to host has to traverse Sentry first, which is much smaller and written in a memory-safe language.
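A quick way to see the runtime in action, assuming runsc is already installed on the host: the project ships a `runsc install` subcommand that registers it as a Docker runtime, after which any container can opt in per run.

```shell
# Register runsc as a Docker runtime (edits /etc/docker/daemon.json)
sudo runsc install
sudo systemctl restart docker

# Run a throwaway container under gVisor instead of runc
docker run --rm --runtime=runsc alpine uname -a
# uname now reports the kernel version Sentry emulates, not the host's
```

The same image runs unchanged; only the runtime underneath differs, which is what makes incremental adoption practical.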

Google open-sourced the code in 2018 and has been using it in production to run arbitrary customer code on platforms where shared isolation isn’t enough. App Engine, Cloud Run and Cloud Functions lean part of their trust model on gVisor, and several serverless and edge providers have integrated runsc as an alternative or a complement to Firecracker. The operational motivation is clear: a sandbox with startup times close to a classic container but with a narrower threat model than plain kernel sharing.

Architecture: Sentry, Gofer and platform mode

When a container starts with runsc, the runtime creates two main host processes. Sentry is the user-space kernel and runs the container code; Gofer is a separate process that mediates filesystem access, talking to the host on behalf of the container when it needs to read or write. The separation is deliberate: even if an attacker compromises Sentry, they still have to cross Gofer to touch disk, and neither Sentry nor Gofer has privileged capabilities beyond what’s strictly needed.

System call interception is handled by pluggable platform modes. The ptrace mode is portable but slow, and rarely used in production. The KVM mode leverages hardware virtualization extensions to trap system calls without the ptrace overhead, with notably better performance, but it requires compatible hardware and permission to use /dev/kvm. The systrap mode, which became the default in 2023, uses seccomp filters to trap calls without depending on KVM or ptrace, and it’s the mode the developers recommend in 2025: good performance, portable, no special hardware requirements.
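Platform selection is just a runtime flag. As a sketch, with the Docker integration the mode can be pinned through runtimeArgs in /etc/docker/daemon.json (the binary path shown is a common default; adjust to your install):

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc",
      "runtimeArgs": ["--platform=systrap"]
    }
  }
}
```

Swapping `systrap` for `kvm` on hosts with /dev/kvm access is the usual tuning knob when chasing the last bit of syscall performance.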

An important detail is that Sentry doesn’t implement every Linux system call. It covers most of what a typical program needs, but obscure or highly specific calls are left out and return ENOSYS if the container tries them. This is deliberate: each implemented call is potential attack surface, and the project prefers useful coverage to total coverage.
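The failure mode is the ordinary one for an unimplemented syscall. A minimal sketch of what a program sees, using an out-of-range syscall number as a stand-in: on a stock Linux kernel this produces the same ENOSYS a gVisor guest gets for a call Sentry left out.

```python
import ctypes
import errno

# use_errno=True lets us read the errno set by the raw syscall
libc = ctypes.CDLL(None, use_errno=True)

def raw_syscall(nr: int) -> tuple[int, int]:
    """Invoke syscall number nr with no arguments; return (result, errno)."""
    res = libc.syscall(ctypes.c_long(nr))
    return res, (ctypes.get_errno() if res == -1 else 0)

# 100000 is far outside the x86-64 syscall table, so the kernel answers
# ENOSYS: the same errno a gVisor guest receives when it invokes a call
# that Sentry deliberately does not implement.
res, err = raw_syscall(100000)
print(res, errno.errorcode.get(err))  # -1 ENOSYS
```

Well-behaved programs treat ENOSYS as “feature unavailable” and fall back, which is why most software runs under gVisor unmodified; software that hard-depends on an exotic call simply breaks.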

Performance: where it wins and where it loses

The required question with any sandbox is how much it costs. For CPU-heavy workloads with little kernel contact, gVisor is very close to a native container, because execution happens directly on the processor and Sentry only enters on the edges. Benchmarks published by Google and reproduced by third parties show differences on the order of 5 to 10 percent for pure compute loads on systrap, and even smaller on KVM.

The story changes with I/O-heavy loads. Any operation that crosses into Sentry pays the interception cost, and if it also touches disk it goes through Gofer too. A traditional database under gVisor performs noticeably worse than under runc: benchmarks I’ve seen show 20 to 50 percent penalties in transactions per second depending on access pattern. Redis suffers less because its operations barely touch the filesystem; Postgres or MySQL with constant writes suffer much more. The network gets similar treatment: gVisor ships its own TCP stack written in Go, which works fine for moderate traffic but doesn’t match the Linux kernel in raw throughput or coverage of advanced features.

The operational lesson is that gVisor is a good choice when isolation matters more than the last step of performance: HTTP APIs, short-running serverless functions, batch jobs, execution of untrusted user code. It’s a bad choice for heavy databases, distributed filesystems, or any workload whose main metric is IOPS.

Comparison with Kata Containers and microVMs

The obvious comparison is with Kata Containers, which also seeks reinforced container isolation but with a different approach: it starts a small virtual machine using Firecracker or QEMU and runs the container inside. The threat model is different. Kata bets on the hardware barrier of the hypervisor; gVisor bets on surface reduction in user space. Both are legitimate and are chosen based on what you want to defend against.

Kata tends to have better compatibility with I/O and network-heavy workloads because inside the VM runs a complete Linux kernel with years of optimization. gVisor tends to start faster and consume less fixed memory per container, because there’s no hypervisor or full kernel to load. On Cloud Run, where cold start of each function matters, choosing gVisor makes sense; on workloads where a sandbox stays alive for hours, Kata can compensate via better sustained performance.

Firecracker alone, without the Kata wrapper, is a different building block. Providers like Fly.io and AWS Lambda use it directly. The threat model is strong thanks to hypervisor separation, but operating pure Firecracker implies building much more orchestrator integration than runsc needs, which plugs into containerd with a handful of config lines.

Operation and deployment

Integrating gVisor into an existing cluster is relatively easy. Install runsc, configure containerd to recognize it as an alternative runtime, and use a Kubernetes RuntimeClass to mark which pods should start under it. Marked pods run with Sentry and Gofer; the rest keep using runc. This lets you apply gVisor only to workloads that need it, without imposing its I/O cost on the whole cluster.
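The wiring is small. As a sketch of the two pieces, assuming containerd’s CRI plugin and a runsc binary on each node (the RuntimeClass name `gvisor` and the pod name are conventions, not requirements):

```toml
# /etc/containerd/config.toml: declare runsc as an additional runtime
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
  runtime_type = "io.containerd.runsc.v1"
```

```yaml
# RuntimeClass mapping a handler name to runsc, plus a pod opting in
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-api
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: nginx
```

Pods without `runtimeClassName: gvisor` are untouched and keep running under runc, which is what makes the per-workload opt-in model work.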

Observability works reasonably well. runsc exports Prometheus-format metrics with CPU and memory usage information, and logs integrate with the usual logging stack. Diagnosis when something fails is trickier than with runc, because messages can come from Sentry rather than the real kernel, but project documentation has improved a lot and there’s an active community answering questions.
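For the cases where the error comes from Sentry rather than the host kernel, runsc ships its own introspection. For example, `runsc debug` can dump the goroutine stacks of a running sandbox; the container ID placeholder below is whatever your runtime assigned, and depending on the integration you may need a `--root` flag pointing at the runtime’s state directory:

```shell
# Dump Sentry's internal stacks for a sandbox that looks stuck
sudo runsc debug --stacks <container-id>
```

Combined with the `--debug-log` runtime flag, this is usually enough to tell whether a failure originates in Sentry, in Gofer, or in the workload itself.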

When it pays off

My reading after following the project is that gVisor has found a clear spot: multi-tenant workloads where isolation matters and the usage pattern is CPU-heavy and I/O-light. Platforms that run third-party code, serverless functions, test environments where different users share nodes, educational clusters, malware analysis labs. In all of those, the attack-surface reduction easily offsets the 5 to 10 percent performance loss.

Where it doesn’t pay off is in first-party workloads from an organization that trusts its own code. If all pods in a cluster come from the same team through the same pipeline, adding an extra sandbox rarely justifies the operational cost. And in workloads with heavy disk or network I/O, the penalty is large enough that Kata or even a full VM are better options.

The choice is never binary: many operators use gVisor for part of the cluster, Kata for another, and runc for the rest, picking the barrier that best matches each workload’s trust level and performance profile. That heterogeneity is today’s mature answer to how to isolate containers in many-actor environments, and gVisor occupies a legitimate place within it.
