
Continuous profiling with eBPF in production

Updated: 2026-05-03

Continuous profiling stopped being an experiment two years ago. When I prepare a performance review on a real-traffic service, the first place I look is not the aggregated Prometheus metric but the CPU profile of the last fifteen minutes in Parca or Pyroscope. The leap isn’t rhetorical: knowing that 12 % of CPU time is going to one specific JSON serialisation function is a different answer than knowing that p99 has risen.

Key takeaways

  • Continuous profiling changes the diagnosis model: you no longer have to reproduce the problem because it’s recorded in history.
  • Typical overhead of an eBPF agent in production is 1–3 % of CPU, depending on sampling frequency.
  • Three mature options: Parca (the most kernel-native and infrastructure-oriented), Pyroscope (integrated into the Grafana stack), and Grafana Beyla (combined autoinstrumentation).
  • Pays off in services with direct impact on cost or user experience: databases, API gateways, high-concurrency services.
  • Doesn’t pay off the same way in low-load services, short-run batch, or environments with strict data regulation.

What continuous profiling solves

Classic profiling works in two modes. Either you instrument code with timers and counters, which is expensive in lines of code and requires anticipating where you’ll measure, or you fire perf or equivalents on demand once you already have an incident. Both paths fail in the intermediate case: when performance has drifted over the past week and no one knows when it started.

Continuous profiling changes the model. The system samples each process’s execution stack every few tens of milliseconds, without interruption, and stores profiles tagged by service, time and version. When performance degrades, you compare last week’s profile to today’s and see exactly which function has grown in CPU share.
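The comparison between two periods can be sketched in a few lines. This is only an illustration of the idea, not how Parca or Pyroscope implement diffing: the function names and sample counts are invented, and a real profile holds full stacks rather than flat per-function counts.

```python
from collections import Counter


def cpu_share(samples: Counter) -> dict:
    """Convert per-function sample counts into fractions of all samples."""
    total = sum(samples.values())
    if total == 0:
        return {}
    return {func: count / total for func, count in samples.items()}


def share_diff(before: Counter, after: Counter) -> list:
    """Return (function, change in CPU share) pairs, largest increase first."""
    b, a = cpu_share(before), cpu_share(after)
    deltas = {f: a.get(f, 0.0) - b.get(f, 0.0) for f in set(a) | set(b)}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample counts for the same service in two periods.
last_week = Counter({"handle_request": 700, "serialize_json": 200, "db_query": 100})
today = Counter({"handle_request": 600, "serialize_json": 320, "db_query": 80})

for func, delta in share_diff(last_week, today):
    print(f"{func:16s} {delta:+.1%}")
```

With these made-up numbers, serialize_json comes out on top with a gain of twelve percentage points of CPU share, which is the kind of answer a flame graph diff gives you visually.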

The practical difference is that you no longer have to reproduce a problem to diagnose it. The problem is recorded in history, and your job is interpreting flame graphs with the previous version side by side. This retrospective diagnostic capability is also what makes continuous RAG evaluation valuable: detecting when degradation started before it becomes visible.

The role of eBPF

The obvious question is why now, if perf has been in Linux since 2009. The answer is eBPF. Without eBPF, collecting continuous profiles meant instrumenting each binary in a specific language, dealing with debug symbols deployed in production, and paying an unacceptable CPU cost. With eBPF, a kernel probe samples for you across any process, without changing the binary and at a measurable, bounded cost.

The concrete technical piece that has matured is attaching an eBPF program to a perf event. The kernel fires a timer-based sampling event on each CPU at a fixed frequency, and the attached program captures the stack of whatever is running at that moment. The user-space tool (Parca, Pyroscope or a Grafana Beyla integration) aggregates those samples and exports them to a backend with its own query language.
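The user-space aggregation step can be pictured as counting identical stacks. The sketch below assumes the samples have already been captured and symbolised, and renders them in the folded-stack text format used by flame graph tooling. Real agents typically export pprof-style profiles instead, and the helper names and frames here are hypothetical.

```python
from collections import Counter


def aggregate(samples) -> Counter:
    """Count how many times each distinct stack (root -> leaf) was observed."""
    return Counter(samples)


def to_folded(counts: Counter) -> list:
    """Render counts as folded-stack lines: 'root;child;leaf <count>'."""
    return [f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())]


def self_samples(counts: Counter) -> Counter:
    """Attribute each sample to its leaf frame, the function actually on CPU."""
    leaf = Counter()
    for stack, n in counts.items():
        if stack:  # skip empty stacks from failed unwinds
            leaf[stack[-1]] += n
    return leaf


# Hypothetical stacks, one tuple per sample, ordered root -> leaf.
samples = [
    ("main", "handle_request", "serialize_json"),
    ("main", "handle_request", "serialize_json"),
    ("main", "handle_request", "db_query"),
    ("main", "gc"),
]

counts = aggregate(samples)
for line in to_folded(counts):
    print(line)
```

Counting by whole stack rather than by function is what keeps the caller context, so the same function can show up under different parents in the flame graph.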

The second piece that has changed is symbol resolution. Current tools fetch the debug symbols that match the original binaries, cache them, and apply reasonable fallbacks when none are found. For natively compiled languages, support is already very good. For managed languages like Java or Python there’s an extra step to map JIT-compiled or interpreter frames back to source-level functions, which isn’t always perfect.
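At its core, resolving a native address is a range lookup in a sorted symbol table. The following is a minimal sketch under that assumption. The Symbol and SymbolTable types are hypothetical, and real resolvers also have to handle load offsets, inlined functions and stripped binaries.

```python
from bisect import bisect_right
from typing import NamedTuple


class Symbol(NamedTuple):
    start: int  # first address covered by the function
    size: int   # length of the function in bytes
    name: str


class SymbolTable:
    def __init__(self, symbols):
        self._symbols = sorted(symbols)  # ordered by start address
        self._starts = [s.start for s in self._symbols]

    def resolve(self, addr: int) -> str:
        """Map an instruction address to 'name+offset', or a fallback label."""
        i = bisect_right(self._starts, addr) - 1
        if i >= 0:
            sym = self._symbols[i]
            if addr < sym.start + sym.size:
                return f"{sym.name}+0x{addr - sym.start:x}"
        return f"[unknown 0x{addr:x}]"  # fallback when no symbol covers addr


# Hypothetical symbols for one binary.
table = SymbolTable([
    Symbol(0x1040, 0x100, "serialize_json"),
    Symbol(0x1000, 0x40, "main"),
])
print(table.resolve(0x1050))
print(table.resolve(0x2000))
```

The fallback label plays the role of the "reasonable fallbacks" mentioned above: the sample is kept and stays visible in the profile even when its name cannot be recovered.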

Real cost in production

The question the platform team always asks before approving deployment is how much it costs. The answer I’ve measured in three different environments: typical overhead is between 1 % and 3 % of CPU. That covers the eBPF agent sampling, local aggregation, and shipping to the central aggregation layer.

This number has nuances:

  • Overhead is roughly proportional to sampling frequency. By default many tools sample at 19 or 99 Hz (frequencies that don’t fall on kernel clock harmonics). Dropping from 99 Hz to 19 Hz cuts the number of samples about fivefold, and the sampling part of the cost falls with it.
  • The other cost, less obvious but real, is storage. A per-node agent shipping profiles tagged by service and version can generate several hundred megabytes per day compressed. Multiplied by the typical two to four-week retention, it’s a budget to plan for.
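Both nuances reduce to simple arithmetic that is worth writing down before the platform review. The figures below are illustrative assumptions, not measurements: 300 MB per node per day, 20 nodes, 28 days of retention, and a 2 % overhead at 99 Hz that is assumed to scale linearly with frequency (in practice aggregation and shipping add a fixed part that does not).

```python
def storage_budget_gb(mb_per_node_per_day: float, nodes: int, retention_days: int) -> float:
    """Compressed profile storage for the retention window, in GB (1 GB = 1000 MB)."""
    return mb_per_node_per_day * nodes * retention_days / 1000


def scaled_sampling_overhead(overhead_pct: float, from_hz: int, to_hz: int) -> float:
    """Estimate overhead at a new frequency, assuming cost is linear in frequency."""
    return overhead_pct * to_hz / from_hz


# Hypothetical inputs: replace with numbers measured in your own environment.
print(f"{storage_budget_gb(300, 20, 28):.0f} GB retained")
print(f"{scaled_sampling_overhead(2.0, 99, 19):.2f} % CPU at 19 Hz")
```

Under these assumptions the fleet retains 168 GB of profiles, and the estimated overhead at 19 Hz is about 0.38 % of CPU.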

Which tool to pick

The ecosystem has consolidated on three serious options:

  • Parca, by Polar Signals: the most kernel-native and infrastructure-oriented option, with a data model designed for continuous profiles and its own backend.
  • Pyroscope, acquired by Grafana in 2023 and integrated into the Grafana stack: the one that fits best if you already have Grafana and Loki.
  • Grafana Beyla: added combined autoinstrumentation and profiling in 2024, though with fewer tuning options.

In practice I’ve ended up using Pyroscope in environments with a Grafana stack already in place and Parca when the team wants to separate profiling telemetry from the rest of observability. The decision isn’t technical, it’s organisational.

For Kubernetes there’s an official Pyroscope agent, deployed as a DaemonSet, that automatically captures profiles from all pods on the node. For Docker Compose or Swarm, a per-host agent watching processes by cgroup is enough.

When it pays off and when it doesn’t

Continuous profiling pays off in services where performance has direct impact on cost or user experience:

  • Databases.
  • API gateways.
  • High-concurrency services.

In these cases a 10 % CPU optimisation can save several thousand euros per year in infrastructure. This cost-benefit analysis fits the FinOps for AI approach: instrument first to identify hotspots before optimising.

It doesn’t pay off the same way in:

  • Low-load services, back-office applications, or batch pipelines with short runs.
  • Environments with very strict data regulation: a continuous profile captures function names that may leak information about which operations run at what frequency.

My read

My sense after more than a year integrating continuous profiling into four different environments is that it’s the most useful addition to observability since OpenTelemetry stabilised. Not because it provides data that couldn’t be obtained before, but because it provides it passively, constantly, and tagged by service and version. Differential diagnosis becomes possible.

The only reasonable fear remains the CPU cost. The real number in our systems has been lower than we feared, but I recommend running a week-long test on a production node before approving a general rollout.

To teams just starting out, I suggest installing one of these systems even if you don’t have a specific performance problem right now. The real value appears when you do, and learning to interpret flame graphs is much easier with your own historical data than with other people’s demos.


Written by

CEO - Jacar Systems

Passionate about technology, cloud infrastructure and artificial intelligence. Writes about DevOps, AI, platforms and software from Madrid.