eBPF for Continuous Profiling: Parca and Beyla
Updated: 2026-05-03
eBPF-based continuous profiling changes how performance problems are investigated. Instead of manually instrumenting each application or reaching for pprof at incident time, an agent that loads eBPF programs into the Linux kernel continuously generates CPU flame graphs for every process on the node, with real overhead under 1%. The modern 2024 stack combines three tools with complementary roles: Parca[1] for cluster-level CPU profiling, Grafana Beyla[2] for HTTP/gRPC auto-instrumentation without code changes, and Grafana Pyroscope[3] for deep per-language profiling.
Key takeaways
- Parca deploys a DaemonSet that generates CPU profiles for all node processes via eBPF without touching applications.
- Beyla automatically instruments HTTP and gRPC requests in the kernel, generating RED metrics and OpenTelemetry traces without an SDK.
- Pyroscope complements both with more detailed language-level profiling for specific critical services.
- Combined overhead of the full stack is 1-2% CPU and 1-3 GB RAM per node.
- Correlation between Parca profiles and Tempo traces in Grafana lets you go from “this trace took 800 ms” to “in that interval, 60% of CPU went to GC”.
The problem continuous profiling solves
Traditional ad-hoc profiling has two structural problems. The first is that the problem is gone when you arrive: most performance regressions are intermittent or only manifest under specific load. A latency spike at 3am that does not reproduce during analysis generates unresolved tickets.
The second is that manual instrumentation has high friction cost: adding pprof to a Go service, configuring py-spy for Python, or installing JVM agents requires engineering time, change approvals, and in some cases service restarts.
eBPF solves both: the agent is always active, capturing data before, during, and after the incident, and requires no application modifications. The flame graph for the exact moment of the incident always exists, because profiling never stopped.
Parca: cluster-wide CPU profiling
Parca installs two components in Kubernetes:
- Parca Agent (DaemonSet): one process per node using eBPF to capture call stacks from all processes at a configurable frequency (default: 19 Hz).
- Parca Server: stores profiles, indexes them by service, enables real-time queries and historical comparisons.
helm install parca parca/parca --namespace monitoring
helm install parca-agent parca/parca-agent \
  --namespace monitoring \
  --set "config.relabelings[0].sourceLabels[0]=__meta_kubernetes_pod_label_app" \
  --set "config.relabelings[0].targetLabel=service_name"
What makes Parca different from traditional profilers is stack unwinding without frame pointers. For compiled languages like Go, Rust, and C++, Parca uses DWARF unwind information to reconstruct the full call stack even when the compiler has omitted frame pointers as an optimisation. For interpreted languages like Python and Node.js, Parca can capture interpreter frames.
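To make the 19 Hz sampling concrete, here is a minimal sketch of how sampled call stacks turn into CPU-time estimates and into the "folded" text form that many flame graph tools read. It is an illustration only, not Parca's implementation: the function names and the sample data are hypothetical, and it assumes each sample stands for roughly 1/19 of a second of CPU time on one core.

```python
from collections import Counter

SAMPLE_HZ = 19  # Parca Agent's default sampling frequency, as stated in the text


def estimate_cpu_seconds(samples, sample_hz=SAMPLE_HZ):
    """Estimate on-CPU seconds per stack from sampled call stacks.

    Each sample is a tuple of frame names ordered root-to-leaf and is assumed
    to represent about 1/sample_hz seconds of CPU time on one core.
    """
    counts = Counter(samples)
    return {stack: n / sample_hz for stack, n in counts.items()}


def to_folded(samples):
    """Render samples as 'frame;frame;frame count' lines, sorted by stack."""
    counts = Counter(samples)
    return [f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: 38 in a JSON encoder, 19 in the garbage collector.
samples = [("main", "handle", "encode_json")] * 38 + [("main", "gc")] * 19

cpu_seconds = estimate_cpu_seconds(samples)
folded = to_folded(samples)
```

With these made-up samples, the encoder stack accounts for about two seconds of CPU time and the GC stack for about one. A low sampling rate keeps overhead small, at the cost of needing longer windows before short-lived functions show up reliably.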
Beyla: HTTP and gRPC auto-instrumentation
Grafana Beyla operates at a different layer from Parca. Where Parca captures CPU time, Beyla intercepts network system calls in the kernel to generate:
- RED metrics (Rate, Errors, Duration) for each HTTP/gRPC service automatically.
- OpenTelemetry traces sent directly to Grafana Tempo or any OTLP backend.
- Service graphs showing dependencies between services based on observed real traffic.
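As a rough illustration of what RED metrics are, the sketch below computes Rate, Errors, and Duration from a list of observed requests. It does not reflect how Beyla works internally: the `Request` record, the `red_metrics` function, and the choice to count only 5xx responses as errors are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    service: str
    status: int          # HTTP status code
    duration_ms: float


def red_metrics(requests, window_seconds):
    """Compute Rate, Errors, and Duration per service over one time window."""
    by_service = {}
    for r in requests:
        by_service.setdefault(r.service, []).append(r)

    result = {}
    for service, reqs in by_service.items():
        durations = sorted(r.duration_ms for r in reqs)
        errors = sum(1 for r in reqs if r.status >= 500)  # assumption: 5xx = error
        result[service] = {
            "rate_per_s": len(reqs) / window_seconds,
            "error_ratio": errors / len(reqs),
            "p50_ms": durations[len(durations) // 2],  # upper median for even counts
            "max_ms": durations[-1],
        }
    return result


# Hypothetical traffic observed for one service during a 2-second window.
requests = [
    Request("checkout", 200, 12.0),
    Request("checkout", 200, 20.0),
    Request("checkout", 500, 800.0),
    Request("checkout", 200, 16.0),
]
metrics = red_metrics(requests, window_seconds=2)
```

The point of the example is the shape of the data: every metric is derived from traffic that was observed, not from counters the application had to export itself.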
Beyla does all of this without modifying application code, adding SDKs, or restarting services:
helm install beyla grafana/beyla \
  --set env.BEYLA_AUTO_INSTRUMENT_TARGET="*" \
  --set env.OTEL_EXPORTER_OTLP_ENDPOINT="http://tempo:4317"
Pyroscope: deep per-language profiling
Grafana Pyroscope is the most detailed profiling layer in the stack. Where Parca uses eBPF to profile at the OS level, Pyroscope has native per-language agents that can capture more specific information: goroutine counts and heap allocations for Go; line-level profiling and memory allocations for Python; JIT compilation overhead and GC detail for the JVM.
The typical usage pattern is Parca for broad coverage and Pyroscope with native agent only for the most critical or hardest-to-debug services.
Grafana: correlation between profiles, traces, and metrics
Integrating the full stack in Grafana is where the compound value emerges. With Grafana Tempo for traces, Prometheus for metrics, and Parca/Pyroscope for profiles, Grafana supports the following workflow:
- Identify a slow trace in Tempo.
- Click on the time interval of that trace.
- View the Parca flame graph corresponding to that exact interval.
- Identify which specific function consumed most CPU during that trace.
This flow, from “the request took 800 ms” to “60% of that time was in this serialisation function”, is impossible without temporal correlation between profiles and traces.
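The temporal correlation described above can be sketched as a simple filter: keep only the profile samples whose timestamps fall inside the trace's interval, then compute each function's share. This is a hedged illustration rather than how Grafana implements it; `cpu_share_in_interval`, the sample format, and the timestamps are hypothetical.

```python
from collections import Counter


def cpu_share_in_interval(samples, start_ms, end_ms):
    """Return each function's share of CPU samples inside [start_ms, end_ms].

    `samples` is an iterable of (timestamp_ms, leaf_function) pairs.
    """
    in_window = Counter(fn for ts, fn in samples if start_ms <= ts <= end_ms)
    total = sum(in_window.values())
    if total == 0:
        return {}
    return {fn: n / total for fn, n in in_window.items()}


# Hypothetical samples around a trace that ran from t=1000 ms to t=1800 ms.
samples = (
    [(900, "parse")]
    + [(1000 + 50 * i, "gc") for i in range(6)]          # 1000 .. 1250
    + [(1300 + 100 * i, "serialize") for i in range(3)]  # 1300 .. 1500
    + [(1700, "parse"), (1900, "parse")]
)
shares = cpu_share_in_interval(samples, start_ms=1000, end_ms=1800)
```

In this made-up data, six of the ten samples inside the trace window land in `gc`. One caveat worth keeping in mind: samples selected by time belong to the whole process during that interval, so on a busy service they may include work done for other concurrent requests.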
Real overhead of the stack
- Parca Agent: 0.5-1% CPU per node. ~200 MB RAM per agent.
- Beyla: 0.3-0.8% CPU for medium-traffic services.
- Parca Server: typically 1-2 GB RAM + disk storage.
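The total of 1-2% CPU and 1-3 GB RAM per node is acceptable for most production clusters, where nodes typically have 16-32 GB RAM. The upper RAM figure applies to the node that hosts Parca Server; nodes running only the agents stay well below it.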
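The arithmetic behind these totals is simple enough to check directly. The sketch below adds the per-agent CPU ranges quoted above and expresses the RAM figures as a share of node memory; the helper names are hypothetical, and the inputs are only the numbers from this section.

```python
def add_ranges(*ranges):
    """Add (low, high) ranges component-wise, rounded to avoid float noise."""
    low = round(sum(r[0] for r in ranges), 2)
    high = round(sum(r[1] for r in ranges), 2)
    return (low, high)


def ram_share(overhead_gb, node_gb):
    """Overhead as a percentage of node memory."""
    return 100 * overhead_gb / node_gb


# CPU figures quoted in this section, in percent of one node's CPU.
parca_agent_cpu = (0.5, 1.0)
beyla_cpu = (0.3, 0.8)
cpu_total = add_ranges(parca_agent_cpu, beyla_cpu)

best_case = ram_share(1, 32)    # 1 GB of overhead on a 32 GB node
worst_case = ram_share(3, 16)   # 3 GB of overhead on a 16 GB node
```

The two agents together come to roughly 0.8-1.8% CPU, which matches the rounded 1-2% figure. On the memory side, the quoted range works out to between about 3% and about 19% of node RAM, depending on node size.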
Compatibility and requirements
Stack requirements:
- Linux kernel 5.8+ for all needed eBPF features. Kernels 5.15+ recommended.
- BTF (BPF Type Format) enabled in kernel: available by default in modern distributions.
- Kubernetes 1.24+ for the latest Parca and Beyla versions.
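A node can be checked against the kernel and BTF requirements before rollout. The following is a minimal sketch under stated assumptions: `check_node` and `parse_kernel_release` are hypothetical helpers, the thresholds are the ones listed above, and BTF is detected by looking for `/sys/kernel/btf/vmlinux`, the file that kernels built with BTF support expose.

```python
import os
import platform
import re

BTF_PATH = "/sys/kernel/btf/vmlinux"  # present when the kernel exposes BTF type info


def parse_kernel_release(release):
    """Extract (major, minor) from a release string such as '5.15.0-105-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def check_node(release, btf_present, minimum=(5, 8), recommended=(5, 15)):
    """Classify a node against the kernel and BTF requirements listed above."""
    version = parse_kernel_release(release)
    if version < minimum:
        return "unsupported: kernel too old"
    if not btf_present:
        return "unsupported: BTF missing"
    if version < recommended:
        return "supported (upgrade recommended)"
    return "supported"


if __name__ == "__main__":
    # Check the machine this script runs on.
    print(check_node(platform.release(), os.path.exists(BTF_PATH)))
```

Separating the parsing from the lookup of the live system keeps the check easy to run against an inventory of node kernel versions, not just the local machine.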
Conclusion
eBPF-based continuous profiling eliminates the most frustrating operational friction of performance observability: the impossibility of profiling what already happened. Parca + Beyla + Pyroscope, integrated in Grafana, provide full coverage of CPU, traces, and language-level profiles with acceptable overhead and without modifying applications. The result is that performance problems stop being forensic investigations with incomplete evidence and become analyses with temporally exact data of what occurred.