Kubernetes 1.34: a summary for teams with little time

Conceptual diagram of containerization showing the separation between applications, runtime, and the system kernel, representative of the environment Kubernetes orchestrates and of the advances in version 1.34

Kubernetes 1.34 was released in late August, following the project’s usual cadence of three releases per year. Like recent versions, it brings nothing revolutionary, but rather a collection of incremental improvements that requires careful reading of the release notes to separate what affects your cluster from what does not. This post gathers what I found relevant from the perspective of someone running small Kubernetes and Swarm clusters, written mostly for teams that cannot spend a full week on the changelog.

Changes you can upgrade without much thought

Most changes in 1.34 are internal improvements that break nothing and deliver immediate benefits. The scheduler gains better node ranking when complex affinities are involved, which reduces decision latency in large clusters. kube-proxy gets better iptables rule handling when services have many external IPs, a scenario that previously caused visible performance degradation.

There are also network stack improvements. Native dual-stack IPv6 support moved to GA, meaning it is no longer an experiment and can be treated as stable configuration. For those already using it in beta, little changes; for those who avoided it because of its status, this is the green light. CNI integration keeps full compatibility; the Calico, Cilium, and Flannel plugins need no update for this version.
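For reference, requesting dual-stack for a Service comes down to two fields, ipFamilyPolicy and ipFamilies. A minimal sketch (service name, selector, and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                         # illustrative name
spec:
  ipFamilyPolicy: PreferDualStack   # use both families when the cluster supports them
  ipFamilies:                       # preferred order of assignment
  - IPv4
  - IPv6
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```

With PreferDualStack, a single-stack cluster still accepts the manifest and simply assigns one family, which makes it a safe default for mixed environments.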

kubelet has improved its cgroups v2 handling, which is already the default on Debian 13 and Ubuntu 24.04. If you are coming from an earlier version on cgroups v1, the migration remains transparent because kubelet detects both. Removal of legacy cgroups v1 support is announced for 1.36, two versions ahead.

Dynamic resource allocation: progress, not closure

The most visible novelty for anyone working with AI workloads is the progress on Dynamic Resource Allocation (DRA). In 1.34 several missing pieces move to beta: sharing devices between pods, claims with advanced selectors, and accounting by device attributes. This matters a great deal for GPUs and specialized hardware such as NPUs and FPGAs, where the traditional device-plugin abstraction fell short.

DRA was proposed in 2023 and has gone through several alpha and beta iterations. The idea is to let a workload say “I need a resource meeting these criteria” rather than “I need two GPUs”, and let the scheduler match criteria against available devices. This is critical on mixed fleets, where an H100 and an L40 are not equivalent even though both are NVIDIA GPUs. In 1.34 the API is more stable but not yet GA, and my production advice is to stick with the classic device plugin if it works for you. For new deployments with complex AI workloads, DRA is worth the risk.
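As a hedged sketch of what a criteria-based request looks like: a ResourceClaim selects devices via a CEL expression over their attributes rather than naming a model. The device class name and attribute keys below are hypothetical, and the API group version should be checked against what your cluster serves, since DRA has iterated across alpha and beta:

```yaml
apiVersion: resource.k8s.io/v1beta1       # verify the version your cluster serves
kind: ResourceClaim
metadata:
  name: training-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com    # hypothetical DeviceClass published by the driver
      selectors:
      - cel:
          # match any device the driver tags as datacenter-class,
          # instead of hard-coding a concrete GPU model
          expression: device.attributes["gpu.example.com"].family == "datacenter"
```

A pod then references the claim by name in spec.resourceClaims, and the scheduler pairs it with a node exposing a matching device.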

Declarative validation: the less visible, more useful change

The piece I liked most in 1.34 is declarative validation for custom resources. Until now, validating a CustomResource with complex rules required an admission webhook, with added latency and an external dependency that could fail. With declarative validation, rules are expressed in CEL (Common Expression Language) inside the CRD itself, and the apiserver evaluates them directly, without network calls.
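As a sketch, the kind of cross-field rule a webhook used to enforce can live directly in the CRD schema via x-kubernetes-validations (the CRD name and fields here are illustrative):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com           # illustrative CRD
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            # CEL rule evaluated by the apiserver itself, no webhook round-trip
            x-kubernetes-validations:
            - rule: "self.minReplicas <= self.maxReplicas"
              message: "minReplicas must not exceed maxReplicas"
            properties:
              minReplicas:
                type: integer
              maxReplicas:
                type: integer
```

In the rule, self refers to the object being validated at the level where the rule is declared, here spec, so cross-field checks need no extra plumbing.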

The benefit is twofold. First, a failure component is eliminated: if the webhook was down, create operations failed. Second, the latency of custom resource creation drops, which in clusters with many operators can be visible. In a cluster I run with five operators validating CRDs via webhook, kubectl apply latency dropped by between 150 and 400 ms per resource after migrating to CEL validation. For deployment scripts that apply dozens of resources, the difference is noticeable.

Migration requires rewriting the validation logic in CEL, which has a learning curve but is not hard. The Kubernetes documentation includes practical examples, and most common cases can be solved with one-line expressions. Where webhooks still win is validation that requires consulting other resources or external services, because CEL is deliberately restricted.

Deprecations worth marking in red

The 1.34 deprecation list is not long, but it is important. The PodSecurityPolicy API (policy/v1beta1), deprecated since 1.21 and removed in 1.25, now appears in the migration docs with a firm reminder for those still carrying references in code, even if they are not actively used. If your Helm repo or charts still mention PodSecurityPolicy, it is a good time to clean up.

More relevant is the deprecation of the --feature-gates flag in several subcomponents that previously accepted it for convenience. In 1.34 a warning appears in the logs; in 1.35 the flag will be ignored; in 1.36 it will prevent the process from starting. Most uses live in startup scripts that followed a feature from alpha to beta to GA without ever cleaning up the flag. It is trivial to fix, but only if someone finds it before it breaks a boot.
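The typical offender looks like this: a gate pinned in a config file long after the feature graduated. A sketch using the kubelet config file form of the same setting (the gate name is hypothetical):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Hypothetical gate: once the feature is GA this entry is dead weight
  # and will eventually become an error rather than a no-op. Remove it.
  SomeFeatureThatWentGA: true
```

Moving gates from command-line flags into the config file at least makes them greppable in one place when cleanup time comes.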

The batch/v1beta1 CronJob API, deprecated several versions ago, was already removed back in 1.25: any manifest still using that version simply will not apply. If your GitOps repo has old CronJobs, migrate them to batch/v1 before upgrading.
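The migration itself is usually a one-line change to apiVersion, since the batch/v1 schema is the same shape. A minimal manifest for reference (name, schedule, and image are illustrative):

```yaml
apiVersion: batch/v1                 # formerly batch/v1beta1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 3 * * *"              # every day at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:1.0   # illustrative image
```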

The change that affects operators

For teams writing or maintaining operators, 1.34 brings a notable improvement to the leader election lifecycle: better lease-expiration handling that reduces double-leader windows, a classic problem when a controller briefly loses apiserver connectivity. The API is backward compatible, but recompiling the operator against client-go 1.34 is recommended to pick up the improvement.
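For context, leader election is built on coordination.k8s.io Lease objects, and the double-leader window is the gap between the current holder's renewTime expiring and a new holder acquiring the lease. The live object is worth knowing how to read when debugging (names and values here are illustrative):

```yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: my-operator-leader                   # usually named after the controller
  namespace: kube-system
spec:
  holderIdentity: my-operator-7d9f6c-abcde   # pod currently holding leadership
  leaseDurationSeconds: 15                   # validity window without renewal
  renewTime: "2025-09-01T10:00:00.000000Z"   # last successful renewal
```

If holderIdentity points at a pod that no longer exists and renewTime is stale, you are watching the failover window the 1.34 change shortens.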

Another interesting change for operators is the enhanced apiserver health check endpoint, now exposing metrics on declarative validation latency and on scheduler ranking. This helps diagnose when a cluster starts slowing down: now there is direct visibility into each phase instead of inferring from logs.

What I would do this week

My checklist for teams with small or medium production clusters:

1. Verify no manifests use APIs that no longer exist before upgrading: batch/v1beta1 CronJobs and any remaining extensions/v1beta1 objects. A simple search in the GitOps repo catches 90% of cases.
2. If you use validation webhooks for your own CRDs, plan the migration to CEL validation for the next iteration; not urgent, but clearly better.
3. If you run GPU workloads in production, read the DRA docs but do not migrate yet if the current system works.
4. Upgrade in pre-production, watch for a week, and then go to production.

Time between a release and availability on managed clusters like GKE, EKS, AKS is usually six to twelve weeks. If you use those services, the prep window is clear. If you run self-managed clusters, timing depends on your cycle, but my practical rule is to wait at least until the .2 patch before production. The first two patches close 70% of minor bugs discovered in early weeks.

My take

Kubernetes is in a calm maturation phase. Releases no longer bring dazzling novelties, and that is healthy. The project is ten years old, and most of the critical functionality has existed since 1.20 or earlier. What comes now is polish: better scheduling, better security, better telemetry, fewer accessory components. Declarative validation is a good example: it removes an external system without removing capability.

What slightly worries me is the growing number of feature gates in alpha, beta, and GA states coexisting in each release. For a team following Kubernetes closely this is manageable, but for teams that touch the cluster twice a year the cognitive load grows. I think in 2026 the project will have to clean up and question whether all active flags still make sense.

Overall, 1.34 is a release to upgrade calmly, verify nothing is broken in deprecations, and relax. There is nothing forcing a rush. The only exception is teams with AI workloads using device plugin and frustrated by the old model’s rigidity: for them, DRA in beta is enough reason to prioritize the upgrade. Everyone else can wait until the next cycle without missing anything.
