Kubernetes 1.35: What Ships December 17th That Actually Matters

Jack Pauley · December 15, 2025

The Kubernetes 1.35 release preview lands in two days, December 17, 2025, and it’s packed with changes that will fundamentally alter how you run stateful workloads, schedule AI training jobs, and secure multi-tenant clusters. After months of beta testing and 59 tracked enhancements, this release graduates in-place Pod resizing to stable, introduces gang scheduling for distributed workloads, and forcibly removes cgroup v1 support that will break unprepared nodes. If you’re running production Kubernetes, this isn’t just another quarterly bump—it’s a hard deadline for infrastructure decisions you’ve been deferring.

Unlike typical releases that lean on graduations and deprecations, 1.35 ships with 14 net-new alpha features targeting the AI/ML workload explosion, including numeric taint tolerations for SLA-based scheduling and device binding conditions that prevent GPUs from being allocated before they’re physically ready. The security model gets tighter with constrained impersonation for delegation and pod certificates reaching beta for native mTLS. But the headline everyone’s watching: cgroup v1 nodes will fail to start unless you explicitly opt out. This is the canary in the coal mine for next year’s complete removal.

The Features That Actually Change Your Day

In-Place Pod Resize Goes GA: The Wait Is Over

First introduced as alpha back in 1.27, in-place Pod resource updates finally graduate to stable. You can now adjust CPU and memory requests/limits on running Pods without the terminate-recreate dance that has plagued stateful applications for years. Separately, the kubelet now tracks which images were pulled with credentials and forces re-authentication even for locally cached images, closing a multi-tenant security hole that has existed since the beginning.

This matters because vertical scaling no longer means downtime for StatefulSets. A Postgres pod that needs more memory during end-of-month processing? Patch it live. A Redis cache that’s over-provisioned during off-peak? Shrink it without dropping connections. The Container Runtime Interface now reports real-time resource configurations, so your monitoring stack sees the truth, not stale spec values.
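
For example, a live memory bump might look like the following sketch, assuming kubectl 1.32 or newer and a cluster with the feature enabled; the pod and container names are placeholders:

# Resize a running pod through the resize subresource; no restart, no recreate.
kubectl patch pod postgres-0 --subresource resize -p '
{
  "spec": {
    "containers": [
      {
        "name": "postgres",
        "resources": {
          "requests": { "memory": "4Gi" },
          "limits":   { "memory": "4Gi" }
        }
      }
    ]
  }
}'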

Gang Scheduling: AI/ML Workloads Get First-Class Treatment

The new alpha Workload API introduces native gang scheduling: "all or nothing" Pod placement that prevents the deadlock scenario where 5 of 10 training workers start but block the cluster waiting for resources that never arrive. The scheduler now understands Pod groups with a minCount parameter, holding back the entire batch until capacity exists for the quorum.

This is purpose-built for distributed training jobs, MPI workloads, and StatefulSets where partial starts waste expensive GPU time. Instead of requiring external schedulers like Volcano or Kueue, gang semantics are baked into kube-scheduler with a policy-driven Workload object. The design explicitly avoids taking over Pod creation from existing controllers: Jobs, StatefulSets, and custom operators remain in control while the scheduler makes smarter placement decisions.
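
The API surface is still alpha, so treat the following as an illustrative sketch only: the group/version and field names are assumptions about the shape of the Workload object, not the final schema.

# workload.yaml (illustrative; verify the schema against the 1.35 API reference)
apiVersion: scheduling.k8s.io/v1alpha1   # assumed group/version
kind: Workload
metadata:
  name: trainer
spec:
  podGroups:
  - name: workers
    policy:
      gang:
        minCount: 10   # hold the entire batch until 10 pods can be placed together

Your Job, StatefulSet, or custom controller keeps creating the pods; it simply references the Workload so the scheduler knows which pods form the gang.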

Pod Certificates Reach Beta: mTLS Without the Plumbing

The pod certificate feature graduates to beta with a critical addition: native support for workload identity through X.509 certificates issued directly by the cluster. The kubelet generates a key pair, submits a PodCertificateRequest to the control plane, and mounts the signed certificate chain as a projected volume—no SPIFFE/SPIRE infrastructure required.

This unlocks service mesh deployments without external certificate managers. A pod can prove its identity to other services using short-lived, automatically rotated certificates. The private key never leaves the node, API validation happens at the server level, and the entire flow integrates with Node Restriction for security. The new spec.userConfig field in beta allows custom certificate attributes without restarting the API server.
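
A pod might consume its certificate through a projected volume along these lines; the projected source and field names below are assumptions based on the beta feature, and the signer name is a placeholder:

# pod-cert-demo.yaml (sketch; check the 1.35 API reference for exact field names)
apiVersion: v1
kind: Pod
metadata:
  name: mtls-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: workload-identity
      mountPath: /var/run/workload-identity
      readOnly: true
  volumes:
  - name: workload-identity
    projected:
      sources:
      - podCertificate:                          # assumed projected source name
          signerName: example.com/workload-ca    # placeholder signer
          keyType: ED25519
          credentialBundlePath: credentials.pem  # key + chain in a single bundle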

cgroup v1 Removal: The Breaking Change You Can’t Ignore

Kubernetes 1.35 moves cgroup v1 deprecation to beta, which means kubelet fails to start by default on nodes running cgroup v1. This isn’t a warning anymore—it’s a hard stop. The Linux ecosystem has been migrating away from v1 for years (systemd drops support in upcoming OS releases), and major cloud providers standardized on v2.

If you’re running older distributions or custom Linux builds, your nodes will refuse to join the cluster unless you explicitly set NodeRemoveCgroupV1Support=false. The complete code removal is scheduled for 1.38, giving teams roughly a year to migrate. The unified cgroup v2 hierarchy provides better resource isolation and serves as the foundation for modern features like memory QoS and swap accounting.
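
If you need that escape hatch on nodes you can’t migrate yet, the opt-out lives in the kubelet’s feature gates; a minimal fragment, assuming the gate name described above, looks like this:

# Merge into the node's KubeletConfiguration (commonly /var/lib/kubelet/config.yaml),
# then restart the kubelet. Gate name as described above; verify it against the
# 1.35 release notes before rolling this out.
featureGates:
  NodeRemoveCgroupV1Support: false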

Security Tightens: Constrained Impersonation and Certificate Validation

Two new alpha features address long-standing security gaps. Constrained impersonation adds granular RBAC verbs like impersonate:user-info and impersonate-on:user-info:list, requiring explicit permission for both who you impersonate and what actions you perform while impersonating. This prevents the “all-or-nothing” privilege escalation that made operators nervous about delegation.
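
In RBAC terms, a delegation role could look roughly like this; the verbs come from the feature description above, while the resource pairings are assumptions for illustration:

# constrained-impersonation.yaml (illustrative; alpha verb/resource pairings assumed)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: constrained-delegate
rules:
- apiGroups: [""]
  resources: ["users"]
  verbs: ["impersonate:user-info"]          # who may be impersonated
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["impersonate-on:user-info:list"]  # what may be done while impersonating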

The second addition: kubelet serving certificate validation. The API server will now verify that the certificate Common Name matches system:node:<nodename> when connecting to kubelet, preventing man-in-the-middle attacks via IP spoofing. This is opt-in (feature gate KubeletCertCNValidation) because it will break clusters using custom certificates that don’t follow the naming convention.
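
Opting in is a feature-gate flip on the API server; one possible sketch for a kubeadm-style static-pod control plane (adapt to however your control-plane flags are managed):

# Add to the kube-apiserver command in /etc/kubernetes/manifests/kube-apiserver.yaml
# (kubeadm layout assumed); the static pod restarts automatically when the file changes.
- --feature-gates=KubeletCertCNValidation=true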

Dynamic Resource Allocation Matures

DRA device binding conditions graduate to beta, solving the problem of scheduling pods to nodes before disaggregated hardware is ready. GPUs attached via fabric, FPGAs requiring initialization, or any device with prep time can now signal readiness through binding conditions. The scheduler waits for the “ready” signal before committing the placement, or aborts and reschedules if preparation fails or times out.

This pairs with four other DRA improvements: partitionable devices (alpha), prioritized device requests (beta), device taints and tolerations (alpha), and consumable capacity (alpha). Together, they make Kubernetes viable for demanding AI infrastructure where hardware topology and availability windows matter.

Why This Release Signals a Shift

Kubernetes 1.35 represents the project’s acknowledgment that the workload landscape has fundamentally changed. The in-place resize graduation isn’t just a convenience feature—it’s recognition that “cattle not pets” doesn’t apply to stateful databases and ML model servers that can’t tolerate restart-based scaling. Gang scheduling’s arrival signals that Kubernetes is no longer just an API server for microservices; it’s becoming the substrate for training clusters and HPC workloads.

The cgroup v1 removal is equally significant: it’s the clearest signal yet that Kubernetes will break backward compatibility when the ecosystem has moved on. If systemd is dropping v1 and cloud providers have migrated, Kubernetes won’t maintain legacy code for the stragglers. This sets precedent for future deprecations—the project is willing to force upgrades when technical debt outweighs compatibility benefits.

The security enhancements tell a different story. Pod certificates, constrained impersonation, and stricter certificate validation are all responses to multi-tenancy challenges that emerged at scale. Zero-trust networking, workload identity, and defense-in-depth aren’t afterthoughts anymore—they’re first-class concerns that shape API design. The structured authentication config reaching stable (after two years in beta) proves the project can deliver security features that operators actually deploy.

What to Do Before December 17th

Immediate Action Items

Verify cgroup version on all nodes. Run stat -fc %T /sys/fs/cgroup/ on each node. If it returns tmpfs, you’re on cgroup v1 and kubelet won’t start in 1.35 without the opt-out flag. If it returns cgroup2fs, you’re clear. For mixed environments, set NodeRemoveCgroupV1Support=false in kubelet config until you complete the migration.
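
A quick sweep across every node, assuming SSH access to them, could look like this:

# cgroup2fs means cgroup v2; tmpfs means the node is still on cgroup v1.
for ip in $(kubectl get nodes \
    -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  printf '%s: ' "$ip"
  ssh "$ip" stat -fc %T /sys/fs/cgroup/
done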

Upgrade containerd to 2.0+. Kubernetes 1.35 is the last release supporting containerd 1.x (aligned with containerd 1.7 EOL). The next version will require 2.0 or later. Check your runtime version with crictl version and plan the upgrade now—waiting until 1.36 forces a rushed migration during cluster updates.
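
You can also read the runtime version straight from the node objects, no SSH required:

# Lists each node with the container runtime it reports, e.g. containerd://1.7.x
kubectl get nodes -o custom-columns=NAME:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion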

Audit custom kubelet certificates. If you’re using custom serving certificates for kubelet (not the default auto-generated ones), ensure the Common Name follows the system:node:<nodename> format. When you enable KubeletCertCNValidation in future releases, non-compliant certificates will cause API server connection failures for exec, logs, and port-forward.
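
A quick way to audit a node’s serving certificate, assuming the common default path (yours may differ, especially with serverTLSBootstrap or custom certificates):

# Print the certificate subject; a compliant cert shows CN=system:node:<nodename>.
openssl x509 -noout -subject -in /var/lib/kubelet/pki/kubelet.crt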

Features to Test in Non-Production

StatefulSet maxUnavailable. If you’re running StatefulSets with many replicas, test the new maxUnavailable field for RollingUpdate strategy. This allows parallel pod updates instead of strict one-by-one, dramatically speeding rollouts for workloads that can tolerate multiple replicas being down simultaneously.
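
A sketch of what that looks like (depending on your minor version, the MaxUnavailableStatefulSet feature gate may still need to be enabled):

# cache-statefulset.yaml: allow two replicas to be updated in parallel
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cache
spec:
  serviceName: cache
  replicas: 6
  selector:
    matchLabels:
      app: cache
  template:
    metadata:
      labels:
        app: cache
    spec:
      containers:
      - name: redis
        image: redis:7
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # update two pods at once instead of the strict one-by-one default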

In-place Pod resize on StatefulSets. Now that it’s GA, you can patch resources.requests and resources.limits on running pods without recreating them. Test with non-critical workloads first to understand how your container runtime handles dynamic resource adjustment.
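
While testing, it’s worth setting resizePolicy per container (part of the same feature) so you control whether a given resource change applies live or restarts the container; a minimal example pod:

# resize-demo.yaml: CPU changes apply live, memory changes restart this container
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: RestartContainer
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi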

Gang scheduling for AI workloads. If you’re running distributed training, evaluate the new Workload API for gang scheduling. Create a Workload with minCount set to your required quorum, then reference it from your Job or custom controller. The scheduler won’t start any pods until the full group can be placed.

Migration Timeline

For most clusters, the path forward is straightforward: ensure cgroup v2, upgrade containerd, and monitor for deprecation warnings. The breaking changes are telegraphed clearly, and the feature gates provide escape hatches for organizations that need more time. But the clock is ticking on cgroup v1: with complete removal scheduled for 1.38 (likely late 2026), this is the last comfortable upgrade cycle before forced migration.