
Kubernetes v1.35.0-beta.0: what I’d test, and what I’d ignore


Jack Pauley · November 26, 2025 · 6 min read

📢 Update: Kubernetes 1.35 is now GA. This beta preview is preserved for reference. For the full release analysis, read Kubernetes 1.35: the stuff that can break your cluster and our Kubernetes Upgrade Checklist.


I don’t trust “beta” builds in a real cluster. I’ve watched teams treat one like a patch release, then spend a week chasing scheduler weirdness that only shows up under load.

Kubernetes v1.35.0-beta.0 (tagged Nov 19, 2025) puts a spotlight on scheduling and device allocation, especially the Workload API and early gang scheduling work. Use it to learn. Do not use it to prove you’re brave.

My take: v1.35 beta matters if scheduling already hurts you

If your platform runs mostly stateless web stuff, you probably won’t feel the big wins here. If you run batch, GPU training, or “N pods must start together or the job fails,” this beta can save you from the classic half-scheduled mess where 3 workers start, 5 workers sit Pending, and everything times out.

  • Test it now if you run multi-pod jobs: Gang-style placement and workload-level constraints can reduce partial rollouts, but you will pay in scheduler complexity.
  • Test it later if you only care about defaults: Several features flip to beta or default-on behavior. Those changes can still break you, but you can validate them without touching alpha scheduling APIs.
  • Ignore commit counts: “21 commits since alpha” tells you nothing about risk. I’ve seen one-line changes brick an admission chain.

The new stuff that will actually change your day

Here’s the thing. The Workload API and gang scheduling concepts pull scheduling up a level, from “each Pod fends for itself” to “schedule this whole set as a unit.”

I’ve seen teams fake this with custom controllers, hacked Pod priorities, and a pile of retries. It usually works, until the cluster gets tight and the scheduler starts making “reasonable” choices that destroy your job.

  • Workload API (alpha): The draft calls out a new scheduling.k8s.io/v1alpha1 API for expressing workload-level requirements so the scheduler can reason about a group. Verify the exact schema in the official changelog before you write CRDs into anything important.
  • Gang Scheduling plugin (alpha): “All-or-nothing” scheduling sounds simple. In practice, it interacts with autoscaling, PodDisruptionBudgets, and quota in ways that can look like deadlock if you do not instrument it. A minimal watch loop for exactly that follows this list.
  • Node Declared Features (alpha): This aims to publish node capabilities without hand-managed labels. That can help hardware fleets, but you need to confirm the exact field name and matching behavior, otherwise your pods will never land.
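
If you try the gang plugin, keep a cheap watch loop running next to it so “looks like deadlock” is distinguishable from “waiting on quota.” A minimal sketch using stock kubectl; the label selector for the scheduler logs assumes a kubeadm-style control plane, so adjust it for managed clusters:

    # Pods stuck Pending are the first symptom of a half-placed gang.
    watch -n 5 'kubectl get pods -A --field-selector=status.phase=Pending -o wide'

    # FailedScheduling events usually name the blocker: quota, taints, PDBs, topology.
    kubectl get events -A --field-selector=reason=FailedScheduling --sort-by=.lastTimestamp

    # Scheduler logs, assuming kube-scheduler runs as a static pod labeled component=kube-scheduler.
    kubectl -n kube-system logs -l component=kube-scheduler --tail=200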

DRA changes: great for accelerator clusters, risky for everyone else


This bit me when a staging cluster had just enough devices to satisfy most claims, but not enough to satisfy all the initContainers. Everything looked “almost fine” until a deploy window ended.

The beta notes highlight Dynamic Resource Allocation improvements: quota accounting for device classes, device taint rules behind a separate feature gate, health monitoring timeouts, and scheduler scoring for DRA-backed extended resources. If you operate DRA drivers, test these with your worst-case jobs, not your happy-path demo.

  • Partitionable Devices breaking change: The draft warns about backwards-incompatible ResourceSlice changes and says you must clean up before upgrading. Add an explicit cleanup runbook in your internal docs, because “remove existing ResourceSlices” is not a plan. A starting inventory sketch follows this list.
  • New DRA metrics: Watch claim creation and kubelet image manager metrics during your tests. Don’t just scrape them. Put a dashboard next to the scheduler logs and look for spikes when jobs queue.
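
Before the upgrade, get a picture of what DRA objects exist so the cleanup step is a checklist and not a guess. A minimal sketch, assuming the DRA APIs (resource.k8s.io) are already enabled in your cluster; the backup filename is just an example:

    # Inventory DRA state so the Partitionable Devices cleanup is explicit.
    kubectl get resourceslices -o wide
    kubectl get deviceclasses
    kubectl get resourceclaims -A

    # Keep a dump around for the rollback half of the runbook before deleting anything.
    kubectl get resourceslices -o yaml > resourceslices-pre-upgrade.yaml

    # Then delete only the slices owned by the driver you are migrating, per its docs:
    # kubectl delete resourceslice <name>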

Defaults and “quiet” promotions that can still break staging

Small changes cause the loudest pages. A default-on admission plugin or stricter parsing rule can break a deploy pipeline faster than a new API ever will.

According to the draft, v1.35.0-beta.0 includes a set of graduations and default-behavior changes: EnvFiles moves to beta and is enabled by default, PodTopologyLabelsAdmission is enabled by default, and Image Volume Source is enabled by default. Verify each one in CHANGELOG-1.35.md, then test them with your real manifests.

  • EnvFiles parsing change: The draft claims values must use single quotes and a restricted POSIX shell subset. Search your repos for env files that use double quotes or unquoted values, then run a dry-run apply in a throwaway cluster.
  • PodTopologyLabelsAdmission default: Auto-adding zone and region labels sounds harmless, but I’ve seen teams rely on “missing label” behavior in policy code. Check your admission policies and selectors.
  • Image garbage collection knobs: Stable features like ImageMaximumGCAge still need tuning. A too-aggressive setting makes nodes churn images and turns cold-starts into molasses. A starting kubelet config fragment follows this list.
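
For the image GC knobs, the tuning lives in the kubelet configuration file, not the API server. A minimal fragment as a starting point; the field names below follow the upstream KubeletConfiguration type as I understand it, and the values are examples to adjust for your fleet, not recommendations:

    # kubelet.config.k8s.io/v1beta1 fragment; merge into your existing kubelet config.
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # Never GC an image younger than this, even when the thresholds below trip.
    imageMinimumGCAge: 2m
    # Unused images older than this become GC candidates regardless of disk pressure.
    # Set it too low and nodes re-pull constantly, which is the cold-start molasses.
    imageMaximumGCAge: 168h
    # Disk-usage band that starts and stops image GC.
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80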

Upgrade rule for this beta: if you cannot reproduce your cluster in staging, do not test alpha scheduling features. You will not debug it fast enough when it surprises you.

How I’d test v1.35.0-beta.0 in two hours

Go fast. But not reckless.

I’d spin an isolated test cluster, deploy one representative workload per category, then watch events, scheduler logs, and node pressure for an hour. Some folks skip canaries for betas. I don’t, but I get it if you have a tiny dev cluster and you just want to poke at the API.

  • Step 1, snapshot what matters: Take an etcd snapshot and export cluster configs. Don’t argue with this step. Just do it. (Commands for this step and steps 4 and 5 are sketched after this list.)
  • Step 2, validate breaking-change landmines: If you used Partitionable Devices, inventory and clean ResourceSlices before you upgrade.
  • Step 3, run a scheduling stress test: Create a small batch workload that needs parallel placement. Then intentionally constrain nodes so the scheduler has to make hard choices.
  • Step 4, turn on only one alpha at a time: Enable Workload API and gang scheduling separately. Otherwise you won’t know what caused the pain.
  • Step 5, collect proof: Save scheduler logs, a full event dump, and a metrics snapshot. Future-you will thank you.
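
Steps 1, 4, and 5 in command form. A sketch that assumes a kubeadm-style control plane (etcd certs under /etc/kubernetes/pki, scheduler as a static pod); the gate name is a placeholder, so take the real ones from CHANGELOG-1.35.md:

    # Step 1: etcd snapshot before touching anything. Paths and certs are cluster-specific.
    ETCDCTL_API=3 etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key \
      snapshot save /var/backups/pre-1.35-beta.db

    # Step 4: enable exactly one alpha gate per run, on the apiserver and the scheduler.
    # <GateName> is a placeholder, not a real gate name.
    #   --feature-gates=<GateName>=true

    # Step 5: collect proof while the stress test runs.
    kubectl get events -A --sort-by=.lastTimestamp -o yaml > events-$(date +%s).yaml
    kubectl -n kube-system logs -l component=kube-scheduler --tail=-1 > scheduler-$(date +%s).log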

Examples you can start from (but verify the schema first)

The sketches below show intent, not a guaranteed final API shape. Alpha means the API can change, and it probably will.

  • Workload API + gang-style policy (draft): Create a Workload with a pod set count, then apply a gang scheduling policy so the scheduler either places all workers or places none.
  • Numeric tolerations (Gt/Lt): If core/v1 tolerations really accept numeric operators in this build, you can express “only schedule on nodes with gpu-memory-gb greater than 40.” Validate it with a tiny pod and a single taint first.
  • In-place resource resize: Patch a running pod’s requests in staging, ideally through the resize subresource, and confirm the container does not restart. Then check the pod status to see what the kubelet actually allocated.
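
First, the Workload plus gang-style policy. The apiVersion comes from the beta notes; everything else below (the kind’s fields, the policy block, the names) is my guess at the shape, written only to show intent. Pull the real schema from CHANGELOG-1.35.md before you build on it:

    # Intent sketch only. Field names below are assumptions, not the published schema.
    apiVersion: scheduling.k8s.io/v1alpha1
    kind: Workload
    metadata:
      name: trainer
    spec:
      podSets:
        - name: worker
          count: 8           # the behavior under test: place all 8 or place none
          policy:
            gang:
              minCount: 8    # placeholder knob; the real name and location may differ

Next, the numeric toleration probe. Gt and Lt are not valid toleration operators in current stable releases, so treat this as a validation probe: if the API server rejects it, the feature is not in your build. The taint key and threshold are examples:

    apiVersion: v1
    kind: Pod
    metadata:
      name: toleration-probe
    spec:
      tolerations:
        - key: gpu-memory-gb
          operator: Gt        # speculative operator; expect a validation error if absent
          value: "40"
          effect: NoSchedule
      containers:
        - name: probe
          image: registry.k8s.io/pause:3.10

Finally, the in-place resize check. This assumes your kubectl supports the pod resize subresource (recent releases do, but verify yours); the pod and container names are placeholders:

    # Resize a running pod without recreating it.
    kubectl patch pod worker-0 --subresource resize \
      --patch '{"spec":{"containers":[{"name":"worker","resources":{"requests":{"cpu":"750m","memory":"1Gi"}}}]}}'

    # Proof it happened in place: restartCount unchanged, status shows the new allocation.
    kubectl get pod worker-0 \
      -o jsonpath='{.status.containerStatuses[0].restartCount}{"\n"}{.status.containerStatuses[0].resources}{"\n"}'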

Breaking changes and known issues I’d put on a sticky note

Print this. Seriously.

  • Partitionable Devices: Plan cleanup before upgrade. Plan rollback before cleanup, too.
  • EnvFiles quoting rules: Expect failures in older env file generators.
  • kubectl deprecations: If kubectl really dropped networking/v1beta1 Ingress support here, your old scripts will fail at the worst possible time.
  • Alpha feature volatility: Workload API, gang scheduling, and node declared features can change incompatibly in the next pre-release.

Other stuff in this release: dependency bumps, some image updates, the usual.

Official links (do not skip these)

Read the upstream release tag notes and then read CHANGELOG-1.35.md. I know, it’s long. It’s still faster than guessing.

Also, fix the Go version inconsistency in your internal notes before you brief anyone. That one detail will trigger a pointless Slack thread.

Frequently Asked Questions

  • Should I test Kubernetes 1.35 beta in staging? Yes, but only if you run GPU/accelerator workloads or have custom scheduling. The biggest changes in 1.35 are Dynamic Resource Allocation (DRA) improvements and gang scheduling — both are alpha features that mostly affect ML/AI clusters. If you run standard web workloads, wait for the RC or GA release. If scheduling already hurts you, the beta is worth testing in an isolated cluster.
  • What breaks in Kubernetes 1.35 beta that I should watch for? Three things: (1) the cgroup v1 default changed, and the kubelet now expects cgroup v2, (2) the --pod-infra-container-image flag is removed entirely, and (3) several feature gates that were beta-default-on are now GA-locked-on, meaning you can’t disable them. Check your kubelet flags and feature gate overrides before upgrading (quick checks for all three follow this FAQ).
  • How long should I wait between Kubernetes beta and production upgrade? Standard practice: beta for exploratory testing only (never production). Wait for the .0 GA release (usually 4-6 weeks after beta), then wait for .1 patch (usually 2-4 weeks after GA). For 1.35 specifically, the cgroup v1 change makes the .1 patch especially important — early GA releases often have edge cases with the default change.
  • What’s the difference between Kubernetes RC and beta releases? Beta means features are functionally complete but may still change. RC (Release Candidate) means the release is feature-frozen and only bug fixes remain — it’s essentially the GA release unless critical issues surface. Test beta for feature validation, RC for upgrade validation, and only deploy GA to production.
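
Quick checks for the items in the second FAQ answer. A sketch that assumes a systemd-managed kubelet and the common kubeadm config path; adjust paths for your distro:

    # cgroup mode per node: "cgroup2fs" means cgroup v2, "tmpfs" usually means v1.
    stat -fc %T /sys/fs/cgroup

    # Is the removed flag still in your kubelet unit?
    systemctl cat kubelet | grep -e '--pod-infra-container-image' || echo "flag not set"

    # Feature-gate overrides in the kubelet config file, if you manage one.
    grep -A5 -i featureGates /var/lib/kubelet/config.yaml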