Kubernetes v1.35.1 upgrade preview: stop cgroup v1 and cached-image surprises
I’ve watched a node pool reboot and never come back because kubelet refused to start. That failure mode stops being hypothetical when you still run cgroup v1 and you roll a new AMI.
Kubernetes v1.35.1 (scheduled for 2026-02-10) is a good forcing function. Use it to prove two things before you touch production: your nodes run cgroup v2, and your private images do not “work” only because somebody else pulled them earlier.
What will bite you first: kubelet on cgroup v1
Here’s the blunt part. If any node still boots on cgroup v1, kubelet can fail to start by default, and the node flips NotReady. Pods do not limp along. They reschedule, they evict, and your autoscaler starts making confident mistakes.
The thing nobody mentions is how this shows up in real life. You do not upgrade Kubernetes and instantly notice. You rotate an AMI next week, the kernel update reboots a batch of nodes, and then you discover half your fleet cannot run kubelet anymore.
- What changed: Upstream work tracked in KEP-5573 and implemented in PR #134298 makes cgroup v1 effectively unsupported for kubelet startup by default. Treat this as an “action required” change, not a warning you can ignore.
- What to do right now: Inventory your nodes. If you cannot guarantee cgroup v2 everywhere, block node reboots and node image rotations until you can.
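A minimal way to take that inventory, assuming you are allowed to run privileged debug pods and that your node image ships coreutils stat (adjust both for your platform):

```bash
# Print the cgroup filesystem type for every node.
# cgroup2fs means cgroup v2; tmpfs means the node is still on cgroup v1.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  printf '%s: ' "$node"
  kubectl debug "node/$node" -it --image=busybox -- \
    chroot /host stat -fc %T /sys/fs/cgroup
done
# kubectl debug leaves a node-debugger pod behind per node; delete them when you're done.
```

If your platform gives you SSH instead, running stat -fc %T /sys/fs/cgroup directly on the host answers the same question.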
The quiet security fix: “IfNotPresent” no longer trusts the node cache
This one hurts in a different way. A pod without credentials should not get a private image just because the bytes already exist on disk. That old behavior turned node-local cache into accidental access control, and I do not trust “it was already on the node” as an authorization story.
Kubernetes v1.35 moved KubeletEnsureSecretPulledImages to Beta and enabled it by default. Kubelet now verifies credentials for cached images, or it forces a pull path instead of blindly using what’s local.
- What you will see: workloads that “worked” before can start failing on reschedule, because they never carried the right imagePullSecrets or credential provider config.
- Why this becomes an outage: if credentials go missing, kubelet drags your registry back into the hot path during a node drain. In a busy cluster, that turns into a thundering herd of pulls.
If your cluster relied on cached private images as a shortcut for multi-tenant access, you did not have a clever optimization. You had a security bug that happened to be convenient.
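Before you upgrade, it is worth listing pods that reference your private registry but carry no imagePullSecrets at all. A rough sketch, assuming jq is available and using registry.example.com as a stand-in for your registry host; credential provider plugins can still make some of these pods legitimate, so treat the output as a worklist, not a proof:

```bash
REGISTRY="registry.example.com"   # assumption: replace with your registry host
kubectl get pods -A -o json | jq -r --arg reg "$REGISTRY" '
  .items[]
  # pods with at least one container image from the private registry...
  | select([.spec.containers[].image] | any(startswith($reg)))
  # ...that carry no imagePullSecrets in the pod spec
  | select((.spec.imagePullSecrets // []) | length == 0)
  | "\(.metadata.namespace)/\(.metadata.name)"'
```

Extend it to initContainers if you use them.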
kube-proxy ipvs deprecation: warning noise with a purpose
So. The logs will start yelling before the code disappears. KEP-5495 deprecates ipvs mode in kube-proxy, and PR #134539 calls out explicit deprecation warnings.
Some folks ignore warnings in patch trains. I get it. I still think that’s a bad habit. If your on-call pages on warnings, fix your alerting first. Then use the warnings as inventory. You want a list of clusters still running ipvs, not a surprise removal later.
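To build that inventory, check the mode kube-proxy is actually configured with. A sketch assuming the kubeadm-style ConfigMap and the k8s-app=kube-proxy label; managed distros wire this differently:

```bash
# Which proxy mode is configured? "ipvs" goes on the migration list.
kubectl -n kube-system get configmap kube-proxy -o yaml | grep -E '^\s*mode:'

# And watch for the deprecation warning in the running pods.
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=200 --prefix | grep -i deprecat
```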
A canary upgrade flow I actually trust
Keep it boring. Boring survives.
Do not start by upgrading everything. Start by proving kubelet starts after a reboot, and your private image pulls behave the way you think they do. Then scale out.
- Confirm what you run: check current versions and node pool makeup with kubectl version and kubectl get nodes -o wide.
- Drain one canary node: cordon and drain a single node, then watch reschedules and image pulls. Use kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --grace-period=30 (the full flow is sketched after this list).
- Upgrade with your distro’s method: kubeadm, managed service tooling, or your own automation. Keep the exact commands in your runbook, not in someone’s head.
- Reboot on purpose: restart kubelet, and reboot the canary node if your usual maintenance includes reboots. A kubelet that starts only before reboot does not count.
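The whole loop as a sketch, with <canary-node> as a placeholder and SSH standing in for whatever host access you actually have:

```bash
kubectl cordon <canary-node>
kubectl drain <canary-node> --ignore-daemonsets --delete-emptydir-data --grace-period=30

# ...upgrade the node with your distro's method (kubeadm, managed node pool tooling, etc.)...

# prove kubelet survives a restart, not just the initial upgrade
ssh <canary-node> 'sudo systemctl restart kubelet && sudo systemctl is-active kubelet'

# and a full reboot, if your normal maintenance includes reboots
ssh <canary-node> 'sudo reboot'
kubectl wait --for=condition=Ready node/<canary-node> --timeout=10m
kubectl uncordon <canary-node>
```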
Red flags to grep before you uncordon
This bit me when a “clean” upgrade passed, then the next reboot took the node out. Grep logs like you mean it; a starter set of greps follows the list.
- kubelet won’t start and logs mention cgroup v1: treat it as a fleet configuration failure, not a regression. Fix the node image or cgroup mode before you roll more nodes.
- Private images re-pull during reschedule: check whether pods actually carry the right imagePullSecrets and whether your credential provider works under KubeletEnsureSecretPulledImages.
- kube-proxy ipvs deprecation warning: inventory clusters and plan migration. Do not let it rot until removal becomes forced.
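A starter set, assuming SSH access to the canary and the stock systemd unit name for kubelet:

```bash
# kubelet startup failures mentioning cgroups, since the last boot
ssh <canary-node> 'sudo journalctl -u kubelet -b --no-pager | grep -i cgroup'

# image pulls that failed or went back to the registry during the reschedule
kubectl get events -A --sort-by=.lastTimestamp | grep -iE "pull|ErrImagePull|ImagePullBackOff"
```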
Other stuff in this release: dependency bumps, some image updates, the usual.
Keep Reading
- Kubernetes Upgrade Checklist: the runbook for minor version upgrades
- Kubernetes EOL policy explained for on-call humans
- Kubernetes 1.35 release: the stuff that can break your cluster
Frequently Asked Questions
- Does Kubernetes 1.35 break cgroup v1? It doesn’t break it outright, but the default changed: kubelet now requires cgroup v2 by default. If your nodes still run cgroup v1 (common on older OS images, RHEL 7, Amazon Linux 2), kubelet will fail to start after the upgrade unless you move those nodes to cgroup v2 or explicitly opt back in through kubelet configuration. Check with stat -fc %T /sys/fs/cgroup: tmpfs means you’re on v1 and have work to do; cgroup2fs means you’re fine.
- What’s the cached-image security change in Kubernetes 1.35.1? It changes how the IfNotPresent image pull policy treats cached private images. Previously, if an image already existed in the node cache, kubelet served it to any pod on that node, whether or not that pod had credentials to pull it. With KubeletEnsureSecretPulledImages enabled by default, kubelet verifies the pod’s credentials for cached private images or forces a fresh pull. A pod without the right imagePullSecrets can no longer piggyback on someone else’s cached image, but it also means reschedules after the upgrade can put your registry back in the hot path.
- How should I test the Kubernetes 1.35.1 upgrade? Start with a canary approach: upgrade one non-critical node, run your workloads for 24 hours, check kubelet logs for cgroup errors, verify image pulls work, and monitor pod scheduling. Key commands: kubectl get nodes (check version), journalctl -u kubelet (check for cgroup errors), and kubectl describe pod (check image pull events). Only proceed to production after canary runs clean.
- Should I skip Kubernetes 1.35.0 and go straight to 1.35.1? Yes. The .0 release had known issues with kubelet restarts on cgroup v1 nodes that the .1 patch addresses. Always wait for the first patch release (.1) before upgrading production clusters — this is standard practice and especially important for the 1.35 line given the cgroup default change.