A node stuck in `NotReady` is one of the most common — and most disruptive — Kubernetes issues. When a node goes NotReady, the control plane stops scheduling new pods to it and begins evicting existing workloads after a timeout.
Here’s how to diagnose the root cause and fix it.
What Does NotReady Mean?
Every Kubernetes node runs a `kubelet` process that periodically reports its status to the API server. When the control plane stops receiving these heartbeats (by default the kubelet heartbeats every 10 seconds, and the node controller waits 40 seconds before reacting), the node is marked `NotReady`.
The `NotReady` status means: the control plane cannot confirm this node is healthy and available for work.
Check node status with:
kubectl get nodes
Output showing a problem:
NAME        STATUS     ROLES    AGE   VERSION
worker-01   Ready      <none>   45d   v1.34.2
worker-02   NotReady   <none>   45d   v1.34.2
worker-03   Ready      <none>   45d   v1.34.2
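Those heartbeats are implemented as Lease objects in the `kube-node-lease` namespace, so you can also see directly when a node last checked in. A quick sketch (the node name is just an example):
# Each node renews its Lease roughly every 10 seconds; a stale renewTime means missed heartbeats
kubectl get lease -n kube-node-lease
# Last renewal time for the suspect node
kubectl get lease worker-02 -n kube-node-lease -o jsonpath='{.spec.renewTime}'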
Step 1: Check Node Conditions
Start with `kubectl describe node` to see what conditions are reported:
kubectl describe node worker-02
Look at the `Conditions` section:
Conditions:
Type Status Reason
---- ------ ------
MemoryPressure False KubeletHasSufficientMemory
DiskPressure True KubeletHasDiskPressure
PIDPressure False KubeletHasSufficientPID
Ready False KubeletNotReady
Common condition flags:
- DiskPressure: True — The node filesystem is running out of space. The kubelet starts evicting pods once free disk falls below its eviction threshold (by default, less than 10% of the node filesystem available).
- MemoryPressure: True — Node memory is nearly exhausted. The kubelet starts evicting pods, beginning with those that exceed their requests (BestEffort pods are first in line).
- PIDPressure: True — The node is running out of process IDs. Usually caused by a pod fork-bombing or a leak in container processes.
- Ready: False — Generic “kubelet is unhealthy” — dig deeper into kubelet logs.
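To pull just these conditions without scrolling through the full describe output, a jsonpath query works (a minimal sketch; the node name is an example):
# Print each condition type with its status and reason for the suspect node
kubectl get node worker-02 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'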
Step 2: Check Kubelet Logs
The kubelet is the agent that maintains node health. If it’s crashing or misconfigured, the node goes NotReady.
# SSH into the node
ssh worker-02
# Check kubelet status
systemctl status kubelet
# View recent logs
journalctl -u kubelet --since "10 minutes ago" --no-pager
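Kubelet logs are noisy, so filtering for the usual failure keywords makes it easier to match what you see against the table of common issues below (a rough sketch; the keywords are just frequent patterns, not an exhaustive list):
# Surface the most common failure signatures from the last hour of kubelet logs
journalctl -u kubelet --since "1 hour ago" --no-pager \
  | grep -iE "error|failed|certificate|PLEG|connection refused" | tail -n 40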
Common kubelet issues:
| Symptom | Likely Cause | Fix |
|---|---|---|
| `kubelet` service stopped | Process crash or OOM kill | `systemctl restart kubelet` |
| Certificate expired | TLS cert rotation failed | Renew certs: `kubeadm certs renew all` |
| “Failed to connect to apiserver” | Network issue or API server down | Check network, firewall rules, API server health |
| “PLEG is not healthy” | Container runtime issue | Restart containerd: `systemctl restart containerd` |
| “node not found” | Node was deleted from cluster | Re-join: `kubeadm join …` |
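If you suspect the expired-certificate case from the table, you can confirm it before renewing anything (paths assume a kubeadm-provisioned node):
# Check expiry of the kubelet's client certificate
sudo openssl x509 -noout -enddate -in /var/lib/kubelet/pki/kubelet-client-current.pem
# On a control-plane node, check all kubeadm-managed certificates at once
sudo kubeadm certs check-expiration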
Step 3: Check Container Runtime
Kubernetes relies on a container runtime (containerd or CRI-O). If the runtime is unhealthy, the kubelet can’t manage pods:
# Check containerd status
systemctl status containerd
# Check runtime endpoint
crictl info
# List containers (should show running system containers)
crictl ps
If containerd is unresponsive:
systemctl restart containerd
systemctl restart kubelet
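Before (or right after) restarting, it's worth capturing the runtime's own logs so you know why it hung; otherwise the same failure tends to recur (a quick sketch):
# Recent containerd errors, for the post-mortem
journalctl -u containerd --since "30 minutes ago" --no-pager | grep -iE "error|fatal" | tail -n 40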
Step 4: Check Resource Exhaustion
The most common cause of NotReady nodes is resource exhaustion:
Disk Space
df -h /
df -h /var/lib/kubelet
df -h /var/lib/containerd
Fix: Clean up unused container images and stopped containers:
crictl rmi --prune
# Or if using Docker:
docker system prune -af
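If pruning images doesn't free enough space, it helps to see which directories are actually growing. A rough sketch (paths assume a containerd-based node):
# Largest consumers under the containerd and kubelet data directories
sudo du -xh -d1 /var/lib/containerd /var/lib/kubelet 2>/dev/null | sort -rh | head -n 15
# Oversized container logs are a frequent culprit
sudo du -xh -d1 /var/log/pods 2>/dev/null | sort -rh | head -n 10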
Memory
free -h
# Check top memory consumers
ps aux --sort=-%mem | head -20
Fix: Identify pods consuming excessive memory. Check if resource limits are set:
kubectl top pods --all-namespaces --sort-by=memory | head -20
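If the top consumers turn out to have no limits at all, that is usually the real problem. A sketch for listing pods on the affected node where a container lacks a memory limit (assumes `jq` is installed; the node name is an example):
# Pods scheduled on worker-02 where at least one container has no memory limit
kubectl get pods -A --field-selector spec.nodeName=worker-02 -o json \
  | jq -r '.items[] | select(any(.spec.containers[]; .resources.limits.memory == null)) | "\(.metadata.namespace)/\(.metadata.name)"'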
Process IDs
# Check current PID count vs limit
cat /proc/sys/kernel/pid_max
ls /proc | grep -c '^[0-9]'
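To turn those two numbers into something readable and spot the offending workload, something like this works (a minimal sketch assuming a GNU userland):
# Compare PIDs in use against the kernel limit
echo "$(ls /proc | grep -c '^[0-9]') PIDs in use of $(cat /proc/sys/kernel/pid_max) allowed"
# Which commands account for the most processes (a fork bomb stands out immediately)
ps -e --no-headers -o comm | sort | uniq -c | sort -rn | head -n 10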
Step 5: Check Networking
If the node can’t reach the API server, it goes NotReady even if it’s otherwise healthy:
# Test API server connectivity
curl -k https://<API_SERVER_IP>:6443/healthz
# Check DNS resolution
nslookup kubernetes.default.svc.cluster.local
# Verify network plugin (CNI) is running
crictl ps | grep -E "calico|flannel|cilium|weave"
CNI plugin crashed? This is surprisingly common. If your network plugin pod (Calico, Flannel, Cilium) isn’t running, all networking fails:
kubectl get pods -n kube-system | grep -E "calico|flannel|cilium"
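If the CNI pod is running but the node still complains about networking, check that the plugin actually wrote its configuration and binaries onto the node (these are the standard CNI paths; your distribution may differ):
# The kubelet reports "network plugin not ready" when this directory is empty
ls -l /etc/cni/net.d/
# CNI binaries the runtime calls to set up pod networking
ls /opt/cni/bin/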
Step 6: Check Cloud Provider Issues
On managed Kubernetes (EKS, GKE, AKS), NotReady nodes can also mean:
- Instance was terminated by auto-scaler or spot instance reclamation
- Instance health check failed at the cloud provider level
- Network ACL or security group blocking kubelet-to-API-server traffic
Check your cloud provider’s instance status:
# AWS EKS
aws ec2 describe-instance-status --instance-ids <instance-id>
# GKE - check node pool status
gcloud container node-pools describe <pool-name> --cluster <cluster-name>
# AKS
az aks nodepool show --name <pool-name> --cluster-name <cluster-name> -g <resource-group>
Prevention: Stop NotReady Before It Happens
Quick Troubleshooting Flowchart
Track Your Kubernetes Version Health
Running an older Kubernetes version increases the risk of kubelet bugs that cause NotReady events. Kubernetes 1.32 reaches end of life on February 28, 2026 — if you’re still running it, check our migration playbook.
Monitor every version’s health, CVE status, and EOL dates at ReleaseRun’s Kubernetes hub.
Maintained by ReleaseRun — tracking release health for 300+ software products. Last updated: February 2026.
Related Reading
- Kubernetes Upgrade Checklist — The runbook that prevents NotReady states during upgrades
- Kubernetes 1.32 End of Life: Migration Playbook — Deadline: February 28, 2026
- Kubernetes EOL Policy Explained — Know when your version loses support
- Kubelet Restarts in K8s 1.35.1 — Related node stability issue to test
- Popular Kubernetes Distributions Compared — Different distributions handle node issues differently