
Debugging Kubernetes Nodes in NotReady State

Step-by-step guide to diagnosing and fixing Kubernetes nodes stuck in NotReady status — covering kubelet, containerd, disk pressure, networking, and cloud provider issues.

Matheus · February 16, 2026

A node stuck in `NotReady` is one of the most common — and most disruptive — Kubernetes issues. When a node goes NotReady, the control plane stops scheduling new pods to it and begins evicting existing workloads after a timeout.

Here’s how to diagnose the root cause and fix it.

What Does NotReady Mean?

Every Kubernetes node runs a `kubelet` process that periodically reports its status to the API server. When the API server stops receiving these heartbeats (default: every 10 seconds, timeout after 40 seconds), it marks the node as `NotReady`.

The `NotReady` status means: the control plane cannot confirm this node is healthy and available for work.

Check node status with:

kubectl get nodes

Output showing a problem:

NAME        STATUS     ROLES    AGE   VERSION
worker-01   Ready      <none>   45d   v1.34.2
worker-02   NotReady   <none>   45d   v1.34.2
worker-03   Ready      <none>   45d   v1.34.2
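
You can confirm whether the kubelet on `worker-02` is still sending heartbeats by checking its Lease object and the timestamp on its `Ready` condition, a quick sanity check before digging deeper:

# Each kubelet renews a Lease object in kube-node-lease roughly every 10 seconds
kubectl get lease worker-02 -n kube-node-lease -o yaml

# Timestamp of the last heartbeat recorded on the Ready condition
kubectl get node worker-02 -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastHeartbeatTime}'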

Step 1: Check Node Conditions

Start with `kubectl describe node` to see what conditions are reported:

kubectl describe node worker-02

Look at the `Conditions` section:

Conditions:
  Type             Status   Reason
  ----             ------   ------
  MemoryPressure   False    KubeletHasSufficientMemory
  DiskPressure     True     KubeletHasDiskPressure
  PIDPressure      False    KubeletHasSufficientPID
  Ready            False    KubeletNotReady

Common condition flags:

  • DiskPressure: True — Node filesystem is running out of space. The kubelet starts evicting pods when free space drops below its eviction threshold (by default, less than 10% available on the node filesystem or 15% on the image filesystem).
  • MemoryPressure: True — RAM is exhausted. The kubelet starts killing pods based on their QoS class.
  • PIDPressure: True — The node is running out of process IDs. Usually caused by a pod fork-bombing or a leak in container processes.
  • Ready: False — Generic “kubelet is unhealthy” — dig deeper into kubelet logs.
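
To scan these conditions across every node at once, one option is a `kubectl` JSONPath template like the sketch below:

# Print each node followed by all of its condition types and statuses
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.conditions[*]}{.type}={.status}{"  "}{end}{"\n"}{end}'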

Step 2: Check Kubelet Logs

The kubelet is the agent that maintains node health. If it’s crashing or misconfigured, the node goes NotReady.

# SSH into the node

ssh worker-02

# Check kubelet status

systemctl status kubelet

# View recent logs

journalctl -u kubelet --since "10 minutes ago" --no-pager

Common kubelet issues:

  • Symptom: `kubelet` service stopped. Likely cause: process crash or OOM kill. Fix: `systemctl restart kubelet`
  • Symptom: certificate expired. Likely cause: failed TLS cert rotation. Fix: renew certs with `kubeadm certs renew all`
  • Symptom: “Failed to connect to apiserver”. Likely cause: network issue or API server down. Fix: check network, firewall rules, and API server health
  • Symptom: “PLEG is not healthy”. Likely cause: container runtime issue. Fix: restart containerd with `systemctl restart containerd`
  • Symptom: “node not found”. Likely cause: node was deleted from the cluster. Fix: re-join with `kubeadm join …`
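
If the symptom points at an expired certificate, confirm it before renewing anything. The commands below assume a kubeadm-based cluster and the default kubelet certificate path:

# On a control-plane node: expiry dates for all kubeadm-managed certificates
kubeadm certs check-expiration

# On the affected worker: check the kubelet client certificate directly
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate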

Step 3: Check Container Runtime

Kubernetes relies on a container runtime (containerd or CRI-O). If the runtime is unhealthy, the kubelet can’t manage pods:

# Check containerd status

systemctl status containerd

# Check runtime endpoint

crictl info

# List containers (should show running system containers)

crictl ps

If containerd is unresponsive:

systemctl restart containerd

systemctl restart kubelet
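
After restarting, confirm the runtime actually reports itself healthy. `crictl info` exposes `RuntimeReady` and `NetworkReady` conditions; the `jq` filter below is just one way to read them (assuming `jq` is installed on the node):

# Both conditions should show status: true
crictl info | jq '.status.conditions'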

Step 4: Check Resource Exhaustion

The most common cause of NotReady nodes is resource exhaustion:

Disk Space

df -h /

df -h /var/lib/kubelet

df -h /var/lib/containerd

Fix: Clean up unused container images and stopped containers:

crictl rmi --prune

# Or if using Docker:

docker system prune -af
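
It also helps to know where your eviction thresholds actually sit. On a kubeadm-style install the kubelet config usually lives at `/var/lib/kubelet/config.yaml` (path is an assumption; adjust for your distribution):

# Show hard eviction thresholds if they are explicitly configured;
# no output means the kubelet is using its built-in defaults
# (e.g. nodefs.available<10%, imagefs.available<15%)
grep -A 10 "evictionHard" /var/lib/kubelet/config.yaml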

Memory

free -h

# Check top memory consumers

ps aux --sort=-%mem | head -20

Fix: Identify pods consuming excessive memory. Check if resource limits are set:

kubectl top pods --all-namespaces --sort-by=memory | head -20
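
If memory did run out, the kernel may already have OOM-killed processes, sometimes including the kubelet or containerd themselves. The kernel log will show it:

# Look for recent OOM killer activity on the node
dmesg -T | grep -i "out of memory"
journalctl -k --since "1 hour ago" | grep -i oom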

Process IDs

# Check current PID count vs limit

cat /proc/sys/kernel/pid_max

ls /proc | grep -c '^[0-9]'
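
Threads count against the PID limit too, so a plain process count can understate the problem. Sorting by thread count usually points at the culprit:

# Total threads in use (compare against pid_max above)
ps -eLf --no-headers | wc -l

# Processes owning the most threads (nlwp = number of lightweight processes)
ps -eo nlwp,pid,comm --sort=-nlwp | head -15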

Step 5: Check Networking

If the node can’t reach the API server, it goes NotReady even if it’s otherwise healthy:

# Test API server connectivity

curl -k https://<api-server-ip>:6443/healthz

# Check cluster DNS resolution (run this from inside a pod; the node itself usually doesn't use cluster DNS)

nslookup kubernetes.default.svc.cluster.local

# Verify network plugin (CNI) is running

crictl ps | grep -E "calico|flannel|cilium|weave"

CNI plugin crashed? This is surprisingly common. If your network plugin pod (Calico, Flannel, Cilium) isn’t running, all networking fails:

kubectl get pods -n kube-system | grep -E "calico|flannel|cilium"
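
On the node itself, also make sure the CNI configuration and binaries are in place; the kubelet reports the network plugin as not ready when `/etc/cni/net.d/` is empty (standard paths assumed below):

# CNI config and binaries the kubelet expects
ls -l /etc/cni/net.d/
ls /opt/cni/bin/

# Recent CNI-related kubelet errors
journalctl -u kubelet --since "10 minutes ago" | grep -i cni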

Step 6: Check Cloud Provider Issues

On managed Kubernetes (EKS, GKE, AKS), NotReady nodes can also mean:

  • Instance was terminated by auto-scaler or spot instance reclamation
  • Instance health check failed at the cloud provider level
  • Network ACL or security group blocking kubelet-to-API-server traffic

Check your cloud provider’s instance status:

# AWS EKS

aws ec2 describe-instance-status --instance-ids <instance-id>

# GKE - check node pool status

gcloud container node-pools describe <node-pool-name> --cluster <cluster-name>

# AKS

az aks nodepool show --name <node-pool-name> --cluster-name <cluster-name> -g <resource-group>
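
Back on the cluster side, events attached to the Node object often spell out what the cloud provider or node controller did (replace `worker-02` with your node name):

# Terminations, health-check failures, and other node-level events
kubectl get events -A --field-selector involvedObject.kind=Node,involvedObject.name=worker-02 --sort-by='.lastTimestamp'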

Prevention: Stop NotReady Before It Happens

  • Set resource requests and limits on all pods. Without limits, a single pod can consume all memory and crash the node (a quick audit command is shown after this list).
  • Enable node auto-repair on managed services (GKE, AKS support this natively; EKS via node health checks).
  • Monitor disk usage and set up alerts at 70% capacity, well before the default eviction thresholds (roughly 85–90% usage) kick in.
  • Use Pod Disruption Budgets (PDBs) to control how many pods can be evicted simultaneously.
  • Keep Kubernetes versions current. Older versions have known kubelet bugs that can cause spurious NotReady events. Check your version’s health status on ReleaseRun.
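
For the first point, a rough way to spot pods with no memory limit at all (a sketch using custom columns; pods where only some containers set limits can slip through):

# Pods whose containers define no memory limit show <none> in the last column
kubectl get pods --all-namespaces -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,MEM_LIMITS:.spec.containers[*].resources.limits.memory' | grep '<none>'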
Quick Troubleshooting Flowchart

  • `kubectl describe node <node-name>` — Check conditions
  • Is DiskPressure/MemoryPressure true? → Clean up resources
  • Is kubelet running? → `systemctl status kubelet` → restart if needed
  • Is containerd running? → `systemctl status containerd` → restart if needed
  • Can the node reach the API server? → Check network/firewall
  • Is the CNI plugin running? → Check kube-system pods
  • Still stuck? → Check `journalctl -u kubelet` for specific errors

Track Your Kubernetes Version Health

Running an older Kubernetes version increases the risk of kubelet bugs that cause NotReady events. Kubernetes 1.32 reaches end of life on February 28, 2026 — if you’re still running it, check our migration playbook.

Monitor every version’s health, CVE status, and EOL dates at ReleaseRun’s Kubernetes hub.


Maintained by ReleaseRun — tracking release health for 300+ software products. Last updated: February 2026.
