etcd Reference: K8s Backing Store — Backup, Restore, etcdctl Ops & Cert Management
etcd is the distributed key-value store that backs every Kubernetes cluster. All K8s state — pods, deployments, secrets, configmaps — lives in etcd. Understanding etcd operations is essential for cluster admins: backup/restore is the most critical disaster recovery skill you can have.
1. etcdctl Essentials
Connect to etcd and run basic operations
# etcdctl v3 API (set env vars once to avoid passing flags on every command):
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
# These cert paths are standard for kubeadm clusters.
# For managed K8s (EKS/GKE/AKS), you typically can't access etcd directly.

# Cluster health:
etcdctl endpoint health    # checks all endpoints
etcdctl endpoint status    # shows leader, Raft index, DB size
etcdctl member list        # all cluster members

# Read/write:
etcdctl get /registry/namespaces/production        # raw K8s resource
etcdctl get /registry/secrets/default/ --prefix    # all secrets in default namespace
etcdctl get "" --prefix --keys-only | head -20     # list all K8s keys (NOISY!)

# Watch for changes (useful for debugging):
etcdctl watch /registry/configmaps/production/ --prefix

# Compaction (reduce DB size by removing old revisions):
CURRENT_REV=$(etcdctl endpoint status --write-out=json | python3 -c "import sys,json; print(json.load(sys.stdin)[0]['Status']['header']['revision'])")
etcdctl compaction $CURRENT_REV    # note: the subcommand is "compaction", not "compact"
etcdctl defrag                     # reclaim disk space after compaction
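The revision-extraction step above can be exercised without a live cluster. The sketch below runs the same `python3` one-liner against a hardcoded sample of `etcdctl endpoint status --write-out=json` output; the JSON document is fabricated for illustration, not captured from a real cluster.

```shell
# Fabricated sample of `etcdctl endpoint status --write-out=json` output:
SAMPLE_STATUS='[{"Endpoint":"https://127.0.0.1:2379","Status":{"header":{"cluster_id":1,"member_id":2,"revision":123456,"raft_term":7},"version":"3.5.9","dbSize":24576000}}]'

# Same extraction as in the compaction snippet, fed from the sample:
CURRENT_REV=$(echo "$SAMPLE_STATUS" | python3 -c "import sys, json; print(json.load(sys.stdin)[0]['Status']['header']['revision'])")

echo "Would run: etcdctl compaction $CURRENT_REV"
```

Against a real cluster, replace the `echo "$SAMPLE_STATUS"` pipeline input with the actual `etcdctl endpoint status --write-out=json` call.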
2. Backup — Snapshot Save
Create etcd snapshots for disaster recovery
# Create a snapshot (CRITICAL: do this before ANY cluster changes):
etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db

# Verify the snapshot:
etcdctl snapshot status /backup/etcd-snapshot-20260314-120000.db
# Output: hash, revision, total keys, total size
# (On etcd 3.5+, `etcdutl snapshot status` is preferred; the etcdctl form is deprecated.)

# Automated backup script (run as cron):
cat > /usr/local/bin/etcd-backup.sh << 'SCRIPT'
#!/bin/bash
set -euo pipefail
export ETCDCTL_API=3
BACKUP_DIR=/backup/etcd
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
mkdir -p "$BACKUP_DIR"
etcdctl snapshot save "$BACKUP_DIR/snapshot-$TIMESTAMP.db" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
etcdctl snapshot status "$BACKUP_DIR/snapshot-$TIMESTAMP.db"
# Keep last 7 days:
find "$BACKUP_DIR" -name "snapshot-*.db" -mtime +7 -delete
echo "Backup complete: $BACKUP_DIR/snapshot-$TIMESTAMP.db"
SCRIPT
chmod +x /usr/local/bin/etcd-backup.sh

# Cron: backup every 6 hours:
# 0 */6 * * * /usr/local/bin/etcd-backup.sh >> /var/log/etcd-backup.log 2>&1

# Upload to S3 (add to the backup script):
aws s3 cp "$BACKUP_DIR/snapshot-$TIMESTAMP.db" s3://my-cluster-backups/etcd/
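The 7-day retention line in the backup script is worth verifying before trusting it with real backups. The sketch below exercises the same `find -mtime +7 -delete` expression against throwaway files in a temp directory (GNU `touch -d` is assumed for backdating):

```shell
# Stand-in for /backup/etcd so nothing real is touched:
BACKUP_DIR=$(mktemp -d)
touch -d '10 days ago' "$BACKUP_DIR/snapshot-old.db"   # simulated stale backup
touch "$BACKUP_DIR/snapshot-new.db"                    # simulated fresh backup

# Same retention expression as the backup script:
find "$BACKUP_DIR" -name "snapshot-*.db" -mtime +7 -delete

ls "$BACKUP_DIR"   # only snapshot-new.db should remain
```

`-mtime +7` matches files last modified strictly more than 7 whole days ago, so the 10-day-old file is deleted and the fresh one survives.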
3. Restore — Disaster Recovery
Restore a cluster from an etcd snapshot
# RESTORE PROCEDURE (single-node or multi-node cluster):
# WARNING: This replaces ALL cluster state. Do this on a stopped cluster.

# 1. Stop the K8s control-plane static pods (kubeadm clusters):
#    Move the static pod manifests out of /etc/kubernetes/manifests/:
mkdir -p /tmp/k8s-manifests-backup
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/k8s-manifests-backup/
mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/k8s-manifests-backup/
mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/k8s-manifests-backup/

# 2. Stop etcd:
mv /etc/kubernetes/manifests/etcd.yaml /tmp/k8s-manifests-backup/
# Wait for all pods to stop:
crictl ps | grep -E "etcd|apiserver|controller|scheduler"    # should be empty

# 3. Restore the snapshot to a new data directory:
etcdctl snapshot restore /backup/etcd-snapshot-20260314-120000.db \
  --name=master \
  --initial-cluster=master=https://10.0.0.1:2380 \
  --initial-advertise-peer-urls=https://10.0.0.1:2380 \
  --data-dir=/var/lib/etcd-restore
# (On etcd 3.5+, `etcdutl snapshot restore` is the preferred tool.)

# 4. Replace the etcd data directory:
mv /var/lib/etcd /var/lib/etcd-old-$(date +%Y%m%d)
mv /var/lib/etcd-restore /var/lib/etcd
chown -R etcd:etcd /var/lib/etcd    # if etcd runs as a dedicated user

# 5. Restore the static pod manifests:
mv /tmp/k8s-manifests-backup/*.yaml /etc/kubernetes/manifests/
# Watch the pods restart:
crictl ps -a | grep -E "etcd|apiserver"

# 6. Verify the cluster is up:
kubectl get nodes
kubectl get pods -A
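A restore that starts from a missing snapshot or a half-populated data directory fails in confusing ways, so a pre-flight check before step 3 is cheap insurance. The sketch below is a hypothetical helper (the `preflight` function and the temp-dir stand-in paths are illustrative, not part of the procedure above):

```shell
# Stand-ins so the sketch runs without touching /backup or /var/lib/etcd:
SNAPSHOT=$(mktemp)                      # stand-in for the real snapshot file
DATA_DIR=/tmp/etcd-restore-demo-$$      # stand-in for /var/lib/etcd-restore

# Refuse to restore unless the snapshot is non-empty and the target
# data directory does not already exist:
preflight() {
  [ -s "$1" ] || { echo "snapshot missing or empty: $1"; return 1; }
  [ ! -e "$2" ] || { echo "data dir already exists: $2"; return 1; }
  echo "preflight OK"
}

echo "fake-snapshot-bytes" > "$SNAPSHOT"
preflight "$SNAPSHOT" "$DATA_DIR" && RESULT=ok || RESULT=fail
echo "$RESULT"
```

In the real procedure, point `SNAPSHOT` at the backup file and `DATA_DIR` at the `--data-dir` you plan to pass to the restore command.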
4. Cluster Operations
Add/remove etcd members, check leader, debug performance
# Check cluster status:
etcdctl endpoint status --write-out=table
# Shows: endpoint, ID, version, DB size, is leader, is learner, Raft term, Raft index
etcdctl member list --write-out=table
# Shows: ID, status, name, peer URLs, client URLs, is learner

# Add a new etcd member (3-node to 5-node expansion):
# 1. Add the member to the cluster:
etcdctl member add new-member-3 --peer-urls=https://10.0.0.4:2380
# 2. Start etcd on the new node with --initial-cluster-state=existing
# 3. Verify:
etcdctl member list

# Remove a failed member:
etcdctl member remove MEMBER_ID    # get the ID from etcdctl member list

# Defragment (reclaim disk after compaction):
etcdctl defrag              # defrag members one at a time; do the leader last (may trigger a leader election)
etcdctl defrag --cluster    # defrag all members at once (caution!)

# Check DB size (the default quota is 2GB; alert well before it's hit):
etcdctl endpoint status --write-out=json | python3 -c "import sys,json; [print(e['Status']['dbSize']) for e in json.load(sys.stdin)]"

# Increase the quota (edit the etcd pod manifest):
# --quota-backend-bytes=8589934592    # 8GB
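The DB-size check above is most useful wired to an alert threshold. The sketch below compares `dbSize` against 80% of the 2GiB default quota, using a fabricated sample status document so it runs without a live cluster (the 80% threshold is an illustrative choice, not an etcd default):

```shell
# Fabricated sample of `etcdctl endpoint status --write-out=json` output:
SAMPLE='[{"Endpoint":"https://127.0.0.1:2379","Status":{"dbSize":1800000000}}]'
QUOTA=2147483648   # 2GiB, the default --quota-backend-bytes

DB_SIZE=$(echo "$SAMPLE" | python3 -c "import sys, json; print(json.load(sys.stdin)[0]['Status']['dbSize'])")

# Alert when usage crosses 80% of quota:
if [ "$DB_SIZE" -gt $((QUOTA * 80 / 100)) ]; then
  ALERT="etcd DB at ${DB_SIZE} bytes, over 80% of quota"
else
  ALERT=""
fi
echo "${ALERT:-db size ok}"
```

When etcd does hit the quota it raises a `NOSPACE` alarm and goes read-only, so alerting early enough to compact and defrag matters.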
5. TLS & Security
Understand etcd TLS certs and access control
# etcd TLS cert paths (kubeadm):
# /etc/kubernetes/pki/etcd/
#   ca.crt     - etcd CA cert (signs all etcd certs)
#   ca.key     - etcd CA key (PROTECT THIS)
#   server.crt - etcd server cert
#   server.key - etcd server key
#   peer.crt   - etcd peer-to-peer communication
#   peer.key   - peer key
#   healthcheck-client.crt/key - used by the liveness probe
# /etc/kubernetes/pki/
#   apiserver-etcd-client.crt - the API server uses this to connect to etcd
#   apiserver-etcd-client.key

# Check cert expiry:
openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -dates
# "Not After" shows the expiration (kubeadm certs expire after 1 year by default)

# Renew ALL K8s certs (kubeadm, before expiry):
kubeadm certs renew all

# Check which certs expire when:
kubeadm certs check-expiration

# In kubeadm clusters, etcd access is controlled by TLS client certs only
# (etcd's own username/password auth is not enabled). Only clients with certs
# signed by the etcd CA can connect; the K8s API server uses
# apiserver-etcd-client.crt for this.

# Audit: who has access to the etcd CA key?
# If compromised, anyone can read ALL K8s secrets, including service account tokens.
ls -la /etc/kubernetes/pki/etcd/ca.key    # should be root-only (600)
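A "days remaining" number is easier to alert on than the raw `Not After` line. The sketch below applies the same `openssl x509 -noout` check to a throwaway self-signed cert so it runs without access to /etc/kubernetes/pki; GNU `date -d` is assumed for parsing the expiry timestamp:

```shell
# Generate a throwaway 30-day cert as a stand-in for etcd's server.crt:
WORK=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 30 \
  -keyout "$WORK/tls.key" -out "$WORK/tls.crt" \
  -subj "/CN=etcd-demo" 2>/dev/null

# Extract the expiry and convert it to days remaining:
NOT_AFTER=$(openssl x509 -in "$WORK/tls.crt" -noout -enddate | cut -d= -f2)
EXPIRY_EPOCH=$(date -d "$NOT_AFTER" +%s)
DAYS_LEFT=$(( (EXPIRY_EPOCH - $(date +%s)) / 86400 ))
echo "cert expires in $DAYS_LEFT days"
```

Point the `openssl x509` call at /etc/kubernetes/pki/etcd/server.crt (or any cert from the list above) and alert when `DAYS_LEFT` drops below your renewal window.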
Related: Kubernetes YAML Reference | Kubernetes RBAC Reference | Velero Backup Reference