Karpenter Reference: K8s Node Autoscaling, NodePool, Spot/On-Demand & Cost Optimization
Karpenter is an open-source Kubernetes node autoscaler created by AWS; its core is now maintained under Kubernetes SIG Autoscaling. Unlike Cluster Autoscaler, it provisions nodes directly through the EC2 (or other cloud) APIs based on pending pod requirements, choosing the instance type, size, and spot/on-demand mix automatically. This makes it both faster and more cost-efficient than Cluster Autoscaler for most workloads.
1. Karpenter vs Cluster Autoscaler
When to use Karpenter
| Feature | Karpenter | Cluster Autoscaler |
|---|---|---|
| Node provisioning | Direct EC2 API — provisions in ~60s | Scales node groups — slower (~2-5min) |
| Instance selection | Flexible — chooses best fit from 400+ instance types | Fixed instance types per node group |
| Spot instances | Automatic spot → on-demand fallback | Separate spot node group |
| Bin-packing | Right-sizes nodes for actual pod requests | Uses fixed node size from group config |
| Consolidation | Moves pods and terminates underutilized nodes | Scales down empty nodes only |
| Cloud support | AWS (primary), Azure, GCP (via providers) | All major clouds |
# Karpenter is best for: AWS EKS workloads with variable scale, spot-heavy setups,
# or when you want automatic right-sizing instead of managing node groups
2. Install & IAM Setup
Install Karpenter on EKS with the required IAM permissions
# Prerequisites: EKS cluster, eksctl or Terraform for cluster setup
# Karpenter needs:
#   1. IAM role for the Karpenter controller (EC2/SQS/IAM permissions)
#   2. IAM role for the nodes Karpenter creates (KarpenterNodeRole)
#   3. SQS queue for spot interruption handling
# Recommended: use eksctl or the official Karpenter getting started guide
# https://karpenter.sh/docs/getting-started/

# Install with Helm (charts are published to an OCI registry):
export CLUSTER_NAME=my-eks-cluster
export KARPENTER_VERSION=v0.37.0

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version "${KARPENTER_VERSION}" \
  --namespace kube-system \
  --set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::ACCOUNT:role/KarpenterControllerRole" \
  --set settings.clusterName=${CLUSTER_NAME} \
  --set settings.interruptionQueue=${CLUSTER_NAME}   # SQS queue for spot interruption

# Verify the controller is running:
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
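The Helm `--set` flags can also be kept in a values file. A minimal sketch, assuming the controller IAM role from the getting-started guide (the account ID, role name, and cluster name are placeholders):

```yaml
# values.yaml — equivalent of the --set flags (ACCOUNT and role name are placeholders)
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/KarpenterControllerRole
settings:
  clusterName: my-eks-cluster
  interruptionQueue: my-eks-cluster   # SQS queue for spot interruption handling
```

Pass it with `-f values.yaml` on `helm upgrade --install` instead of repeating the flags on every upgrade.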
3. NodePool — Define Node Provisioning Rules
NodePool replaced Provisioner in Karpenter v0.32+; the examples below use the v1 API
# NodePool: defines what kinds of nodes Karpenter can provision
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    metadata:
      labels:
        billing-team: platform
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                  # references EC2NodeClass below
      # Taints (optional — for dedicated node pools):
      taints:
        - key: gpu
          value: "true"
          effect: NoSchedule
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: [amd64, arm64]       # both x86 and ARM (Graviton = cheaper)
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]    # prefer spot, fall back to on-demand
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: [c, m, r]            # compute-optimized, general-purpose, memory-optimized
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]                # only instance generations > 2
  limits:
    cpu: "1000"                        # max total CPU across all Karpenter nodes
    memory: 4000Gi                     # max total memory
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized   # consolidate underutilized nodes
    consolidateAfter: 30s              # wait 30s before consolidating
    budgets:
      - nodes: "10%"                   # max 10% of nodes disrupted at once
---
# EC2NodeClass: AWS-specific node configuration
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiSelectorTerms:
    - alias: al2023@latest             # Amazon Linux 2023 (v1 requires amiSelectorTerms)
  role: KarpenterNodeRole              # IAM role for nodes
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-eks-cluster   # subnets tagged for Karpenter
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-eks-cluster   # security groups
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 50Gi
        volumeType: gp3
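Once the NodePool and EC2NodeClass are applied, provisioning can be exercised with a throwaway Deployment whose requests exceed current capacity. A minimal sketch; the name, image, and replica count are illustrative (any stateless image with CPU requests works):

```yaml
# scale-test.yaml — pending pods here should make Karpenter create NodeClaims
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate                    # hypothetical name
spec:
  replicas: 5
  selector:
    matchLabels: {app: inflate}
  template:
    metadata:
      labels: {app: inflate}
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.9
          resources:
            requests:
              cpu: "1"             # 5 pods x 1 vCPU forces new nodes on a small cluster
```

Apply it and watch `kubectl get nodeclaims -w`; scaling the Deployment back to 0 should trigger consolidation after `consolidateAfter` elapses.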
4. Pod Scheduling with Karpenter
Control which NodePool schedules your pods
# Request a specific capacity type (spot vs on-demand):
spec:
  nodeSelector:
    karpenter.sh/capacity-type: spot          # prefer spot

# Tolerate a spot node taint (if the NodePool has a taint):
spec:
  tolerations:
    - key: spot
      operator: Exists
      effect: NoSchedule

# Require a specific instance family:
spec:
  nodeSelector:
    karpenter.k8s.aws/instance-family: m6i    # force m6i family

# GPU workload — request a specific instance type (well-known K8s label):
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: g4dn.xlarge

# Node affinity for ARM/Graviton (roughly 30% cheaper than x86):
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: [arm64]               # prefer Graviton, fall back to x86

# Topology spread (spread pods across AZs):
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels: {app: my-app}
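For spot-heavy or aggressively consolidated pools, a PodDisruptionBudget limits how many replicas Karpenter may evict at once while draining a node. A sketch for the `my-app` selector used above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2                 # keep at least 2 replicas up during node drain
  selector:
    matchLabels: {app: my-app}
---
# A pod that must never be voluntarily disrupted by Karpenter (e.g. a batch job)
# can instead set the do-not-disrupt annotation on its pod template:
# metadata:
#   annotations:
#     karpenter.sh/do-not-disrupt: "true"
```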
5. Debugging & Cost Optimization
Check provisioning decisions, spot handling, and cost tips
# Check Karpenter logs (why did it provision / not provision):
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=50

# List nodes Karpenter manages:
kubectl get nodes -l karpenter.sh/nodepool=default

# List NodeClaims (Karpenter's record of nodes it created):
kubectl get nodeclaims

# Describe a NodeClaim for provisioning details:
kubectl describe nodeclaim my-nodeclaim-xxx
# Shows: instance type chosen, capacity type, zone

# Check pending pods that Karpenter should be scheduling:
kubectl get pods --field-selector status.phase=Pending

# Protect a node from consolidation/disruption (and remove the protection):
kubectl annotate node my-node karpenter.sh/do-not-disrupt=true   # protect a node
kubectl annotate node my-node karpenter.sh/do-not-disrupt-       # remove protection

# Cost tips:
# 1. Allow arm64 + amd64 — Graviton 3 (m7g) is 20-40% cheaper with same perf
# 2. Use spot for stateless workloads (handle spot interruption with PodDisruptionBudget)
# 3. Set resource requests accurately — Karpenter right-sizes nodes based on them
# 4. Use limits.cpu/memory in NodePool to prevent unbounded scaling
# 5. consolidationPolicy: WhenEmptyOrUnderutilized is more aggressive than WhenEmpty
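The spot/on-demand mix can also be split across two NodePools ranked by `spec.weight`: when several pools can satisfy a pod, Karpenter tries higher-weight pools first. A sketch with illustrative names, both referencing the `default` EC2NodeClass:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot                      # tried first (higher weight)
spec:
  weight: 100
  template:
    spec:
      nodeClassRef: {group: karpenter.k8s.aws, kind: EC2NodeClass, name: default}
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot]
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: on-demand                 # fallback when spot capacity is unavailable
spec:
  weight: 50
  template:
    spec:
      nodeClassRef: {group: karpenter.k8s.aws, kind: EC2NodeClass, name: default}
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [on-demand]
```

Note that a single NodePool listing both capacity types (as in section 3) already prefers spot; separate weighted pools are mainly useful when the pools need different limits, taints, or disruption budgets.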