Knative Reference: Serverless K8s — Autoscale-to-Zero, Eventing, Traffic Splitting & GitOps
Knative is a CNCF incubating project that adds serverless capabilities to Kubernetes. Serving auto-scales containers to zero based on HTTP traffic. Eventing wires event sources (Kafka, Pub/Sub, HTTP) to processing functions. The Knative API underpins Google Cloud Run, Red Hat OpenShift Serverless, and VMware Tanzu.
1. Install & Architecture
Install Knative Serving + Eventing on Kubernetes
| Component | What it does |
|---|---|
| Knative Serving | HTTP-triggered autoscale-to-zero services |
| Knative Eventing | Event-driven — connect sources to services/brokers |
| Kourier / Istio / Contour | Networking layer (pick one ingress) |
| Knative Operator | Lifecycle management for Knative itself |
# Install Knative Serving CRDs + core:
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.14.0/serving-core.yaml
# Install Kourier (lightweight ingress — no full Istio required):
kubectl apply -f https://github.com/knative/net-kourier/releases/download/knative-v1.14.0/kourier.yaml
kubectl patch configmap/config-network --namespace knative-serving --type merge --patch '{"data":{"ingress-class":"kourier.ingress.networking.knative.dev"}}'
# Install Knative Eventing:
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.14.0/eventing-crds.yaml
kubectl apply -f https://github.com/knative/eventing/releases/download/knative-v1.14.0/eventing-core.yaml
# Install kn CLI (convenience tool):
brew install knative/client/kn
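A quick post-install sanity check (a sketch; assumes the default `knative-serving`, `kourier-system`, and `knative-eventing` namespaces from the manifests above):

```shell
# Verify Serving, Kourier, and Eventing pods are all Running:
kubectl get pods -n knative-serving
kubectl get pods -n kourier-system
kubectl get pods -n knative-eventing
# Confirm the ingress-class patch took effect:
kubectl get configmap config-network -n knative-serving \
  -o jsonpath='{.data.ingress-class}'   # expect: kourier.ingress.networking.knative.dev
```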
2. Knative Services — Autoscale-to-Zero
Deploy a Service and configure autoscaling + traffic splitting
# Deploy a Knative Service (simple):
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"   # scale to zero when idle
        autoscaling.knative.dev/maxScale: "10"  # max 10 replicas
        autoscaling.knative.dev/target: "100"   # target: 100 concurrent requests/pod
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "World"
          resources:
            requests: {cpu: 100m, memory: 64Mi}
            limits: {cpu: 500m, memory: 256Mi}
# Deploy via kn CLI (faster for dev):
kn service create hello --image gcr.io/knative-samples/helloworld-go --env TARGET=World --scale-min 0 --scale-max 10
# Traffic splitting (canary / blue-green) — tag the current stable revision, then shift a slice to @latest:
kn service update hello --tag hello-00001=stable --traffic @latest=20,stable=80 # 20% to latest, 80% to stable
# URL for the service:
kn service describe hello # shows URL, traffic percentages, revisions
# Scale-to-zero cold start: ~1-3s typically (depends on image pull policy)
# Keep warm: set minScale=1 to avoid cold starts in SLA-sensitive paths
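To watch scale-to-zero happen, hit the service once and then watch the pod terminate after the idle window (a sketch; assumes the `hello` service above and working DNS for the service URL):

```shell
# Grab the service URL and send one request:
URL=$(kn service describe hello -o url)
curl "$URL"            # first request after idle pays the cold-start penalty
# Watch the hello pod terminate once traffic stops (roughly the stable window + grace period):
kubectl get pods -w
```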
3. Knative Eventing — Event-Driven Architecture
Brokers, triggers, sources, and channel subscriptions
# Eventing core concepts:
# - Source: produces CloudEvents (PingSource, ApiServerSource, KafkaSource, etc.)
# - Broker: receives + routes events to Triggers
# - Trigger: filters events from a Broker → sends to a Subscriber (Knative Service)
# - Channel: durable delivery via an implementation such as KafkaChannel or InMemoryChannel
# Create a Broker:
kubectl apply -f - <<EOF
apiVersion: eventing.knative.dev/v1
kind: Broker
metadata:
  name: default
  namespace: default
EOF
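Wiring a source to a service through the Broker can be sketched with a PingSource plus a Trigger (the names `default` and `hello` are carried over from the examples above; the schedule and payload are illustrative):

```yaml
# PingSource: emit a CloudEvent on a cron schedule into the Broker
apiVersion: sources.knative.dev/v1
kind: PingSource
metadata:
  name: every-minute
spec:
  schedule: "* * * * *"
  contentType: "application/json"
  data: '{"msg": "hi"}'
  sink:
    ref:
      apiVersion: eventing.knative.dev/v1
      kind: Broker
      name: default
---
# Trigger: filter on the PingSource event type, deliver to the hello Service
apiVersion: eventing.knative.dev/v1
kind: Trigger
metadata:
  name: ping-trigger
spec:
  broker: default
  filter:
    attributes:
      type: dev.knative.sources.ping
  subscriber:
    ref:
      apiVersion: serving.knative.dev/v1
      kind: Service
      name: hello
```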
4. Custom Autoscaling & Concurrency
KPA vs HPA, concurrency targets, scale-to-zero grace period
# Knative uses KPA (Knative Pod Autoscaler) by default
# KPA scales on request concurrency/RPS and reacts in seconds; HPA reacts more slowly (metrics pipeline)
# Use the HPA class for CPU-based scaling (no request concurrency tracking; cannot scale to zero)
# Concurrency model — KEY CONCEPT:
# target: soft target; scale-up kicks in above this many concurrent requests per pod
# containerConcurrency: hard limit on concurrent requests per container
#   0 = unlimited (default)
#   1 = process requests one at a time (like Lambda)
# Requests over the hard limit are queued, then rejected once buffers fill
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        # Scale-to-zero settings:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "50"
        autoscaling.knative.dev/target: "10"  # scale up if >10 concurrent/pod
        # Keep the last pod for 60s after traffic stops:
        autoscaling.knative.dev/scale-to-zero-pod-retention-period: "60s"
        # (scale-to-zero-grace-period is cluster-wide, set in the config-autoscaler ConfigMap)
        # Use HPA instead (CPU-based; cannot scale to zero):
        # autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
        # autoscaling.knative.dev/metric: cpu
        # autoscaling.knative.dev/target: "70"  # 70% CPU utilization
    spec:
      containerConcurrency: 0  # 0 = unlimited concurrent requests per container
# Scale-from-zero latency optimization:
# - Use smaller, pre-pulled images (imagePullPolicy: IfNotPresent)
# - Set the autoscaling.knative.dev/initial-scale: "1" annotation to start with one replica
# - Use minScale: 1 for latency-sensitive services
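Cluster-wide autoscaler defaults, including the scale-to-zero grace period, live in the config-autoscaler ConfigMap in knative-serving; a minimal sketch of tuning them (the values shown are examples, not recommendations):

```yaml
# config-autoscaler: cluster-wide KPA defaults (per-revision annotations override these)
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  enable-scale-to-zero: "true"             # global on/off switch for scale-to-zero
  scale-to-zero-grace-period: "60s"        # wait before removing the last pod's routing capacity
  stable-window: "60s"                     # averaging window for concurrency metrics
  container-concurrency-target-default: "100"  # default target when no annotation is set
```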
5. Revisions, Traffic & GitOps
Immutable revisions, canary releases, rollback, and declarative GitOps
# Every update to a Knative Service creates a new Revision (immutable snapshot)
# Traffic can be split across Revisions — useful for canary and rollback
# List revisions:
kn revision list
# Output:
# NAME          SERVICE   TRAFFIC   TAGS   GENERATION   AGE   CONDITIONS
# hello-00004   hello     80%              4            5m    3 OK / 3
# hello-00003   hello     20%       v3     3            2h    3 OK / 3
# Gradual canary rollout:
kn service update hello --traffic hello-00004=80,hello-00003=20
# Instant rollback (all traffic to previous stable revision):
kn service update hello --traffic hello-00003=100
# Tag revisions for stable, human-readable URLs:
kn service update hello --tag hello-00003=stable --tag hello-00004=canary
# Creates: stable-hello.default.example.com and canary-hello.default.example.com
# GitOps with Argo CD:
# Store Service YAML in Git → Argo CD syncs on commit
# Use Argo Rollouts + Knative for progressive delivery:
# - Argo Rollouts manages Knative traffic weights
# - Automatically shifts traffic based on metrics (latency, error rate)
# Knative Service status check:
kubectl get ksvc hello -o yaml | grep -A5 status
# Ready=True means at least one Revision is healthy and serving traffic
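For the Argo CD approach, a minimal Application manifest pointing at a repo of Knative Service YAML might look like this (the repoURL and path are placeholders, not real endpoints):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: hello-knative
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/knative-manifests  # placeholder repo
    targetRevision: main
    path: services/hello       # directory containing the Knative Service YAML
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert out-of-band cluster drift
```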