
Grafana is one of the most widely used tools for metrics visualisation. This reference covers panel types, variables, alerting, provisioning, and the patterns that make dashboards useful in practice.

1. Panel Types & Queries

Time series, stat, table, heatmap — when to use each
Panel type | Use for | Key config
Time series | Metrics over time (CPU, req/s, latency) | Fill opacity, gradient mode, stacking
Stat | Single current value (uptime, error count) | Thresholds (green/yellow/red), sparkline
Gauge | Value as percentage of range | Min/max bounds, thresholds
Table | Multi-column data, last value per series | Field overrides, column filtering
Heatmap | Latency distribution over time (histogram buckets) | Color scale, bucket boundaries
Bar chart | Comparison across categories | Grouping, legend display mode
Logs | Log lines from Loki/Elasticsearch | Label filters, log level colours
State timeline | Status/state changes over time | Thresholds, value mapping

# PromQL for common panels:

# Time series — request rate:
rate(http_requests_total{job="myapp"}[5m])

# Stat — current error rate:
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m])) * 100

# Heatmap — latency distribution (requires histogram metric):
sum(rate(http_request_duration_seconds_bucket{job="myapp"}[$__rate_interval])) by (le)
# Set "Format as" = Heatmap in query options

# Table — per-pod memory:
sum by (pod) (container_memory_working_set_bytes{namespace="production"})
# Set "Instant" query, Sort by value descending
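The panel types and query options above correspond to fields in the dashboard JSON model. As a rough sketch (Python is used for illustration; `prometheus_panel` is a hypothetical helper, and the field names follow what Grafana typically writes in an exported dashboard, so check them against an export from your own version):

```python
import json


def prometheus_panel(title, panel_type, expr, *, instant=False,
                     fmt="time_series", panel_id=1):
    """Build a minimal panel dict with one Prometheus query (hypothetical helper)."""
    return {
        "id": panel_id,
        "title": title,
        "type": panel_type,  # e.g. "timeseries", "stat", "table", "heatmap"
        "targets": [
            {
                "refId": "A",
                "expr": expr,
                "instant": instant,    # the "Instant" toggle in the query options
                "range": not instant,
                "format": fmt,         # "Format as": time_series, table or heatmap
            }
        ],
    }


panels = [
    prometheus_panel(
        "Request rate", "timeseries",
        'rate(http_requests_total{job="myapp"}[5m])', panel_id=1),
    prometheus_panel(
        "Latency distribution", "heatmap",
        'sum(rate(http_request_duration_seconds_bucket{job="myapp"}[$__rate_interval])) by (le)',
        fmt="heatmap", panel_id=2),
    prometheus_panel(
        "Memory per pod", "table",
        'sum by (pod) (container_memory_working_set_bytes{namespace="production"})',
        instant=True, fmt="table", panel_id=3),
]

print(json.dumps(panels[0], indent=2))
```

A real exported panel carries many more fields (grid position, field config, thresholds); this only shows where the panel type, the query and the "Instant" / "Format as" options end up.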

2. Variables — Making Dashboards Dynamic

Template variables, chained selects, and repeat rows
# Dashboard Settings → Variables → Add variable

# Query variable (populate from metric labels):
Name: namespace
Type: Query
Query: label_values(kube_pod_info, namespace)
# Renders as dropdown: all namespace values from Prometheus

# Chained variables (namespace drives pod list):
Name: pod
Query: label_values(kube_pod_info{namespace="$namespace"}, pod)
# Refresh: On time range change / On dashboard load

# Use in queries:
rate(http_requests_total{namespace="$namespace", pod="$pod"}[5m])

# Multi-value variable (select multiple):
# Enable "Multi-value" + "Include All option"
# With several values selected, $namespace is rendered as a regex group: (production|staging)
# So match it with =~ rather than =: namespace=~"$namespace"
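To see why a multi-value variable needs `=~`, it helps to look at the query after substitution. The sketch below imitates the expansion in Python; `expand_multi_value` and `render` are hypothetical helpers, the parenthesised group reflects how the Prometheus data source appears to render multiple values, and `re.escape` is only an approximation of Grafana's own escaping.

```python
import re


def expand_multi_value(values):
    """Approximate how a multi-value selection is rendered for a regex matcher."""
    escaped = [re.escape(v) for v in values]  # stricter than Grafana's escaping
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def render(query_template, variables):
    """Substitute $name placeholders (simplified: no ${name:format} syntax)."""
    for name, values in variables.items():
        query_template = query_template.replace(
            "$" + name, expand_multi_value(values))
    return query_template


query = render(
    'rate(http_requests_total{namespace=~"$namespace"}[5m])',
    {"namespace": ["production", "staging"]},
)
print(query)
```

With an `=` matcher the same substitution would look for a namespace literally named `(production|staging)`, which matches nothing.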

# Interval variable (for dynamic rate windows):
Name: interval
Type: Interval
Values: 30s,1m,5m,15m,1h
# Use: rate(metric[$interval])
# Or use $__rate_interval (recommended): max($__interval + scrape interval, 4 × scrape interval)
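The rule Grafana documents for `$__rate_interval` is small enough to work through by hand. A minimal sketch in Python, where `rate_interval` is a hypothetical helper name and both arguments are in seconds:

```python
def rate_interval(interval_s, scrape_interval_s):
    """$__rate_interval = max($__interval + scrape interval, 4 * scrape interval)."""
    return max(interval_s + scrape_interval_s, 4 * scrape_interval_s)


# With a 15s scrape interval the window never drops below 60s,
# so rate() always has several samples to work with.
for interval in (15, 60, 300):
    print(f"$__interval={interval}s -> $__rate_interval={rate_interval(interval, 15)}s")
```

Zoomed in, the 4 × scrape interval floor applies; zoomed out, the window grows with `$__interval` so no samples between data points are skipped.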

# Repeat rows/panels:
# Panel → Repeat options → Repeat by variable: namespace
# Creates one panel per selected namespace value automatically

3. Alerting

Grafana unified alerting — rules, contact points, notification policies
# Grafana 9+ uses "Unified Alerting" (replaces legacy per-panel alerts)
# Alerting → Alert rules → Create alert rule

# Alert rule anatomy:
# 1. Query: PromQL/LogQL that returns a value
# 2. Condition: WHEN [query] IS ABOVE threshold FOR duration
# 3. Labels: environment=production, team=platform
# 4. Annotations: summary="{{ $labels.pod }} is down"

# Example rule (pod restart rate alert):
Query A: increase(kube_pod_container_status_restarts_total{namespace="production"}[15m])
Condition: WHEN A IS ABOVE 3 FOR 0m
Labels: severity=warning, team=platform
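Summary: Pod {{ $labels.pod }} restarted {{ $values.A.Value }} times in 15m
# {{ $value }} prints the labels and values of every query as one string; $values.A.Value is the number from query A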

# Contact points (where alerts go):
# Alerting → Contact points → Add
# Supports: Slack, PagerDuty, OpsGenie, Teams, email, webhook

# Slack contact point:
URL: https://hooks.slack.com/services/xxx
Title: [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
Message: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}

# Notification policies (route labels to contact points):
# Default: all alerts → default contact point
# Override: severity=critical → PagerDuty, severity=warning → Slack
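The routing idea behind notification policies can be sketched as a label match with a default fallback. This is a simplified model in Python, not Grafana's implementation: real policies form a tree and also support regex matchers and a "continue matching" option. The `route` helper and the contact point names are hypothetical.

```python
def route(labels, policies, default="default-contact-point"):
    """Return the contact point of the first policy whose matchers all equal the alert's labels."""
    for matchers, contact_point in policies:
        if all(labels.get(key) == value for key, value in matchers.items()):
            return contact_point
    return default


policies = [
    ({"severity": "critical"}, "pagerduty"),
    ({"severity": "warning"}, "slack"),
]

print(route({"severity": "critical", "team": "platform"}, policies))
print(route({"severity": "warning", "team": "platform"}, policies))
print(route({"severity": "info"}, policies))
```

Alerts that match no override fall through to the default contact point, which is why every rule should carry the labels your policies route on.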

# Silence alerts during maintenance:
# Alerting → Silences → Add silence
# Label matchers: environment=production, scheduled_maintenance=true
# Start/end time
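Grafana evaluates alert rules every minute by default, and the pending period (the FOR duration) is only checked at each evaluation. A duration shorter than the evaluation interval therefore adds no useful debounce: a 10s duration on a 1m interval just means "still breaching at the next evaluation". Query windows have a similar floor: if Prometheus scrapes every 15s, a range shorter than 15s cannot contain the two samples that rate() or increase() need.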
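How the evaluation interval and the FOR duration interact is easier to see in a small simulation. The Python sketch below is a simplified model, assuming one value per evaluation and only three states (Grafana also has NoData and Error states); `simulate` is a hypothetical helper.

```python
def simulate(values, threshold, eval_interval_s, for_s):
    """Return the rule state after each evaluation (one value per evaluation)."""
    states = []
    breach_since = None  # time of the first evaluation in the current breach
    for i, value in enumerate(values):
        now = i * eval_interval_s
        if value > threshold:
            if breach_since is None:
                breach_since = now
            elapsed = now - breach_since
            states.append("Alerting" if elapsed >= for_s else "Pending")
        else:
            breach_since = None
            states.append("Normal")
    return states


restarts = [1, 5, 5, 5, 1]
print(simulate(restarts, threshold=3, eval_interval_s=60, for_s=120))
print(simulate(restarts, threshold=3, eval_interval_s=60, for_s=10))
print(simulate(restarts, threshold=3, eval_interval_s=60, for_s=0))
```

In this model a 10s duration behaves exactly like a 60s one, because state only changes when the rule is evaluated.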

4. Provisioning — Dashboards as Code

File-based provisioning for GitOps dashboard management
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    jsonData:
      timeInterval: "15s"          # matches prometheus scrape_interval
      incrementalQuerying: true     # performance: only query new time range
      queryTimeout: "60s"

  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'   # single quotes: \w is not a valid escape in a double-quoted YAML string
          url: http://tempo:3200/trace/$${__value.raw}  # click trace_id to open in Tempo

# /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
  - name: default
    folder: ""                          # leave empty: foldersFromFilesStructure requires folder to be unset
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true   # subdir = folder name in Grafana (e.g. dashboards/Kubernetes/)

# Place dashboard JSON files in /var/lib/grafana/dashboards/
# Export from Grafana: Dashboard → Share → Export → JSON
# Import community dashboards: grafana.com/grafana/dashboards
# Popular IDs: 6417 (K8s cluster), 1860 (node exporter), 13659 (Loki logs)
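For dashboards generated rather than exported, any script that writes JSON into the provisioned path will do. The Python sketch below assumes a minimal set of top-level fields (`uid`, `title`, `panels`, `time`, `templating`); `write_dashboard` is a hypothetical helper, and a real export contains more fields than shown here. The demo writes to a temporary directory instead of /var/lib/grafana/dashboards.

```python
import json
import tempfile
from pathlib import Path


def write_dashboard(directory, uid, title, panels):
    """Write a minimal dashboard JSON file named <uid>.json and return its path."""
    dashboard = {
        "uid": uid,          # stable uid keeps links valid across reloads
        "title": title,
        "panels": panels,
        "time": {"from": "now-6h", "to": "now"},
        "templating": {"list": []},
    }
    path = Path(directory) / f"{uid}.json"
    path.write_text(json.dumps(dashboard, indent=2) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_dashboard(
        tmp, "myapp-overview", "MyApp overview",
        panels=[{"id": 1, "type": "timeseries", "title": "Request rate"}],
    )
    loaded = json.loads(path.read_text(encoding="utf-8"))
    print(path.name, "->", loaded["title"])
```

Keeping the file name tied to the `uid` makes diffs in Git easy to follow: one dashboard, one file.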

5. Useful PromQL Patterns for Grafana

K8s, latency percentiles, error rates — copy-ready queries
# CPU throttling ratio (critical K8s metric): throttled periods / total periods
sum by (pod) (
  rate(container_cpu_cfs_throttled_periods_total{namespace="$namespace"}[$__rate_interval])
) /
sum by (pod) (
  rate(container_cpu_cfs_periods_total{namespace="$namespace"}[$__rate_interval])
) * 100
# > 25% usually means the CPU limit is too low and the application is being slowed down

# Latency p99 (requires histogram metric; use 0.5 or 0.95 for p50/p95):
histogram_quantile(0.99, sum by (le) (
  rate(http_request_duration_seconds_bucket{job="$app"}[$__rate_interval])
))

# Memory utilisation (% of total memory in use):
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
node_memory_MemTotal_bytes * 100

# PVC nearly full (alert threshold: > 80%):
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) * 100 > 80

# HPA saturation (replicas at max):
kube_horizontalpodautoscaler_status_current_replicas ==
kube_horizontalpodautoscaler_spec_max_replicas

# Absent metric alert (service stopped sending metrics):
absent(up{job="myapp"} == 1)
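The latency percentile query relies on `histogram_quantile`, which estimates a quantile by linear interpolation inside the bucket that contains the target rank. The Python sketch below reproduces that idea for classic (cumulative `le`) buckets under simplifying assumptions: the lowest bucket starts at 0, at least two buckets are present, and 0 < q <= 1. It is an illustration, not Prometheus's code, and the bucket counts are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from (upper_bound, cumulative_count) pairs, including +Inf."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls beyond the last finite bucket: report its upper bound.
                return buckets[-2][0]
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical request-duration buckets (seconds, cumulative count).
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
for q in (0.5, 0.95, 0.999):
    print(q, round(histogram_quantile(q, buckets), 3))
```

The p99.9 result shows the main caveat: once the rank lands in the +Inf bucket the estimate is capped at the highest finite bound, so bucket boundaries need to cover the latencies you care about.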

6. Grafana CLI & Docker Setup

grafana-cli, Docker Compose config, admin password reset
# Docker Compose:
services:
  grafana:
    image: grafana/grafana-oss:latest
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_ADMIN_PASSWORD}"
      GF_INSTALL_PLUGINS: "grafana-piechart-panel,grafana-worldmap-panel"
      GF_FEATURE_TOGGLES_ENABLE: "publicDashboards"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
      GF_SMTP_ENABLED: "true"
      GF_SMTP_HOST: "smtp.gmail.com:587"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning:ro

volumes:
  grafana_data:    # named volumes must also be declared at the top level

# grafana-cli commands:
grafana-cli plugins install grafana-piechart-panel   # install plugin
grafana-cli plugins list-remote                       # available plugins
grafana-cli plugins update-all                        # update all plugins
grafana-cli admin reset-admin-password newpassword    # reset admin (on server)

# Useful API endpoints:
GET  /api/health                                      # readiness check
GET  /api/dashboards/home                             # home dashboard
POST /api/dashboards/db   (body: dashboard JSON)      # create/update dashboard
GET  /api/datasources                                 # list data sources
GET  /api/folders                                     # list folders
POST /api/annotations                                 # create annotation (mark deployments)
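Marking deployments through `POST /api/annotations` is a common use of the API from a CI job. The Python sketch below only builds the request so it can be inspected without a running server; the base URL, token and version string are placeholders, `deployment_annotation_request` is a hypothetical helper, and the body uses the `time` (epoch milliseconds), `tags` and `text` fields.

```python
import json
import time
import urllib.request


def deployment_annotation_request(base_url, token, text, tags, when_ms=None):
    """Build (but do not send) a POST /api/annotations request."""
    body = {
        "time": when_ms if when_ms is not None else int(time.time() * 1000),
        "tags": list(tags),
        "text": text,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/annotations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # service account token (placeholder)
        },
        method="POST",
    )


req = deployment_annotation_request(
    "http://localhost:3000", "EXAMPLE_TOKEN",
    "Deployed myapp v1.2.3", ["deployment", "myapp"],
    when_ms=1700000000000,
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```

Tagging the annotation lets a dashboard show it through an annotation query filtered on the same tags.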
