Grafana Reference: Dashboards, Variables, Alerting, PromQL Patterns & Provisioning
Grafana is the de facto standard for metrics visualisation. This reference covers panel types, template variables, alerting, provisioning, and the PromQL patterns that make dashboards genuinely useful.
1. Panel Types & Queries
Time series, stat, table, heatmap — when to use each
| Panel Type | Use for | Key config |
|---|---|---|
| Time series | Metrics over time (CPU, req/s, latency) | Fill opacity, gradient mode, stacking |
| Stat | Single current value (uptime, error count) | Thresholds (green/yellow/red), sparkline |
| Gauge | Value as percentage of range | Min/max bounds, thresholds |
| Table | Multi-column data, last-value per series | Field overrides, column filtering |
| Heatmap | Latency distribution over time (histogram buckets) | Color scale, bucket boundaries |
| Bar chart | Comparison across categories | Grouping, legend display mode |
| Logs | Log lines from Loki/Elasticsearch | Label filters, log level colours |
| State timeline | Status/state changes over time | Thresholds, value mapping |
# PromQL for common panels:
# Time series — request rate:
rate(http_requests_total{job="myapp"}[5m])
# Stat — current error rate:
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m])) * 100
# Heatmap — latency distribution (requires histogram metric):
sum(rate(http_request_duration_seconds_bucket{job="myapp"}[$__rate_interval])) by (le)
# Set "Format as" = Heatmap in query options
# Table — per-pod memory:
sum by (pod) (container_memory_working_set_bytes{namespace="production"})
# Set "Instant" query, Sort by value descending
2. Variables — Making Dashboards Dynamic
Template variables, chained selects, and repeat rows
# Dashboard Settings → Variables → Add variable
# Query variable (populate from metric labels):
Name: namespace
Type: Query
Query: label_values(kube_pod_info, namespace)
# Renders as dropdown: all namespace values from Prometheus
# Chained variables (namespace drives pod list):
Name: pod
Query: label_values(kube_pod_info{namespace="$namespace"}, pod)
# Refresh: On time range change / On dashboard load
# Use in queries:
rate(http_requests_total{namespace="$namespace", pod="$pod"}[5m])
# Multi-value variable (select multiple):
# Enable "Multi-value" + "Include All option"
# $namespace becomes regex: production|staging
# Use in queries with =~: namespace=~"$namespace"
# Interval variable (for dynamic rate windows):
Name: interval
Type: Interval
Values: 30s,1m,5m,15m,1h
# Use: rate(metric[$interval])
# Or use $__rate_interval — automatically sized to at least 4× the data source's scrape interval (recommended)
# Repeat rows/panels:
# Panel → Repeat options → Repeat by variable: namespace
# Creates one panel per selected namespace value automatically
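In the exported dashboard JSON, the repeat option shows up as fields on the panel object. A minimal fragment — the field names come from the dashboard schema, the title and query values are illustrative:

```json
{
  "title": "Requests — $namespace",
  "type": "timeseries",
  "repeat": "namespace",
  "repeatDirection": "h",
  "maxPerRow": 4,
  "targets": [
    { "expr": "rate(http_requests_total{namespace=\"$namespace\"}[5m])" }
  ]
}
```

`repeatDirection` is "h" (side by side, capped by `maxPerRow`) or "v" (stacked rows); Grafana clones the panel once per selected value of the variable at render time.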
3. Alerting
Grafana unified alerting — rules, contact points, notification policies
# Grafana 9+ defaults to "Unified Alerting" (introduced in 8.0; replaces legacy per-panel alerts)
# Alerting → Alert rules → Create alert rule
# Alert rule anatomy:
# 1. Query: PromQL/LogQL that returns a value
# 2. Condition: WHEN [query] IS ABOVE threshold FOR duration
# 3. Labels: environment=production, team=platform
# 4. Annotations: summary="{{ $labels.pod }} is down"
# Example rule (pod restart rate alert):
Query A: increase(kube_pod_container_status_restarts_total{namespace="production"}[15m])
Condition: WHEN A IS ABOVE 3 FOR 0m
Labels: severity=warning, team=platform
Summary: Pod {{ $labels.pod }} restarted {{ $value }} times in 15m
# Contact points (where alerts go):
# Alerting → Contact points → Add
# Supports: Slack, PagerDuty, OpsGenie, Teams, email, webhook
# Slack contact point:
URL: https://hooks.slack.com/services/xxx
Title: [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
Message: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}
# Notification policies (route labels to contact points):
# Default: all alerts → default contact point
# Override: severity=critical → PagerDuty, severity=warning → Slack
# Silence alerts during maintenance:
# Alerting → Silences → Add silence
# Label matchers: environment=production, scheduled_maintenance=true
# Start/end time
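Silences can also be scripted — useful from a deploy pipeline. A sketch using Grafana's Alertmanager-compatible API; the host, token variable, and label values are assumptions, and the POST only runs when a token is configured:

```shell
#!/bin/sh
# Create a 2-hour silence matching the maintenance labels above.
START=$(date -u +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
      || date -u -v+2H +%Y-%m-%dT%H:%M:%SZ)   # GNU date, BSD fallback
PAYLOAD=$(cat <<EOF
{
  "matchers": [
    {"name": "environment", "value": "production", "isRegex": false},
    {"name": "scheduled_maintenance", "value": "true", "isRegex": false}
  ],
  "startsAt": "$START",
  "endsAt": "$END",
  "createdBy": "deploy-bot",
  "comment": "Scheduled maintenance window"
}
EOF
)
# Send only when a service-account token is set (skipped otherwise):
if [ -n "${GRAFANA_TOKEN:-}" ]; then
  curl -s -X POST "http://grafana:3000/api/alertmanager/grafana/api/v2/silences" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```

Deleting the silence early is a DELETE to the same path with the silence ID returned by the POST.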
4. Provisioning — Dashboards as Code
File-based provisioning for GitOps dashboard management
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    jsonData:
      timeInterval: "15s"        # match the Prometheus scrape_interval
      incrementalQuerying: true  # performance: only query the new time range
      queryTimeout: "60s"
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'  # single quotes: \w is not a valid YAML escape
          url: http://tempo:3200/trace/$${__value.raw}  # click a trace_id to open it in Tempo
# /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true  # subdirectory name = folder name in Grafana
# Note: foldersFromFilesStructure and a fixed "folder:" are mutually exclusive —
# leave "folder" unset when deriving folders from the directory tree
# Place dashboard JSON files in /var/lib/grafana/dashboards/
# Export from Grafana: Dashboard → Share → Export → JSON
# Import community dashboards: grafana.com/grafana/dashboards
# Popular IDs: 6417 (K8s cluster), 1860 (node exporter), 13659 (Loki logs)
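Community dashboards can also be pulled and imported over the HTTP API instead of clicking through the UI. A sketch — the Grafana URL, token variable, and datasource-input name are assumptions (the input name depends on the dashboard's `__inputs`), and the network calls only run when a token is configured:

```shell
#!/bin/sh
# Fetch a community dashboard from grafana.com and import it via the HTTP API.
GRAFANA_URL="${GRAFANA_URL:-http://grafana:3000}"
DASH_ID=1860   # Node Exporter Full, from the ID list above

if [ -n "${GRAFANA_TOKEN:-}" ]; then
  # Download the latest revision of the dashboard JSON:
  curl -s "https://grafana.com/api/dashboards/${DASH_ID}/revisions/latest/download" \
    -o dashboard.json
  # Wrap it in an import request, mapping the DS_PROMETHEUS input
  # to the provisioned "Prometheus" datasource:
  jq -n --slurpfile d dashboard.json '{
      dashboard: $d[0],
      overwrite: true,
      inputs: [{name: "DS_PROMETHEUS", type: "datasource",
                pluginId: "prometheus", value: "Prometheus"}]
    }' |
  curl -s -X POST "${GRAFANA_URL}/api/dashboards/import" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d @-
fi
```

For GitOps, committing the downloaded JSON into the provisioned dashboards directory (previous snippet) avoids the API call entirely.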
5. Useful PromQL Patterns for Grafana
K8s, latency percentiles, error rates — copy-ready queries
# CPU throttling ratio (critical K8s metric):
sum by (pod) (
rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace"}[$__rate_interval])
) /
sum by (pod) (
rate(container_cpu_cfs_periods_total{namespace="$namespace"}[$__rate_interval])
) * 100
# > 25% = CPU limit too low, causing application slowdowns
# Latency p50/p95/p99 (requires histogram metric):
histogram_quantile(0.99, sum by (le) (
rate(http_request_duration_seconds_bucket{job="$app"}[$__rate_interval])
))
# Memory pressure (available vs used):
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
node_memory_MemTotal_bytes * 100
# PVC nearly full (alert threshold: > 80%):
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) * 100 > 80
# HPA saturation (replicas at max):
kube_horizontalpodautoscaler_status_current_replicas ==
kube_horizontalpodautoscaler_spec_max_replicas
# Absent metric alert (service stopped sending metrics):
absent(up{job="myapp"} == 1)
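Before pasting a pattern into a panel, it can be sanity-checked against the Prometheus HTTP API directly. A sketch — the host is an assumption, `$__rate_interval` is Grafana-only so a literal 5m window stands in, and the request only fires when `RUN_LIVE` is set:

```shell
#!/bin/sh
# Dry-run the CPU throttling ratio against /api/v1/query.
PROM_URL="${PROM_URL:-http://prometheus:9090}"
QUERY='sum by (pod) (rate(container_cpu_cfs_throttled_seconds_total[5m]))
  / sum by (pod) (rate(container_cpu_cfs_periods_total[5m])) * 100'

if [ -n "${RUN_LIVE:-}" ]; then
  # -G + --data-urlencode handles the spaces and special characters in PromQL
  curl -s -G "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}"
fi
```

The response's `data.result` array is exactly what the panel would plot as its instant value, so an empty array usually means a label or metric-name typo, not a Grafana problem.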
6. Grafana CLI & Docker Setup
grafana-cli, Docker Compose config, admin password reset
# Docker Compose:
services:
  grafana:
    image: grafana/grafana-oss:latest
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_ADMIN_PASSWORD}"
      GF_INSTALL_PLUGINS: "grafana-piechart-panel,grafana-worldmap-panel"
      GF_FEATURE_TOGGLES_ENABLE: "publicDashboards"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
      GF_SMTP_ENABLED: "true"
      GF_SMTP_HOST: "smtp.gmail.com:587"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning:ro
volumes:
  grafana_data: {}  # declare the named volume so dashboards/users survive restarts
# grafana-cli commands:
grafana-cli plugins install grafana-piechart-panel # install plugin
grafana-cli plugins list-remote # available plugins
grafana-cli plugins update-all # update all plugins
grafana-cli admin reset-admin-password newpassword # reset admin (on server)
# Useful API endpoints:
GET /api/health # readiness check
GET /api/dashboards/home # home dashboard
POST /api/dashboards/db (body: dashboard JSON) # create/update dashboard
GET /api/datasources # list data sources
GET /api/folders # list folders
POST /api/annotations # create annotation (mark deployments)
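The annotations endpoint is the usual hook for marking deployments from CI. A sketch — host, token variable, tag, and version string are illustrative, and the POST only runs when a token is configured:

```shell
#!/bin/sh
# Create a tagged annotation; panels whose annotation query matches the
# "deployment" tag will render it as a vertical line.
BODY='{"tags": ["deployment", "myapp"], "text": "Deployed myapp v1.2.3"}'

if [ -n "${GRAFANA_TOKEN:-}" ]; then
  curl -s -X POST "http://grafana:3000/api/annotations" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```

Omitting the `time` field timestamps the annotation at "now"; passing `time` and `timeEnd` (epoch milliseconds) instead draws a shaded region, e.g. for a maintenance window.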