Grafana Reference: Dashboards, Variables, Alerting, PromQL Patterns & Provisioning
Grafana is the de facto standard for metrics visualisation. This reference covers panel types, template variables, alerting, provisioning, and the PromQL patterns that make dashboards genuinely useful.
1. Panel Types & Queries
Time series, stat, table, heatmap — when to use each
| Panel Type | Use for | Key config |
|---|---|---|
| Time series | Metrics over time (CPU, req/s, latency) | Fill opacity, gradient mode, stacking |
| Stat | Single current value (uptime, error count) | Thresholds (green/yellow/red), sparkline |
| Gauge | Value as percentage of range | Min/max bounds, thresholds |
| Table | Multi-column data, last-value per series | Field overrides, column filtering |
| Heatmap | Latency distribution over time (histogram buckets) | Color scale, bucket boundaries |
| Bar chart | Comparison across categories | Grouping, legend display mode |
| Logs | Log lines from Loki/Elasticsearch | Label filters, log level colours |
| State timeline | Status/state changes over time | Thresholds, value mapping |
# PromQL for common panels:
# Time series — request rate:
rate(http_requests_total{job="myapp"}[5m])
# Stat — current error rate:
sum(rate(http_requests_total{status=~"5.."}[5m])) /
sum(rate(http_requests_total[5m])) * 100
# Heatmap — latency distribution (requires histogram metric):
sum(rate(http_request_duration_seconds_bucket{job="myapp"}[$__rate_interval])) by (le)
# Set "Format as" = Heatmap in query options
# Table — per-pod memory:
sum by (pod) (container_memory_working_set_bytes{namespace="production"})
# Set "Instant" query, Sort by value descending
2. Variables — Making Dashboards Dynamic
Template variables, chained selects, and repeat rows
# Dashboard Settings → Variables → Add variable
# Query variable (populate from metric labels):
Name: namespace
Type: Query
Query: label_values(kube_pod_info, namespace)
# Renders as dropdown: all namespace values from Prometheus
# Chained variables (namespace drives pod list):
Name: pod
Query: label_values(kube_pod_info{namespace="$namespace"}, pod)
# Refresh: On time range change / On dashboard load
# Use in queries:
rate(http_requests_total{namespace="$namespace", pod="$pod"}[5m])
# Multi-value variable (select multiple):
# Enable "Multi-value" + "Include All option"
# $namespace becomes regex: production|staging
# Use in queries with =~: namespace=~"$namespace"
# Interval variable (for dynamic rate windows):
Name: interval
Type: Interval
Values: 30s,1m,5m,15m,1h
# Use: rate(metric[$interval])
# Or use $__rate_interval — automatically sized to at least 4× the data source's scrape interval (recommended)
# Repeat rows/panels:
# Panel → Repeat options → Repeat by variable: namespace
# Creates one panel per selected namespace value automatically
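In the exported dashboard JSON, the repeat option shows up as fields on the panel object. A minimal fragment — the field names come from the dashboard schema, the title and query values are illustrative:

```json
{
  "title": "Requests — $namespace",
  "type": "timeseries",
  "repeat": "namespace",
  "repeatDirection": "h",
  "maxPerRow": 4,
  "targets": [
    { "expr": "rate(http_requests_total{namespace=\"$namespace\"}[5m])" }
  ]
}
```

`repeatDirection` is "h" (side by side, capped by `maxPerRow`) or "v" (stacked rows); Grafana clones the panel once per selected value of the variable at render time.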
3. Alerting
Grafana unified alerting — rules, contact points, notification policies
# Grafana 9+ defaults to "Unified Alerting" (introduced in 8.0; replaces legacy per-panel alerts)
# Alerting → Alert rules → Create alert rule
# Alert rule anatomy:
# 1. Query: PromQL/LogQL that returns a value
# 2. Condition: WHEN [query] IS ABOVE threshold FOR duration
# 3. Labels: environment=production, team=platform
# 4. Annotations: summary="{{ $labels.pod }} is down"
# Example rule (pod restart rate alert):
Query A: increase(kube_pod_container_status_restarts_total{namespace="production"}[15m])
Condition: WHEN A IS ABOVE 3 FOR 0m
Labels: severity=warning, team=platform
Summary: Pod {{ $labels.pod }} restarted {{ $value }} times in 15m
# Contact points (where alerts go):
# Alerting → Contact points → Add
# Supports: Slack, PagerDuty, OpsGenie, Teams, email, webhook
# Slack contact point:
URL: https://hooks.slack.com/services/xxx
Title: [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
Message: {{ range .Alerts }}{{ .Annotations.summary }}{{ end }}
# Notification policies (route labels to contact points):
# Default: all alerts → default contact point
# Override: severity=critical → PagerDuty, severity=warning → Slack
# Silence alerts during maintenance:
# Alerting → Silences → Add silence
# Label matchers: environment=production, scheduled_maintenance=true
# Start/end time
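Silences can also be scripted — useful from a deploy pipeline. A sketch using Grafana's Alertmanager-compatible API; the host, token variable, and label values are assumptions, and the POST only runs when a token is configured:

```shell
#!/bin/sh
# Create a 2-hour silence matching the maintenance labels above.
START=$(date -u +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u -d '+2 hours' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
      || date -u -v+2H +%Y-%m-%dT%H:%M:%SZ)   # GNU date, BSD fallback
PAYLOAD=$(cat <<EOF
{
  "matchers": [
    {"name": "environment", "value": "production", "isRegex": false},
    {"name": "scheduled_maintenance", "value": "true", "isRegex": false}
  ],
  "startsAt": "$START",
  "endsAt": "$END",
  "createdBy": "deploy-bot",
  "comment": "Scheduled maintenance window"
}
EOF
)
# Send only when a service-account token is set (skipped otherwise):
if [ -n "${GRAFANA_TOKEN:-}" ]; then
  curl -s -X POST "http://grafana:3000/api/alertmanager/grafana/api/v2/silences" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$PAYLOAD"
fi
```

Deleting the silence early is a DELETE to the same path with the silence ID returned by the POST.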
4. Provisioning — Dashboards as Code
File-based provisioning for GitOps dashboard management
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true
    jsonData:
      timeInterval: "15s"        # match the Prometheus scrape_interval
      incrementalQuerying: true  # performance: only query the new time range
      queryTimeout: "60s"
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          matcherRegex: 'trace_id=(\w+)'  # single quotes: \w is not a valid YAML escape
          url: http://tempo:3200/trace/$${__value.raw}  # click a trace_id to open it in Tempo
# /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
  - name: default
    type: file
    options:
      path: /var/lib/grafana/dashboards
      foldersFromFilesStructure: true  # subdirectory name = folder name in Grafana
# Note: foldersFromFilesStructure and a fixed "folder:" are mutually exclusive —
# leave "folder" unset when deriving folders from the directory tree
# Place dashboard JSON files in /var/lib/grafana/dashboards/
# Export from Grafana: Dashboard → Share → Export → JSON
# Import community dashboards: grafana.com/grafana/dashboards
# Popular IDs: 6417 (K8s cluster), 1860 (node exporter), 13659 (Loki logs)
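Community dashboards can also be pulled and imported over the HTTP API instead of clicking through the UI. A sketch — the Grafana URL, token variable, and datasource-input name are assumptions (the input name depends on the dashboard's `__inputs`), and the network calls only run when a token is configured:

```shell
#!/bin/sh
# Fetch a community dashboard from grafana.com and import it via the HTTP API.
GRAFANA_URL="${GRAFANA_URL:-http://grafana:3000}"
DASH_ID=1860   # Node Exporter Full, from the ID list above

if [ -n "${GRAFANA_TOKEN:-}" ]; then
  # Download the latest revision of the dashboard JSON:
  curl -s "https://grafana.com/api/dashboards/${DASH_ID}/revisions/latest/download" \
    -o dashboard.json
  # Wrap it in an import request, mapping the DS_PROMETHEUS input
  # to the provisioned "Prometheus" datasource:
  jq -n --slurpfile d dashboard.json '{
      dashboard: $d[0],
      overwrite: true,
      inputs: [{name: "DS_PROMETHEUS", type: "datasource",
                pluginId: "prometheus", value: "Prometheus"}]
    }' |
  curl -s -X POST "${GRAFANA_URL}/api/dashboards/import" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d @-
fi
```

For GitOps, committing the downloaded JSON into the provisioned dashboards directory (previous snippet) avoids the API call entirely.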
5. Useful PromQL Patterns for Grafana
K8s, latency percentiles, error rates — copy-ready queries
# CPU throttling ratio (critical K8s metric):
sum by (pod) (
rate(container_cpu_cfs_throttled_seconds_total{namespace="$namespace"}[$__rate_interval])
) /
sum by (pod) (
rate(container_cpu_cfs_periods_total{namespace="$namespace"}[$__rate_interval])
) * 100
# > 25% = CPU limit too low, causing application slowdowns
# Latency p50/p95/p99 (requires histogram metric):
histogram_quantile(0.99, sum by (le) (
rate(http_request_duration_seconds_bucket{job="$app"}[$__rate_interval])
))
# Memory pressure (available vs used):
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) /
node_memory_MemTotal_bytes * 100
# PVC nearly full (alert threshold: > 80%):
(kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes) * 100 > 80
# HPA saturation (replicas at max):
kube_horizontalpodautoscaler_status_current_replicas ==
kube_horizontalpodautoscaler_spec_max_replicas
# Absent metric alert (service stopped sending metrics):
absent(up{job="myapp"} == 1)
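Before pasting a pattern into a panel, it can be sanity-checked against the Prometheus HTTP API directly. A sketch — the host is an assumption, `$__rate_interval` is Grafana-only so a literal 5m window stands in, and the request only fires when `RUN_LIVE` is set:

```shell
#!/bin/sh
# Dry-run the CPU throttling ratio against /api/v1/query.
PROM_URL="${PROM_URL:-http://prometheus:9090}"
QUERY='sum by (pod) (rate(container_cpu_cfs_throttled_seconds_total[5m]))
  / sum by (pod) (rate(container_cpu_cfs_periods_total[5m])) * 100'

if [ -n "${RUN_LIVE:-}" ]; then
  # -G + --data-urlencode handles the spaces and special characters in PromQL
  curl -s -G "${PROM_URL}/api/v1/query" --data-urlencode "query=${QUERY}"
fi
```

The response's `data.result` array is exactly what the panel would plot as its instant value, so an empty array usually means a label or metric-name typo, not a Grafana problem.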
6. Grafana CLI & Docker Setup
grafana-cli, Docker Compose config, admin password reset
# Docker Compose:
services:
  grafana:
    image: grafana/grafana-oss:latest
    ports: ["3000:3000"]
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "${GRAFANA_ADMIN_PASSWORD}"
      GF_INSTALL_PLUGINS: "grafana-piechart-panel,grafana-worldmap-panel"
      GF_FEATURE_TOGGLES_ENABLE: "publicDashboards"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
      GF_SMTP_ENABLED: "true"
      GF_SMTP_HOST: "smtp.gmail.com:587"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning:ro
volumes:
  grafana_data: {}  # declare the named volume so dashboards/users survive restarts
# grafana-cli commands:
grafana-cli plugins install grafana-piechart-panel # install plugin
grafana-cli plugins list-remote # available plugins
grafana-cli plugins update-all # update all plugins
grafana-cli admin reset-admin-password newpassword # reset admin (on server)
# Useful API endpoints:
GET /api/health # readiness check
GET /api/dashboards/home # home dashboard
POST /api/dashboards/db (body: dashboard JSON) # create/update dashboard
GET /api/datasources # list data sources
GET /api/folders # list folders
POST /api/annotations # create annotation (mark deployments)
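The annotations endpoint is the usual hook for marking deployments from CI. A sketch — host, token variable, tag, and version string are illustrative, and the POST only runs when a token is configured:

```shell
#!/bin/sh
# Create a tagged annotation; panels whose annotation query matches the
# "deployment" tag will render it as a vertical line.
BODY='{"tags": ["deployment", "myapp"], "text": "Deployed myapp v1.2.3"}'

if [ -n "${GRAFANA_TOKEN:-}" ]; then
  curl -s -X POST "http://grafana:3000/api/annotations" \
    -H "Authorization: Bearer $GRAFANA_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$BODY"
fi
```

Omitting the `time` field timestamps the annotation at "now"; passing `time` and `timeEnd` (epoch milliseconds) instead draws a shaded region, e.g. for a maintenance window.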