Prometheus & PromQL Reference
PromQL queries you’ll actually need: metric types, label selectors, rate vs irate, histogram_quantile, recording rules, alerting rules, and the queries that diagnose real production problems.
Metric types — Counter, Gauge, Histogram, Summary
| Type | Can go down? | Use for | Query with |
|---|---|---|---|
| Counter | No (resets on restart) | Requests, errors, bytes sent | rate() or increase() |
| Gauge | Yes | CPU%, memory, queue depth, temperature | Direct value or delta() |
| Histogram | No (buckets are counters) | Latency, request size | histogram_quantile() |
| Summary | No (quantiles are gauges) | Pre-calculated percentiles | Direct _quantile labels |
# Common metric naming conventions
# Counters end in _total: http_requests_total
# Gauges describe current state: node_memory_MemAvailable_bytes
# Histograms have _bucket/_sum/_count: http_request_duration_seconds_{bucket,sum,count}
# Summaries have quantile label: go_gc_duration_seconds{quantile="0.99"}
# NEVER use rate() on a Gauge. NEVER use direct value on a Counter without rate().
# Counter: always use rate() to get "per second" change
# Gauge: use the raw value (or avg_over_time/delta for trends)
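The two rules above, in query form:

```promql
rate(http_requests_total[5m])    # counter: always wrap in rate()
node_memory_MemAvailable_bytes   # gauge: read the raw value directly
```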
# Histograms you'll encounter
# http_request_duration_seconds — HTTP latency (most exporters)
# grpc_server_handling_seconds — gRPC latency
# http_request_size_bytes — request body sizes
# http_response_size_bytes — response body sizes
# bucket example:
# http_request_duration_seconds_bucket{le="0.1"} → requests that took <= 100ms
# http_request_duration_seconds_bucket{le="+Inf"} → all requests (= _count)
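Because buckets are cumulative counters, you can turn them directly into an SLI. A sketch, assuming your histogram has a `le="0.1"` bucket:

```promql
# Fraction of requests completing within 100ms over the last 5m
sum(rate(http_request_duration_seconds_bucket{le="0.1"}[5m]))
/
sum(rate(http_request_duration_seconds_count[5m]))
```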
Label selectors — filtering metrics
# Label matcher operators
= exact match {job="prometheus"}
!= not equal {status!="200"}
=~ regex match {status=~"5.."} # 5xx errors
!~ regex not match {method!~"GET|HEAD"}
# Common patterns
# All 5xx errors
http_requests_total{status=~"5.."}
# Exclude health check endpoint
http_requests_total{path!="/health", path!="/ready"}
# Multiple label filters (AND)
http_requests_total{job="api", namespace="production", status=~"4.."}
# Regex: starts with
rate(http_requests_total{path=~"/api/.*"}[5m])
# Empty label matcher (matches series where the label is missing or empty)
up{instance=""} # matches nothing here: up always has an instance label
# PromQL label manipulation
# label_replace — add/change a label on query results
label_replace(
up,
"short_instance", # new label name
"$1", # new label value (regex group)
"instance", # source label
"(.+):\\d+" # regex on source (strip port)
)
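PromQL also has `label_join()`, which concatenates source labels into a new one; a minimal sketch (the `target` label name is just an example):

```promql
# label_join — combine several labels into one, with a separator
label_join(up, "target", "/", "job", "instance")
# up{job="api", instance="10.0.0.1:9090"} gains target="api/10.0.0.1:9090"
```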
# without() — remove labels from aggregation
# by() — keep only specified labels in aggregation
sum(http_requests_total) by (status, method) # group by status + method
sum(http_requests_total) without (instance) # aggregate away instance label
rate, irate, increase — working with counters
# rate() — per-second rate, smoothed over the range window
# Use for dashboards, alerting, slow-moving counters
rate(http_requests_total[5m]) # req/s averaged over 5 min
rate(http_requests_total[5m]) * 60 # req/minute
# irate() — instantaneous rate using last 2 samples
# Use for detecting short spikes; more noisy than rate()
irate(http_requests_total[5m]) # req/s at this instant
# increase() — total increase over the range window
# Roughly rate() * range_in_seconds (extrapolation to the window edges can give non-integer results)
increase(http_requests_total[1h]) # total requests in last hour
# Range selection rules of thumb:
# [5m] — good for dashboards, catches recent changes
# [1m] — very responsive but noisy, needs frequent scraping
# [15m] — better for slow-moving metrics
# [1h] — for trends and reporting
# Counter resets: rate() and increase() handle restarts automatically
# They detect when the counter's value decreases and compensate in the calculation
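If you want to see the resets themselves rather than smooth over them, `resets()` counts them:

```promql
# resets() — number of times a counter reset within the range window
resets(http_requests_total[1h]) # e.g. process restarts in the last hour
```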
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
# Error ratio (errors / total)
rate(http_requests_total{status=~"5.."}[5m])
/
rate(http_requests_total[5m])
# Request rate by status
sum(rate(http_requests_total[5m])) by (status)
# Top 10 endpoints by request rate
topk(10, sum(rate(http_requests_total[5m])) by (path))
irate() is more reactive but also more unstable. A single slow scrape can make irate spike. Prefer rate() for alerting — it's more reliable.
histogram_quantile — latency percentiles
# histogram_quantile(φ, rate(metric_bucket[5m]))
# φ is 0..1 (0.5 = p50, 0.9 = p90, 0.99 = p99, 0.999 = p99.9)
# P99 latency (single service)
histogram_quantile(0.99,
rate(http_request_duration_seconds_bucket[5m])
)
# P99 latency by service (keep job label for grouping)
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job)
)
# P99 latency for specific path
histogram_quantile(0.99,
rate(http_request_duration_seconds_bucket{path="/api/v1/users"}[5m])
)
# Multiple percentiles — union them for Grafana panels
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# Average latency (not a percentile, but useful baseline)
rate(http_request_duration_seconds_sum[5m])
/
rate(http_request_duration_seconds_count[5m])
# Check bucket coverage — if your p99 returns +Inf, your highest bucket is too low
# Compare cumulative bucket counts to see the actual distribution:
http_request_duration_seconds_bucket{le="+Inf"} # all requests (= _count)
http_request_duration_seconds_bucket{le="1"} # requests <= 1s
# Good bucket boundaries for web services (seconds):
# 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10
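Boundaries like these also let you compute an Apdex-style score (satisfied <= 0.1s, tolerating <= 0.5s). A sketch, assuming those two buckets exist on your histogram:

```promql
(
  sum(rate(http_request_duration_seconds_bucket{le="0.1"}[5m]))
  +
  sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
) / 2
/
sum(rate(http_request_duration_seconds_count[5m]))
```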
Aggregation operators
# Aggregation operators: sum, min, max, avg, count, stddev, topk, bottomk, quantile
# sum — add all values (most common)
sum(http_requests_total) # grand total
sum(http_requests_total) by (status) # per status code
sum(http_requests_total) without (instance) # aggregate instances
# avg — average across series
avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) # rate() first: it's a counter
# max/min — highest/lowest
max(container_memory_usage_bytes) by (pod)
# count — number of time series
count(up) # how many jobs Prometheus is scraping
count(up == 0) # how many are down
# topk/bottomk — top/bottom N series
topk(5, sum(rate(http_requests_total[5m])) by (path)) # top 5 paths by traffic
bottomk(3, avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)) # least-idle nodes
# quantile — Nth percentile of values across series (not histogram)
quantile(0.95, rate(http_requests_total[5m])) # p95 across all series
# Aggregation over time (not across series) — gauges only
avg_over_time(node_load1[1h]) # average 1-min load over 1h
max_over_time(container_memory_usage_bytes[1d]) # peak memory in 1 day
min_over_time(up[5m]) # any downtime in 5 min?
# absent() — alert when metric disappears (job stopped exporting)
absent(up{job="api"}) # true if no up{job="api"} exists
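A time-windowed variant, `absent_over_time()`, is less flappy for alerting, since it only fires when the metric has been gone for the whole range:

```promql
# absent_over_time() — true only if no samples existed across the range
absent_over_time(up{job="api"}[10m]) # nothing scraped for a full 10 min
```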
Alerting rules — practical SLO-based alerts
# rules file (e.g. alerts.yml), loaded via rule_files: in prometheus.yml
groups:
- name: api.rules
interval: 30s # evaluation interval (defaults to global)
rules:
# Alert: high error rate
- alert: HighErrorRate
expr: |
(
rate(http_requests_total{status=~"5.."}[5m])
/
rate(http_requests_total[5m])
) > 0.05
for: 5m # must be true for 5 min before firing
labels:
severity: warning
annotations:
summary: "High error rate on {{ $labels.job }}"
description: "Error rate is {{ $value | humanizePercentage }} over 5m"
# Alert: high latency P99
- alert: HighLatencyP99
expr: |
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job)
) > 1.0
for: 10m
labels:
severity: warning
annotations:
summary: "P99 latency > 1s on {{ $labels.job }}"
description: "P99 latency: {{ $value | humanizeDuration }}"
# Alert: scrape target down
- alert: TargetDown
expr: up{job!="pushgateway"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Prometheus target down: {{ $labels.instance }}"
# Alert: high memory usage
- alert: HighMemoryUsage
expr: |
container_memory_working_set_bytes{container!=""}
/
container_spec_memory_limit_bytes{container!=""}
> 0.85
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.container }} memory > 85%"
# Recording rules — pre-compute expensive queries
groups:
- name: recording.rules
rules:
- record: job:http_requests:rate5m
expr: sum(rate(http_requests_total[5m])) by (job)
- record: job:http_errors:rate5m
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
- record: job:http_error_rate:rate5m
expr: job:http_errors:rate5m / job:http_requests:rate5m
Essential Kubernetes monitoring queries
# CPU throttling ratio (fraction of CFS periods throttled)
sum(rate(container_cpu_cfs_throttled_periods_total[5m])) by (pod, container)
/
sum(rate(container_cpu_cfs_periods_total[5m])) by (pod, container)
# Memory pressure (approaching limit)
container_memory_working_set_bytes{container!=""}
/
container_spec_memory_limit_bytes{container!=""} * 100
# Pod restarts in last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0
# Node disk usage
(node_filesystem_size_bytes - node_filesystem_avail_bytes)
/ node_filesystem_size_bytes * 100
# Pending pods (scheduling problems)
kube_pod_status_phase{phase="Pending"} > 0
# Failed jobs
kube_job_status_failed > 0
# PVC nearly full
(kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes)
/ kubelet_volume_stats_capacity_bytes * 100 > 80
# Network I/O by pod
rate(container_network_receive_bytes_total[5m])
rate(container_network_transmit_bytes_total[5m])
# HPA pinned at max replicas (can't scale out further)
kube_horizontalpodautoscaler_status_current_replicas
>= kube_horizontalpodautoscaler_spec_max_replicas
# Pods not ready
kube_pod_status_ready{condition="false"} == 1
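One more worth keeping handy, assuming a kube-state-metrics version that exposes last-terminated reasons:

```promql
# Containers whose last termination was an OOM kill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
```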