Prometheus & PromQL Reference
PromQL queries you’ll actually need: metric types, label selectors, rate vs irate, histogram_quantile, recording rules, alerting rules, and the queries that diagnose real production problems.
Metric types — Counter, Gauge, Histogram, Summary
| Type | Can go down? | Use for | Query with |
|---|---|---|---|
| Counter | No (resets on restart) | Requests, errors, bytes sent | rate() or increase() |
| Gauge | Yes | CPU%, memory, queue depth, temperature | Direct value or delta() |
| Histogram | No (buckets are counters) | Latency, request size | histogram_quantile() |
| Summary | No (quantiles are gauges) | Pre-calculated percentiles | Direct _quantile labels |
# Common metric naming conventions
# Counters end in _total: http_requests_total
# Gauges describe current state: node_memory_MemAvailable_bytes
# Histograms have _bucket/_sum/_count: http_request_duration_seconds_{bucket,sum,count}
# Summaries have quantile label: go_gc_duration_seconds{quantile="0.99"}
# NEVER use rate() on a Gauge. NEVER use direct value on a Counter without rate().
# Counter: always use rate() to get "per second" change
# Gauge: use the raw value (or avg_over_time/delta for trends)
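The two rules above, in query form:

```promql
rate(http_requests_total[5m])    # counter: always wrap in rate()
node_memory_MemAvailable_bytes   # gauge: read the raw value directly
```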
# Histograms you'll encounter
# http_request_duration_seconds — HTTP latency (most exporters)
# grpc_server_handling_seconds — gRPC latency
# http_request_size_bytes — request body sizes
# http_response_size_bytes — response body sizes
# bucket example:
# http_request_duration_seconds_bucket{le="0.1"} → requests that took <= 100ms
# http_request_duration_seconds_bucket{le="+Inf"} → all requests (= _count)
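Because buckets are cumulative counters, you can turn them directly into an SLI. A sketch, assuming your histogram has a `le="0.1"` bucket:

```promql
# Fraction of requests completing within 100ms over the last 5m
sum(rate(http_request_duration_seconds_bucket{le="0.1"}[5m]))
/
sum(rate(http_request_duration_seconds_count[5m]))
```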
Label selectors — filtering metrics
# Label matcher operators
= exact match {job="prometheus"}
!= not equal {status!="200"}
=~ regex match {status=~"5.."} # 5xx errors
!~ regex not match {method!~"GET|HEAD"}
# Common patterns
# All 5xx errors
http_requests_total{status=~"5.."}
# Exclude health check endpoint
http_requests_total{path!="/health", path!="/ready"}
# Multiple label filters (AND)
http_requests_total{job="api", namespace="production", status=~"4.."}
# Regex: starts with
rate(http_requests_total{path=~"/api/.*"}[5m])
# Empty label matcher (matches series where the label is missing or empty)
up{instance=""} # matches nothing here: up always has an instance label
# PromQL label manipulation
# label_replace — add/change a label on query results
label_replace(
up,
"short_instance", # new label name
"$1", # new label value (regex group)
"instance", # source label
"(.+):\\d+" # regex on source (strip port)
)
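PromQL also has `label_join()`, which concatenates source labels into a new one; a minimal sketch (the `target` label name is just an example):

```promql
# label_join — combine several labels into one, with a separator
label_join(up, "target", "/", "job", "instance")
# up{job="api", instance="10.0.0.1:9090"} gains target="api/10.0.0.1:9090"
```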
# without() — remove labels from aggregation
# by() — keep only specified labels in aggregation
sum(http_requests_total) by (status, method) # group by status + method
sum(http_requests_total) without (instance) # aggregate away instance label
rate, irate, increase — working with counters
# rate() — per-second rate, smoothed over the range window
# Use for dashboards, alerting, slow-moving counters
rate(http_requests_total[5m]) # req/s averaged over 5 min
rate(http_requests_total[5m]) * 60 # req/minute
# irate() — instantaneous rate using last 2 samples
# Use for detecting short spikes; more noisy than rate()
irate(http_requests_total[5m]) # req/s at this instant
# increase() — total increase over the range window
# Roughly rate() * range_in_seconds (extrapolation to the window edges can give non-integer results)
increase(http_requests_total[1h]) # total requests in last hour
# Range selection rules of thumb:
# [5m] — good for dashboards, catches recent changes
# [1m] — very responsive but noisy, needs frequent scraping
# [15m] — better for slow-moving metrics
# [1h] — for trends and reporting
# Counter resets: rate() and increase() handle restarts automatically
# They detect when the counter's value decreases and compensate in the calculation
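If you want to see the resets themselves rather than smooth over them, `resets()` counts them:

```promql
# resets() — number of times a counter reset within the range window
resets(http_requests_total[1h]) # e.g. process restarts in the last hour
```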
# Error rate
rate(http_requests_total{status=~"5.."}[5m])
# Error ratio (errors / total)
rate(http_requests_total{status=~"5.."}[5m])
/
rate(http_requests_total[5m])
# Request rate by status
sum(rate(http_requests_total[5m])) by (status)
# Top 10 endpoints by request rate
topk(10, sum(rate(http_requests_total[5m])) by (path))
irate() is more reactive but also more unstable. A single slow scrape can make irate spike. Prefer rate() for alerting — it's more reliable.
histogram_quantile — latency percentiles
# histogram_quantile(φ, rate(metric_bucket[5m]))
# φ is 0..1 (0.5 = p50, 0.9 = p90, 0.99 = p99, 0.999 = p99.9)
# P99 latency (single service)
histogram_quantile(0.99,
rate(http_request_duration_seconds_bucket[5m])
)
# P99 latency by service (keep job label for grouping)
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job)
)
# P99 latency for specific path
histogram_quantile(0.99,
rate(http_request_duration_seconds_bucket{path="/api/v1/users"}[5m])
)
# Multiple percentiles — union them for Grafana panels
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# Average latency (not a percentile, but useful baseline)
rate(http_request_duration_seconds_sum[5m])
/
rate(http_request_duration_seconds_count[5m])
# Check bucket coverage — if your p99 returns +Inf, your highest bucket is too low
# Compare cumulative bucket counts to see the actual distribution:
http_request_duration_seconds_bucket{le="+Inf"} # all requests (= _count)
http_request_duration_seconds_bucket{le="1"} # requests <= 1s
# Good bucket boundaries for web services (seconds):
# 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10
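Boundaries like these also let you compute an Apdex-style score (satisfied <= 0.1s, tolerating <= 0.5s). A sketch, assuming those two buckets exist on your histogram:

```promql
(
  sum(rate(http_request_duration_seconds_bucket{le="0.1"}[5m]))
  +
  sum(rate(http_request_duration_seconds_bucket{le="0.5"}[5m]))
) / 2
/
sum(rate(http_request_duration_seconds_count[5m]))
```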
Aggregation operators
# Aggregation operators: sum, min, max, avg, count, stddev, topk, bottomk, quantile
# sum — add all values (most common)
sum(http_requests_total) # grand total
sum(http_requests_total) by (status) # per status code
sum(http_requests_total) without (instance) # aggregate instances
# avg — average across series
avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) # rate() first: it's a counter
# max/min — highest/lowest
max(container_memory_usage_bytes) by (pod)
# count — number of time series
count(up) # how many jobs Prometheus is scraping
count(up == 0) # how many are down
# topk/bottomk — top/bottom N series
topk(5, sum(rate(http_requests_total[5m])) by (path)) # top 5 paths by traffic
bottomk(3, avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance)) # least-idle nodes
# quantile — Nth percentile of values across series (not histogram)
quantile(0.95, rate(http_requests_total[5m])) # p95 across all series
# Aggregation over time (not across series) — gauges only
avg_over_time(node_load1[1h]) # average 1-min load over 1h
max_over_time(container_memory_usage_bytes[1d]) # peak memory in 1 day
min_over_time(up[5m]) # any downtime in 5 min?
# absent() — alert when metric disappears (job stopped exporting)
absent(up{job="api"}) # true if no up{job="api"} exists
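A time-windowed variant, `absent_over_time()`, is less flappy for alerting, since it only fires when the metric has been gone for the whole range:

```promql
# absent_over_time() — true only if no samples existed across the range
absent_over_time(up{job="api"}[10m]) # nothing scraped for a full 10 min
```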
Alerting rules — practical SLO-based alerts
# rules file (e.g. alerts.yml), loaded via rule_files: in prometheus.yml
groups:
- name: api.rules
interval: 30s # evaluation interval (defaults to global)
rules:
# Alert: high error rate
- alert: HighErrorRate
expr: |
(
rate(http_requests_total{status=~"5.."}[5m])
/
rate(http_requests_total[5m])
) > 0.05
for: 5m # must be true for 5 min before firing
labels:
severity: warning
annotations:
summary: "High error rate on {{ $labels.job }}"
description: "Error rate is {{ $value | humanizePercentage }} over 5m"
# Alert: high latency P99
- alert: HighLatencyP99
expr: |
histogram_quantile(0.99,
sum(rate(http_request_duration_seconds_bucket[5m])) by (le, job)
) > 1.0
for: 10m
labels:
severity: warning
annotations:
summary: "P99 latency > 1s on {{ $labels.job }}"
description: "P99 latency: {{ $value | humanizeDuration }}"
# Alert: scrape target down
- alert: TargetDown
expr: up{job!="pushgateway"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Prometheus target down: {{ $labels.instance }}"
# Alert: high memory usage
- alert: HighMemoryUsage
expr: |
container_memory_working_set_bytes{container!=""}
/
container_spec_memory_limit_bytes{container!=""}
> 0.85
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.container }} memory > 85%"
# Recording rules — pre-compute expensive queries
groups:
- name: recording.rules
rules:
- record: job:http_requests:rate5m
expr: sum(rate(http_requests_total[5m])) by (job)
- record: job:http_errors:rate5m
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) by (job)
- record: job:http_error_rate:rate5m
expr: job:http_errors:rate5m / job:http_requests:rate5m
Essential Kubernetes monitoring queries
# CPU throttling ratio (fraction of CFS periods throttled)
sum(rate(container_cpu_cfs_throttled_periods_total[5m])) by (pod, container)
/
sum(rate(container_cpu_cfs_periods_total[5m])) by (pod, container)
# Memory pressure (approaching limit)
container_memory_working_set_bytes{container!=""}
/
container_spec_memory_limit_bytes{container!=""} * 100
# Pod restarts in last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0
# Node disk usage
(node_filesystem_size_bytes - node_filesystem_avail_bytes)
/ node_filesystem_size_bytes * 100
# Pending pods (scheduling problems)
kube_pod_status_phase{phase="Pending"} > 0
# Failed jobs
kube_job_status_failed > 0
# PVC nearly full
(kubelet_volume_stats_capacity_bytes - kubelet_volume_stats_available_bytes)
/ kubelet_volume_stats_capacity_bytes * 100 > 80
# Network I/O by pod
rate(container_network_receive_bytes_total[5m])
rate(container_network_transmit_bytes_total[5m])
# HPA pinned at max replicas (can't scale out further)
kube_horizontalpodautoscaler_status_current_replicas
>= kube_horizontalpodautoscaler_spec_max_replicas
# Pods not ready
kube_pod_status_ready{condition="false"} == 1
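One more worth keeping handy, assuming a kube-state-metrics version that exposes last-terminated reasons:

```promql
# Containers whose last termination was an OOM kill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}
```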