PostgreSQL Advanced SQL Reference: Window Functions, CTEs, JSONB, EXPLAIN & Indexes

PostgreSQL advanced SQL — window functions, CTEs, JSONB, EXPLAIN ANALYZE, and index strategies. These features separate basic SQL users from engineers who can actually optimize slow queries and design schemas that scale.

1. Window Functions

ROW_NUMBER, RANK, LAG, LEAD, running totals, and moving averages

-- Window functions run OVER a partition without collapsing rows (unlike GROUP BY)

-- ROW_NUMBER, RANK, DENSE_RANK:
SELECT
  employee_id, department, salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
  RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank,     -- gaps on ties
  DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank  -- no gaps
FROM employees;

-- Get top-N per group (common pattern):
SELECT * FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
  FROM employees
) t WHERE rn <= 3;                          -- top 3 earners per department

-- Running total (cumulative SUM):
SELECT
  order_date,
  amount,
  SUM(amount) OVER (ORDER BY order_date) AS running_total,
  SUM(amount) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7day
FROM orders;

-- LAG / LEAD (access previous/next row):
SELECT
  order_date,
  amount,
  LAG(amount, 1) OVER (ORDER BY order_date) AS prev_day_amount,
  amount - LAG(amount, 1) OVER (ORDER BY order_date) AS day_over_day_change,
  LEAD(amount, 1) OVER (ORDER BY order_date) AS next_day_amount
FROM daily_sales;

-- Percent of total:
SELECT
  department,
  salary,
  ROUND(salary / SUM(salary) OVER (PARTITION BY department) * 100, 2) AS pct_of_dept_total
FROM employees;

2. CTEs (Common Table Expressions)

WITH clauses, recursive CTEs, and query organization

-- Basic CTE (named subquery — cleaner than nested subqueries):
WITH monthly_revenue AS (
  SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS revenue
  FROM orders
  WHERE status = 'completed'
  GROUP BY 1
),
ranked_months AS (
  SELECT *,
    RANK() OVER (ORDER BY revenue DESC) AS rank
  FROM monthly_revenue
)
SELECT * FROM ranked_months WHERE rank <= 5;    -- top 5 revenue months

-- Multiple CTEs:
WITH
active_users AS (SELECT id, name FROM users WHERE active = true),
recent_orders AS (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  WHERE created_at > NOW() - INTERVAL '30 days'
  GROUP BY user_id
)
SELECT u.name, COALESCE(o.order_count, 0) AS orders_last_30d
FROM active_users u
LEFT JOIN recent_orders o ON u.id = o.user_id;

-- Recursive CTE (tree traversal — org chart, file paths, etc.):
WITH RECURSIVE org_chart AS (
  -- Base case: CEO (no manager):
  SELECT id, name, manager_id, 0 AS level, name::text AS path
  FROM employees WHERE manager_id IS NULL

  UNION ALL

  -- Recursive case: add direct reports:
  SELECT e.id, e.name, e.manager_id, oc.level + 1,
         oc.path || ' > ' || e.name
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level, path;
-- CYCLE detection (PostgreSQL 14+):
-- CYCLE id SET is_cycle USING path

-- MATERIALIZED hint (force CTE to execute once, not inline):
WITH stats AS MATERIALIZED (    -- prevents planner from inlining and re-running
  SELECT AVG(salary) AS avg_sal FROM employees
)
SELECT * FROM employees e, stats WHERE e.salary > stats.avg_sal * 1.5;

3. JSONB — Store and Query JSON

JSONB operators, GIN indexes, and query patterns

-- JSONB vs JSON: JSONB is binary-stored, indexed, faster for queries.
-- Always use JSONB unless you need to preserve key order or exact whitespace.

-- Create table with JSONB column:
CREATE TABLE events (
  id BIGSERIAL PRIMARY KEY,
  event_type VARCHAR(100),
  payload JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Insert JSONB:
INSERT INTO events (event_type, payload)
VALUES ('user.signup', '{"user_id": 123, "email": "alice@example.com", "tags": ["beta", "mobile"]}');

-- Query JSONB:
SELECT payload->>'email' FROM events;               -- text (->>'key')
SELECT payload->'user_id' FROM events;              -- JSONB (->'key')
SELECT payload->>'tags'->0 FROM events;             -- first array element
SELECT payload#>>'{address,city}' FROM events;      -- nested path

-- Filter on JSONB value:
SELECT * FROM events WHERE payload->>'event_type' = 'signup';
SELECT * FROM events WHERE payload->'user_id' = '123'::jsonb;
SELECT * FROM events WHERE payload @> '{"tags": ["beta"]}';  -- contains
SELECT * FROM events WHERE payload ? 'email';                 -- key exists
SELECT * FROM events WHERE payload ?| ARRAY['email','phone'];  -- any key exists

-- GIN index (required for @> and ? operators to be fast):
CREATE INDEX events_payload_gin ON events USING GIN (payload);
-- Or partial GIN for specific paths (smaller index):
CREATE INDEX events_user_id ON events ((payload->'user_id'));  -- expression index

-- Update JSONB:
UPDATE events SET payload = payload || '{"verified": true}'    -- merge/update keys
WHERE payload->>'email' = 'alice@example.com';
UPDATE events SET payload = payload - 'sensitive_key';         -- remove key

4. EXPLAIN ANALYZE — Reading Query Plans

Interpret execution plans, find slow operations, and fix them

-- Always use EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT):
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;

-- Key things to look for in the plan:

-- 1. Seq Scan on large tables = missing index
-- Seq Scan on users (cost=0.00..1234.56 rows=50000 ...)
--   → Add index: CREATE INDEX idx_users_created_at ON users (created_at);

-- 2. actual rows vs estimated rows — large discrepancy = stale stats
-- (actual rows=50000 rows=100) → ANALYZE users;

-- 3. Nested Loop on large tables = consider Hash Join
-- Nested Loop (cost=0.00..50000.00 ...) → check join condition index

-- 4. Sort + Limit = potential window function or index optimization

-- 5. Buffers: hits vs reads
-- Buffers: shared hit=100 read=5000
--   hit = from cache (fast); read = from disk (slow)
--   → Add pg_prewarm or increase shared_buffers

-- Use pgMustard or explain.dalibo.com for visual plan analysis

-- Auto-explain slow queries (add to postgresql.conf):
-- shared_preload_libraries = 'auto_explain'
-- auto_explain.log_min_duration = '100ms'    -- log plans for queries > 100ms
-- auto_explain.log_analyze = on

5. Index Strategies

B-tree, GIN, partial, expression, and covering indexes

-- B-tree (default — equality + range queries):
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_created_at ON orders (created_at DESC);

-- Composite index (order matters — put equality columns first):
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- Covers: WHERE user_id = ? AND status = ?
-- Also covers: WHERE user_id = ?  (leftmost prefix)
-- Doesn't use for: WHERE status = ?  (skipped first column)

-- Partial index (index only relevant subset — smaller + faster):
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';         -- only index pending orders
-- Covers: WHERE status = 'pending' AND created_at > '2025-01-01'
-- But NOT: WHERE status = 'completed' (won't use this index)

-- Expression index (on computed value):
CREATE INDEX idx_users_lower_email ON users (LOWER(email));
-- Covers: WHERE LOWER(email) = LOWER('Alice@Example.com')

-- Covering index (INCLUDE — avoid heap fetch for queries reading only these columns):
CREATE INDEX idx_orders_covering ON orders (user_id, status)
INCLUDE (amount, created_at);
-- SELECT amount, created_at FROM orders WHERE user_id = ? AND status = ?
-- → index-only scan (no heap access needed)

-- CONCURRENTLY (build index without locking writes):
CREATE INDEX CONCURRENTLY idx_orders_new ON orders (created_at);

-- Check index usage:
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0;   -- indexes never used — candidates for removal

Track PostgreSQL and database tool releases.
ReleaseRun monitors PostgreSQL, MySQL, MongoDB, and 13+ technologies.

🔍 Free tool: PyPI Package Health Checker — check psycopg2, asyncpg, and SQLAlchemy for known CVEs and active maintenance before upgrading your Postgres stack.

Founded

2023 in London, UK

Contact

hello@releaserun.com