Database Performance Tuning Checklist

A recurring DBA workflow to baseline a production database, find the worst offenders, and apply targeted index, query, and configuration fixes without destabilizing the workload. Run quarterly or after a major schema or traffic change.

6 sections 23 steps Collects data
1

Baseline and Workload Profiling

  1. Capture current performance baseline
    • Pull at least 24 hours of metrics from your monitoring tool (Datadog DBM, SolarWinds DPA, pganalyze, Percona PMM, or Query Store on SQL Server). Record CPU, buffer cache hit ratio, page life expectancy, wait stats, and average query latency at p50/p95/p99 — these are what you compare against after the tuning pass.

    Collects list Collects file Collects number
  2. Identify the top 20 expensive queries
    • Use pg_stat_statements (Postgres), Query Store (SQL Server), or Performance Schema (MySQL) to rank queries by total time, not just per-execution time — a 5ms query running 50,000 times an hour matters more than a 30s nightly report.

  3. Review wait events and bottleneck class
    • Classify the dominant wait type — CPU, IO (PAGEIOLATCH, IO:DataFileRead), lock (LCK_M_X), or memory pressure. The bottleneck class determines whether tuning effort goes to indexes, query rewrites, or hardware.

    Collects list
2

Index Optimization

  1. Analyze execution plans for missing indexes
    • EXPLAIN ANALYZE (Postgres / MySQL) or actual execution plan (SQL Server) on the top offenders from baseline. Watch for sequential scans on large tables, key lookups, and high estimated-vs-actual row count gaps that signal stale statistics rather than a missing index.

  2. Review unused and duplicate indexes
    • Check pg_stat_user_indexes (idx_scan = 0) or sys.dm_db_index_usage_stats. Every unused index costs write throughput and storage. Duplicate-leading-column indexes are also a common find — consolidate before adding new ones.

  3. Evaluate composite and covering indexes
    • For multi-column WHERE / ORDER BY patterns, evaluate composite indexes with the most-selective leading column. A covering index (INCLUDE columns in SQL Server, or index-only scan in Postgres) eliminates lookup cost on hot read paths.

  4. Rebuild or reorganize fragmented indexes
    • Reorganize at 5-30% fragmentation; rebuild above 30%. Use ONLINE = ON (SQL Server Enterprise) or pg_repack (Postgres) to avoid table-level locks. Always run during the change window.

3

Query Performance

  1. Rewrite the worst-offender queries
    • Replace SELECT * with explicit column lists, eliminate correlated subqueries that can be JOINs, push predicates into the most-selective branch, and avoid functions on indexed columns (WHERE UPPER(email) = ... defeats the index).

  2. Validate query parameterization
    • Ad-hoc literal queries blow up the plan cache and prevent reuse. Confirm the application is using prepared statements or parameterized queries; check sys.dm_exec_cached_plans (SQL Server) or pg_stat_statements normalization for ad-hoc patterns.

  3. Test rewrites against production-like data
    • A query that runs 10x faster on a 1-million-row dev dataset can run 5x slower on the 800-million-row production table. Run rewrites against a recent restore or read replica before scheduling the change.

    Collects list
  4. Configure caching at the appropriate tier
    • Decide whether the win belongs in Redis / Memcached at the app tier, in materialized views, or in the database buffer pool. Hot read-mostly lookups belong in Redis with a defined TTL and invalidation path.

4

Resource and Configuration Tuning

  1. Tune memory and buffer pool settings
    • Set shared_buffers (Postgres, ~25% of RAM), innodb_buffer_pool_size (MySQL, ~70-80%), or max server memory (SQL Server, leaving 4-8 GB for OS). Validate cache hit ratio improves before declaring done.

  2. Refresh statistics and optimizer settings
    • Run ANALYZE / UPDATE STATISTICS WITH FULLSCAN on tables that have churned heavily. Stale stats are the most common cause of sudden plan regressions and a faster fix than any index change.

  3. Configure connection pooling
    • Right-size PgBouncer, ProxySQL, or HikariCP. A connection storm — 2,000 app connections hitting Postgres directly — is the silent killer behind many CPU-bound databases. Pool size should be near (cores × 2) + spindles, not the application's whim.

  4. Evaluate partitioning for the largest tables
    • For tables exceeding 100M rows or with clear time-based access patterns, evaluate range partitioning. Partition pruning turns scans of a year of data into scans of a month — but only if every hot query's predicate aligns with the partition key.

    Collects list
  5. Set alerts on key performance counters
    • Add or refresh thresholds on buffer cache hit ratio, deadlocks/sec, long-running queries (>30s), replication lag, and tempdb / WAL growth. Route to PagerDuty or Opsgenie with severity tied to whether it's degrading users.

5

Change Rollout and Validation

  1. Submit change request to CAB
    • Document the proposed changes, blast radius, rollback plan, and expected gain. Schedule against the maintenance window; index rebuilds and partitioning are not standard changes and require explicit CAB approval.

  2. Apply changes during the maintenance window
    • Apply changes in the approved order with checkpoints between each. Stop and roll back if any post-change probe regresses against baseline by more than the agreed threshold (typically 10%).

  3. Compare post-change metrics against baseline
    • Pull the same 24-hour window of metrics captured in baseline. Confirm p95 latency, top-query total time, and wait-stat distribution all moved the right direction. Regressions go on a follow-up ticket; do not silently accept them.

    Collects list Collects number Collects signature
  4. Roll back the change set
    • Execute the documented rollback: drop new indexes, restore prior config values, revert query changes via deploy. File an incident review note describing what regressed and why the pre-change validation missed it.

6

Backup and Recovery Verification

  1. Confirm backups still meet RPO after schema changes
    • Index rebuilds and partitioning can grow the WAL / transaction log dramatically. Verify backup windows still complete on time and that log shipping or replication has caught up before declaring the tuning pass complete.

  2. Run a restore drill against a non-prod target
    • Restore the most recent backup to an isolated environment and validate row counts on the changed tables. A green backup job is not a working backup — only a successful restore proves it.

    Collects list
  3. Update runbook and tuning log
    • Record what was changed, why, the measured before/after, and any follow-up items in the team's runbook (Hudu, IT Glue, Confluence). The next DBA on this database should be able to read the trail without asking you.

Use this template

Copy it to your account, customize the steps, and run it with your team in minutes.


Sections 6
Steps 23
Category Systems Administration
Price Free to start
Need a different process

Browse hundreds of free templates across every team and industry.

Back to template library

Run Database Performance Tuning Checklist with your team

Customize the steps, assign roles, set a schedule, and keep a complete record for every run.