Database Performance Tuning Checklist
A recurring DBA workflow to baseline a production database, find the worst offenders, and apply targeted index, query, and configuration fixes without destabilizing the workload. Run quarterly or after a major schema or traffic change.
Baseline and Workload Profiling
-
Capture current performance baseline
Pull at least 24 hours of metrics from your monitoring tool (Datadog DBM, SolarWinds DPA, pganalyze, Percona PMM, or Query Store on SQL Server). Record CPU, buffer cache hit ratio, page life expectancy, wait stats, and average query latency at p50/p95/p99 — these are what you compare against after the tuning pass.
Collects list Collects file Collects number -
Identify the top 20 expensive queries
Use pg_stat_statements (Postgres), Query Store (SQL Server), or Performance Schema (MySQL) to rank queries by total time, not just per-execution time — a 5ms query running 50,000 times an hour matters more than a 30s nightly report.
-
Review wait events and bottleneck class
Classify the dominant wait type — CPU, IO (PAGEIOLATCH, IO:DataFileRead), lock (LCK_M_X), or memory pressure. The bottleneck class determines whether tuning effort goes to indexes, query rewrites, or hardware.
Collects list
Index Optimization
-
Analyze execution plans for missing indexes
EXPLAIN ANALYZE (Postgres / MySQL) or actual execution plan (SQL Server) on the top offenders from baseline. Watch for sequential scans on large tables, key lookups, and high estimated-vs-actual row count gaps that signal stale statistics rather than a missing index.
-
Review unused and duplicate indexes
Check pg_stat_user_indexes (idx_scan = 0) or sys.dm_db_index_usage_stats. Every unused index costs write throughput and storage. Duplicate-leading-column indexes are also a common find — consolidate before adding new ones.
-
Evaluate composite and covering indexes
For multi-column WHERE / ORDER BY patterns, evaluate composite indexes with the most-selective leading column. A covering index (INCLUDE columns in SQL Server, or index-only scan in Postgres) eliminates lookup cost on hot read paths.
-
Rebuild or reorganize fragmented indexes
Reorganize at 5-30% fragmentation; rebuild above 30%. Use ONLINE = ON (SQL Server Enterprise) or pg_repack (Postgres) to avoid table-level locks. Always run during the change window.
Query Performance
-
Rewrite the worst-offender queries
Replace SELECT * with explicit column lists, eliminate correlated subqueries that can be JOINs, push predicates into the most-selective branch, and avoid functions on indexed columns (WHERE UPPER(email) = ... defeats the index).
-
Validate query parameterization
Ad-hoc literal queries blow up the plan cache and prevent reuse. Confirm the application is using prepared statements or parameterized queries; check sys.dm_exec_cached_plans (SQL Server) or pg_stat_statements normalization for ad-hoc patterns.
-
Test rewrites against production-like data
A query that runs 10x faster on a 1-million-row dev dataset can run 5x slower on the 800-million-row production table. Run rewrites against a recent restore or read replica before scheduling the change.
Collects list -
Configure caching at the appropriate tier
Decide whether the win belongs in Redis / Memcached at the app tier, in materialized views, or in the database buffer pool. Hot read-mostly lookups belong in Redis with a defined TTL and invalidation path.
Resource and Configuration Tuning
-
Tune memory and buffer pool settings
Set shared_buffers (Postgres, ~25% of RAM), innodb_buffer_pool_size (MySQL, ~70-80%), or max server memory (SQL Server, leaving 4-8 GB for OS). Validate cache hit ratio improves before declaring done.
-
Refresh statistics and optimizer settings
Run ANALYZE / UPDATE STATISTICS WITH FULLSCAN on tables that have churned heavily. Stale stats are the most common cause of sudden plan regressions and a faster fix than any index change.
-
Configure connection pooling
Right-size PgBouncer, ProxySQL, or HikariCP. A connection storm — 2,000 app connections hitting Postgres directly — is the silent killer behind many CPU-bound databases. Pool size should be near (cores × 2) + spindles, not the application's whim.
-
Evaluate partitioning for the largest tables
For tables exceeding 100M rows or with clear time-based access patterns, evaluate range partitioning. Partition pruning turns scans of a year of data into scans of a month — but only if every hot query's predicate aligns with the partition key.
Collects list -
Set alerts on key performance counters
Add or refresh thresholds on buffer cache hit ratio, deadlocks/sec, long-running queries (>30s), replication lag, and tempdb / WAL growth. Route to PagerDuty or Opsgenie with severity tied to whether it's degrading users.
Change Rollout and Validation
-
Submit change request to CAB
Document the proposed changes, blast radius, rollback plan, and expected gain. Schedule against the maintenance window; index rebuilds and partitioning are not standard changes and require explicit CAB approval.
-
Apply changes during the maintenance window
Apply changes in the approved order with checkpoints between each. Stop and roll back if any post-change probe regresses against baseline by more than the agreed threshold (typically 10%).
-
Compare post-change metrics against baseline
Pull the same 24-hour window of metrics captured in baseline. Confirm p95 latency, top-query total time, and wait-stat distribution all moved the right direction. Regressions go on a follow-up ticket; do not silently accept them.
Collects list Collects number Collects signature -
Roll back the change set
Execute the documented rollback: drop new indexes, restore prior config values, revert query changes via deploy. File an incident review note describing what regressed and why the pre-change validation missed it.
Backup and Recovery Verification
-
Confirm backups still meet RPO after schema changes
Index rebuilds and partitioning can grow the WAL / transaction log dramatically. Verify backup windows still complete on time and that log shipping or replication has caught up before declaring the tuning pass complete.
-
Run a restore drill against a non-prod target
Restore the most recent backup to an isolated environment and validate row counts on the changed tables. A green backup job is not a working backup — only a successful restore proves it.
Collects list -
Update runbook and tuning log
Record what was changed, why, the measured before/after, and any follow-up items in the team's runbook (Hudu, IT Glue, Confluence). The next DBA on this database should be able to read the trail without asking you.
Use this template
Copy it to your account, customize the steps, and run it with your team in minutes.
Browse hundreds of free templates across every team and industry.
Back to template libraryRun Database Performance Tuning Checklist with your team
Customize the steps, assign roles, set a schedule, and keep a complete record for every run.