Performance Optimization Checklist

A structured pass for engineering teams investigating production latency, throughput, or cost regressions — covering profiling, database tuning, infrastructure, frontend, network, and ongoing monitoring. Run quarterly or whenever SLO burn-rate alerts cross the 14-day threshold.


Baseline and Profiling

Capture current p50, p95, p99 latency
- Pull latency from Datadog / New Relic / Honeycomb for the last 14 days, broken out by the top 5 endpoints by traffic. Record traffic volume and error rate alongside latency — optimizing p99 on a low-traffic endpoint is rarely worth the effort.
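As a rough illustration of what the p50/p95/p99 figures summarize, the sketch below computes nearest-rank percentiles from raw latency samples. The function name and the sample values are made up for the example; in practice the APM tool does this aggregation, often with a different interpolation method.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds for a single endpoint.
latencies_ms = [120, 95, 110, 480, 105, 130, 2200, 98, 115, 101]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

With only ten samples the p95 and p99 collapse onto the single slowest request, which is one reason to record traffic volume next to the tail numbers.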
Run a CPU and memory profile
- Use pprof, py-spy, async-profiler, or the equivalent for your runtime. Capture a 60-second profile under representative load — not idle. Flame graphs from production traffic catch hot paths that synthetic benchmarks miss.
Identify the top three bottlenecks
- Rank by share of total wall-clock time, not by intuition. Common surprises: JSON serialization, ORM N+1 queries, synchronous logging in hot paths, gzip on already-compressed payloads.
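One way to make the ranking mechanical is to turn profiler totals into shares of wall-clock time. This is a minimal sketch; the component names and timings are invented, and a real profile would be exported from the profiler rather than typed in.

```python
def top_bottlenecks(wall_clock_seconds, n=3):
    """Return the n largest components with their share of total wall-clock time."""
    total = sum(wall_clock_seconds.values())
    ranked = sorted(wall_clock_seconds.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, round(seconds / total, 3)) for name, seconds in ranked[:n]]

# Hypothetical per-component totals from a 60-second profile.
profile = {
    "json_serialization": 18.0,
    "orm_queries": 27.0,
    "sync_logging": 9.0,
    "business_logic": 4.0,
    "gzip": 2.0,
}
print(top_bottlenecks(profile))
```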
Define target SLOs for the optimization pass
- Write the goal as a measurable objective on a specific SLI: "p95 checkout latency under 400ms over a 7-day window." Vague targets like "make it faster" produce vague results. Get sign-off from the product manager so the team isn't optimizing in the dark.

Application Code Optimization

Eliminate N+1 queries on hot paths
- Use the ORM's eager loading (Rails includes, Django select_related/prefetch_related, SQLAlchemy joinedload or selectinload) or write the join explicitly. Add Bullet (Rails) or nplusone (Python) to the test suite so CI fails builds that introduce N+1 regressions.
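To show the pattern without tying it to one ORM, here is a small sketch using the standard-library sqlite3 module. The schema, the `run` helper, and the query counter are all hypothetical; the point is only that the loop issues one query per parent row while the join issues one in total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO posts VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'L1');
""")

stats = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it, standing in for an ORM query log."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def titles_n_plus_one():
    # 1 query for the authors, then 1 more per author: the N+1 shape.
    result = {}
    for author_id, name in run("SELECT id, name FROM authors ORDER BY id"):
        rows = run("SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,))
        result[name] = [title for (title,) in rows]
    return result

def titles_joined():
    # A single join returns the same data in one round trip.
    # Note: an inner join omits authors with no posts.
    result = {}
    sql = ("SELECT a.name, p.title FROM authors a "
           "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id")
    for name, title in run(sql):
        result.setdefault(name, []).append(title)
    return result
```

With two authors the loop version runs three queries and the join runs one; the gap grows linearly with the number of parent rows.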
Replace synchronous calls with batched or async patterns
- Loops that hit external APIs sequentially are a common culprit. Batch through DataLoader, asyncio.gather, or a background job (Sidekiq, Celery, BullMQ) when the work doesn't need to block the request.
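The difference between a sequential loop and asyncio.gather can be sketched as follows. `fetch_price` is a hypothetical stand-in that sleeps instead of calling a real API, so the timings only illustrate the shape of the win.

```python
import asyncio
import time

async def fetch_price(sku):
    # Stand-in for an external API call; sleeps instead of doing network I/O.
    await asyncio.sleep(0.1)
    return {"sku": sku, "price_cents": 100 * len(sku)}

async def fetch_sequential(skus):
    # Each call waits for the previous one: total time is roughly the sum.
    return [await fetch_price(s) for s in skus]

async def fetch_concurrent(skus):
    # All calls are in flight together: total time is roughly the slowest call.
    return list(await asyncio.gather(*(fetch_price(s) for s in skus)))

def timed(coro):
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    skus = ["a", "bb", "ccc"]
    _, slow = timed(fetch_sequential(skus))
    _, fast = timed(fetch_concurrent(skus))
    print(f"sequential {slow:.2f}s, concurrent {fast:.2f}s")
```

Against a real upstream it is usually worth capping concurrency (for example with asyncio.Semaphore) so the fan-out does not overload the service being called.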
Add memoization for expensive pure functions
- Cache results of pure, deterministic functions in-process (functools.lru_cache, lodash memoize) or in Redis when they must be shared across processes. Watch the cache key: using a mutable object as a key is a bug magnet.
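A minimal in-process example with functools.lru_cache is sketched below. The pricing function is invented for illustration; the part worth noticing is that lru_cache requires hashable arguments, so a list is rejected outright rather than silently producing a stale key.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def shipping_cost_cents(weight_grams, zone):
    # Hypothetical pure function: same inputs always give the same output.
    return 500 + (weight_grams // 100) * 25 * zone

first = shipping_cost_cents(1200, 2)    # computed (cache miss)
second = shipping_cost_cents(1200, 2)   # served from cache (hit)
info = shipping_cost_cents.cache_info()

# Mutable arguments such as lists are unhashable and cannot be cache keys.
try:
    shipping_cost_cents([1200], 2)
    unhashable_error = None
except TypeError as exc:
    unhashable_error = str(exc)
```

Converting the argument to a tuple, or keying on a stable ID, avoids the problem.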
Open a code-review PR with profiler-driven changes
- Keep PRs under ~400 lines so reviewers actually read them. Include before/after profile snapshots in the PR description. Route through CODEOWNERS to engineers who own the touched modules.

Database Performance

Review slow query logs from the last 7 days
- In Postgres enable pg_stat_statements; in MySQL enable the slow query log with long_query_time=0.5. Sort by total time (calls × mean), not single-call time — the query running 100k times at 50ms is more important than the one running once at 8s.
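The arithmetic behind that ranking, as a small sketch. The query names and the dictionary layout are hypothetical; with pg_stat_statements on Postgres 13 or later the same figure is already available as total_exec_time.

```python
def rank_by_total_time(stats):
    """Sort query stats by total time (calls x mean), largest first."""
    return sorted(stats, key=lambda s: s["calls"] * s["mean_ms"], reverse=True)

# The two queries from the example above.
query_stats = [
    {"name": "nightly_report", "calls": 1, "mean_ms": 8000.0},
    {"name": "orders_by_user", "calls": 100_000, "mean_ms": 50.0},
]

ranked = rank_by_total_time(query_stats)
totals_s = {s["name"]: s["calls"] * s["mean_ms"] / 1000 for s in ranked}
print(totals_s)  # total seconds spent per query
```

The 50ms query accounts for 5,000 seconds of database time against 8 seconds for the slow one.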
Run EXPLAIN ANALYZE on the worst offenders
- Look for sequential scans on large tables, hash joins where a nested-loop with an index would be faster, and rows-removed-by-filter numbers that dwarf rows-returned. Paste plans into explain.depesz.com or pev2 for readability.
Add indexes concurrently in production
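- Use CREATE INDEX CONCURRENTLY in Postgres and ALGORITHM=INPLACE, LOCK=NONE or pt-online-schema-change in MySQL. Plain CREATE INDEX in Postgres takes a SHARE lock on the table: reads continue, but writes are blocked on a busy table for the duration of the build. If a concurrent build fails it leaves an INVALID index behind; drop it and retry.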
  Verify with EXPLAIN that the planner actually picks up the new index — composite index column order matters.
Tune connection pooling and statement timeouts
- For Postgres, use PgBouncer in transaction-pooling mode in front of the application. Set a server-side statement_timeout so a runaway query can't tie up a connection forever. Size the pool starting from (cores × 2) + effective_spindle_count, then adjust from measurements, rather than picking an arbitrarily large number.
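As a worked instance of the pool-size rule of thumb, the sketch below treats the second term as the number of storage devices that can service I/O concurrently. The function and parameter names are hypothetical, and the result is a starting point to validate under load, not a guarantee.

```python
def suggested_pool_size(cores, effective_io_devices):
    """Rule-of-thumb starting point: (cores x 2) + concurrent I/O devices."""
    if cores < 1 or effective_io_devices < 0:
        raise ValueError("cores must be >= 1 and effective_io_devices >= 0")
    return cores * 2 + effective_io_devices

# Hypothetical database host: 8 cores, one SSD volume.
print(suggested_pool_size(8, 1))
```

For the 8-core host this suggests 17 server connections, far below the hundreds that application defaults sometimes add up to across many instances.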
Plan a backfill or denormalization if needed
- If a query joins five tables on every request, a denormalized read model or materialized view may be the right fix. Backfill in batches with sleeps so replication lag stays under your alert threshold; do not run a single transaction over millions of rows.
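The batching idea can be sketched with the standard-library sqlite3 module. The `users` table, the `email_lower` column, and the batch size are assumptions for the example; on a replicated Postgres or MySQL setup the pause would be tuned against observed replication lag.

```python
import sqlite3
import time

def backfill_in_batches(conn, batch_size=1000, pause_seconds=0.0):
    """Fill users.email_lower in small committed batches instead of one huge transaction.

    Assumes users.email is never NULL; otherwise rows would stay NULL and be retried forever.
    """
    batches = 0
    while True:
        cur = conn.execute(
            """UPDATE users SET email_lower = lower(email)
               WHERE id IN (SELECT id FROM users
                            WHERE email_lower IS NULL ORDER BY id LIMIT ?)""",
            (batch_size,),
        )
        conn.commit()              # each batch is its own short transaction
        if cur.rowcount == 0:
            return batches
        batches += 1
        time.sleep(pause_seconds)  # breathing room for replicas to catch up

# Demo with 2,500 hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, email_lower TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"User{i}@Example.com",) for i in range(2500)])
conn.commit()

batches_run = backfill_in_batches(conn, batch_size=1000)
remaining = conn.execute("SELECT count(*) FROM users WHERE email_lower IS NULL").fetchone()[0]
```

Because each batch selects only rows that are still unfilled, the job can be stopped and restarted without tracking an offset.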

Caching and Infrastructure

Add a Redis or Memcached layer for hot reads
- Cache at the read-model boundary, not inside the ORM. Use cache-aside with a sensible TTL plus explicit invalidation on writes. The two hard problems are still cache invalidation and naming things — write down your key schema before shipping.
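A minimal cache-aside sketch follows, with an in-process dictionary standing in for Redis or Memcached. The class, the `profile_key` schema, and the injectable clock are all hypothetical; a real deployment would use the cache server's own TTL support.

```python
import time

def profile_key(user_id):
    # Written-down key schema: entity, id, field, schema version.
    return f"user:{user_id}:profile:v1"

class CacheAside:
    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self._load = load            # function that reads from the source of truth
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]          # fresh hit
        value = self._load(key)      # miss or expired: read through and store
        self._entries[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        # Call this on every write to the underlying record.
        self._entries.pop(key, None)

# Demo with a fake clock so expiry is deterministic.
load_calls = []
def load_profile(key):
    load_calls.append(key)
    return {"key": key, "plan": "pro"}

fake_now = [0.0]
cache = CacheAside(load_profile, ttl_seconds=30, clock=lambda: fake_now[0])
key = profile_key(42)
cache.get(key)
cache.get(key)          # hit, no load
fake_now[0] = 31.0
cache.get(key)          # TTL expired, loads again
cache.invalidate(key)
cache.get(key)          # loads again after explicit invalidation
```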
Configure CDN caching for static assets
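- Set long Cache-Control: max-age values for fingerprinted assets, short or no-cache for HTML. Verify with curl -I that the second request is served from cache: CloudFront and Fastly report it in the X-Cache header (for example "Hit from cloudfront" or "HIT"), while Cloudflare uses CF-Cache-Status: HIT.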
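The header policy can be expressed as a small function. This is a sketch under assumptions: the fingerprint pattern (eight or more hex characters before the extension), the one-year max-age, and the 300-second fallback are illustrative choices, not values from any particular CDN.

```python
import re

# Assumed fingerprint convention: name.<hex hash>.ext or name-<hex hash>.ext
FINGERPRINT = re.compile(r"[.-][0-9a-f]{8,}\.(?:js|css|png|jpg|svg|woff2)$")

def cache_control_for(path):
    """Pick a Cache-Control value based on whether the URL is content-addressed."""
    if FINGERPRINT.search(path):
        # The URL changes whenever the content does, so cache for a year.
        return "public, max-age=31536000, immutable"
    if path.endswith(".html") or path.endswith("/"):
        # HTML must revalidate so new asset URLs are picked up promptly.
        return "no-cache"
    return "public, max-age=300"

for p in ("/assets/app.3f9a1c2b.js", "/index.html", "/favicon.ico"):
    print(p, "->", cache_control_for(p))
```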
Right-size autoscaling and HPA thresholds
- A CPU-utilization target of 80% is often too high for latency-sensitive services, because saturation creates queueing well before CPU pegs. Consider scaling on p95 latency or request concurrency instead. Ensure the cluster has burst headroom for sudden scale-ups.
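To see how the target changes scaling behavior, the sketch below applies the core calculation from the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). It is a simplification; the real controller also applies min/max bounds, stabilization windows, and handling for missing metrics.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA calculation: scale in proportion to metric / target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas   # close enough to target: no change
    return math.ceil(current_replicas * ratio)

# Hypothetical service: 10 replicas currently averaging 90% CPU.
at_80_target = desired_replicas(10, 90, 80)
at_60_target = desired_replicas(10, 90, 60)
print(at_80_target, at_60_target)
```

At the same load, a 60% target asks for 15 replicas where an 80% target asks for 12, which is the extra headroom that keeps queueing down.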
Decide whether to escalate to a load test
- If the regression severity is moderate or severe, run a k6 or Locust load test in staging before shipping. For minor optimizations on a service that is still within its SLO, a canary in production is usually sufficient.
Run k6 load test against staging
- Replay realistic traffic shapes from production logs — synthetic uniform load misses tail behaviors. Hold for at least 30 minutes at target RPS to surface memory leaks, GC pauses, and connection-pool exhaustion. Compare p95/p99 against the SLO defined earlier.

Frontend and Network

Audit Core Web Vitals with Lighthouse
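- Run Lighthouse against production from a throttled mobile profile, not your fiber-connected laptop. Track LCP, INP, and CLS; Google's "good" thresholds are 2.5s / 200ms / 0.1. A standard Lighthouse navigation run does not measure INP, so use Total Blocking Time as the lab proxy and take INP from field data. Field data from CrUX is more honest than lab data.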
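The thresholds can be turned into a simple rating helper. The "good" limits are the ones quoted above; the "poor" limits (4.0s, 500ms, 0.25) come from Google's published Core Web Vitals guidance and are not stated in this checklist. The metric keys and sample values are hypothetical, and Google assesses these at the 75th percentile of field data.

```python
# metric -> (upper bound for "good", upper bound for "needs improvement")
THRESHOLDS = {
    "lcp_s": (2.5, 4.0),
    "inp_ms": (200, 500),
    "cls": (0.1, 0.25),
}

def rate(metric, value):
    """Classify a Core Web Vitals value as good / needs improvement / poor."""
    good, needs_improvement = THRESHOLDS[metric]
    if value <= good:
        return "good"
    if value <= needs_improvement:
        return "needs improvement"
    return "poor"

# Hypothetical p75 field values for one page.
page = {"lcp_s": 3.1, "inp_ms": 180, "cls": 0.31}
ratings = {metric: rate(metric, value) for metric, value in page.items()}
print(ratings)
```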
Code-split and lazy-load heavy routes
- Use dynamic import() in webpack/Vite/Next.js for non-critical routes and components. Defer below-the-fold images with native loading="lazy". Watch the bundle analyzer: a whole utility library such as lodash pulled into the main chunk by a single import is a common find.
Enable HTTP/2 or HTTP/3 at the edge
- Most CDNs support HTTP/3 (QUIC) with a flag flip. The win is largest on lossy mobile networks. Confirm with curl --http3 (this needs a curl build with HTTP/3 support) or the Protocol column in the Chrome DevTools Network panel that the protocol is actually negotiated.
Reduce TLS and DNS handshake time
- Enable OCSP stapling and TLS 1.3 with session resumption. Allow 0-RTT early data only for idempotent requests, because early data can be replayed. Use a low-latency DNS provider (Route53, NS1, Cloudflare) and short TTLs only where needed, since overly short TTLs force extra lookups and hurt more than they help.

Validation and Ongoing Monitoring

Canary deploy at 5 percent traffic
- Watch error rate and p99 on the canary fleet for at least 30 minutes before progressing. Have the rollback path tested and a kill-switch feature flag ready — performance regressions sometimes only show up under real production cardinality.
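The promotion decision can be written down as an explicit gate rather than left to eyeballing a dashboard. This is only a sketch: the metric names, the 0.1 percentage-point error-rate margin, and the 10% p99 margin are assumed values that each team would set for itself.

```python
def canary_verdict(baseline, canary, max_error_rate_delta=0.001, max_p99_ratio=1.10):
    """Return ("promote", []) or ("rollback", [reasons]) from canary vs baseline metrics."""
    reasons = []
    if canary["error_rate"] - baseline["error_rate"] > max_error_rate_delta:
        reasons.append("error rate regressed")
    if canary["p99_ms"] > baseline["p99_ms"] * max_p99_ratio:
        reasons.append("p99 regressed")
    return ("rollback" if reasons else "promote", reasons)

# Hypothetical 30-minute aggregates.
baseline = {"error_rate": 0.002, "p99_ms": 900.0}
healthy_canary = {"error_rate": 0.002, "p99_ms": 910.0}
slow_canary = {"error_rate": 0.002, "p99_ms": 1300.0}

print(canary_verdict(baseline, healthy_canary))
print(canary_verdict(baseline, slow_canary))
```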
Compare post-deploy metrics against the baseline
- Pull the same dashboard captured at baseline. The optimization is only "done" if the SLI targeted at the start of the pass actually moved; declaring victory off a green CI run is how regressions ship.
Add a perf regression test to CI
- Pin the optimization in place with a benchmark gate in GitHub Actions / GitLab CI. Bundle-size budgets (size-limit), Lighthouse CI, or a k6 smoke test on every PR — pick the layer that matched the win.
Open a follow-up incident review
- Schedule a blameless review focused on why the regression went undetected for so long — alert tuning, missing dashboard, or a coverage gap in synthetic checks. Track action items to closure in Jira / Linear with named owners.
