Performance Optimization Checklist

Baseline and Profiling

    Pull p50/p95/p99 latency from Datadog / New Relic / Honeycomb for the last 14 days, broken out by the top 5 endpoints by traffic. Record traffic volume and error rate alongside latency, because optimizing p99 on a low-traffic endpoint is rarely worth the effort.

    Use pprof, py-spy, async-profiler, or the equivalent for your runtime. Capture a 60-second profile under representative load — not idle. Flame graphs from production traffic catch hot paths that synthetic benchmarks miss.

    Rank by share of total wall-clock time, not by intuition. Common surprises: JSON serialization, ORM N+1 queries, synchronous logging in hot paths, gzip on already-compressed payloads.
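    As a rough illustration of ranking by share rather than intuition, the sketch below counts profiler samples per frame and reports each frame's fraction of the total. It assumes samples were taken at a fixed wall-clock interval, so sample share approximates time share. The function name and the sample data are hypothetical, not output from any particular profiler.

```python
from collections import Counter


def rank_hot_paths(samples, top=5):
    """Return (frame, share) pairs, largest share of all samples first."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(frame, n / total) for frame, n in counts.most_common(top)]


# Hypothetical data: one entry per profiler sample, named by its leaf frame.
samples = (
    ["json.dumps"] * 45
    + ["orm.load_items"] * 30
    + ["logging.emit"] * 15
    + ["handler"] * 10
)

for frame, share in rank_hot_paths(samples):
    print(f"{frame:<16} {share:.0%}")
```

    In practice the flame graph already encodes this ranking visually; a table like this is mainly useful for pasting into a ticket or PR description.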

    Write the goal as a measurable SLI: "p95 checkout latency under 400ms over a 7-day window." Vague targets like "make it faster" produce vague results. Get sign-off from the product manager so the team isn't optimizing in the dark.
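    One way to make an SLI like this checkable is to express it as a function over raw latency samples. The sketch below uses the nearest-rank percentile definition, which is an assumption; monitoring vendors often interpolate or use histogram buckets, so their numbers may differ slightly. The names percentile and meets_sli are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sli(latencies_ms, pct=95, target_ms=400):
    """True when the chosen percentile is strictly under the target."""
    return percentile(latencies_ms, pct) < target_ms
```

    Feeding it the latency samples for the 7-day window gives a yes/no answer that can be recorded alongside the baseline.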

Application Code Optimization

    Use the ORM's eager-loading (Rails includes, Django select_related/prefetch_related, SQLAlchemy joinedload) or write the join explicitly. Add Bullet (Rails) or nplusone (Python) to the test suite so CI fails builds that introduce N+1 regressions.

    Loops that hit external APIs sequentially are a common culprit. Batch through DataLoader, asyncio.gather, or a background job (Sidekiq, Celery, BullMQ) when the work doesn't need to block the request.
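    A minimal sketch of the asyncio.gather approach follows. The fetch_price coroutine is a hypothetical stand-in for a real external call (the sleep simulates network latency), and the concurrency limit of 10 is an arbitrary assumption to avoid flooding the upstream service.

```python
import asyncio


async def fetch_price(sku):
    """Hypothetical external API call; the sleep stands in for network latency."""
    await asyncio.sleep(0.1)
    return {"sku": sku, "price": len(sku)}


async def fetch_sequential(skus):
    # One round trip after another: total time grows linearly with len(skus).
    return [await fetch_price(s) for s in skus]


async def fetch_concurrent(skus, limit=10):
    # All calls are in flight together, capped by a semaphore.
    sem = asyncio.Semaphore(limit)

    async def bounded(sku):
        async with sem:
            return await fetch_price(sku)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(bounded(s) for s in skus))
```

    Calling asyncio.run(fetch_concurrent(skus)) returns the same results as the sequential version, but the waits overlap instead of adding up.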

    Cache results of pure deterministic functions in-process (functools.lru_cache, lodash memoize) or in Redis for cross-process. Watch the cache key — including a mutable object as a key is a bug magnet.
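    The sketch below shows in-process memoization with functools.lru_cache and one way to keep the cache key immutable. The shipping_cost function and its pricing numbers are made up for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)
def shipping_cost(weight_grams, zone):
    """Pure function: the result depends only on its hashable arguments."""
    base = {"domestic": 500, "international": 1500}[zone]
    return base + 2 * weight_grams


def shipping_cost_for(item_weights, zone):
    # A list is mutable and unhashable, so it cannot be a cache key.
    # Reduce it to an immutable value before calling the cached function.
    return shipping_cost(sum(item_weights), zone)
```

    Passing a list directly to shipping_cost raises TypeError, which is the safe failure mode. The subtler bug is a custom object whose hash does not change when its fields do; the cache then serves stale results silently.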

    Keep PRs under ~400 lines so reviewers actually read them. Include before/after profile snapshots in the PR description. Route through CODEOWNERS to engineers who own the touched modules.

Database Performance

    In Postgres enable pg_stat_statements; in MySQL enable the slow query log with long_query_time=0.5. Sort by total time (calls × mean), not single-call time — the query running 100k times at 50ms is more important than the one running once at 8s.
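    To make the arithmetic concrete: 100,000 calls at 50 ms is 5,000 seconds of database time, against 8 seconds for the single slow query. The sketch below applies the same ranking to a list of per-query stats. The dictionary layout and query names are hypothetical, not the pg_stat_statements column names.

```python
def rank_by_total_time(stats):
    """Sort query stats by calls x mean latency, most expensive first."""
    return sorted(stats, key=lambda q: q["calls"] * q["mean_ms"], reverse=True)


# Hypothetical stats for the two queries described above.
stats = [
    {"query": "report_rollup", "calls": 1, "mean_ms": 8000},
    {"query": "load_cart", "calls": 100_000, "mean_ms": 50},
]

for q in rank_by_total_time(stats):
    total_s = q["calls"] * q["mean_ms"] / 1000
    print(f'{q["query"]:<14} {total_s:>8.0f} s total')
```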

    Look for sequential scans on large tables, hash joins where a nested-loop with an index would be faster, and rows-removed-by-filter numbers that dwarf rows-returned. Paste plans into explain.depesz.com or pev2 for readability.

    Use CREATE INDEX CONCURRENTLY in Postgres and ALGORITHM=INPLACE / pt-online-schema-change in MySQL. Plain CREATE INDEX in Postgres takes a lock that blocks writes (though not reads) on the table, so a busy table stops accepting inserts, updates, and deletes for the duration of the build.

    Verify with EXPLAIN that the planner actually picks up the new index — composite index column order matters.

    For Postgres, use PgBouncer in transaction-pooling mode in front of the application. Set a server-side statement_timeout so a runaway query can't tie up a connection forever. Size the pool starting from the rule of thumb (cores × 2) + effective_spindle_count and adjust from measurements, rather than setting it arbitrarily large.

    If a query joins five tables on every request, a denormalized read model or materialized view may be the right fix. Backfill in batches with sleeps so replication lag stays under your alert threshold; do not run a single transaction over millions of rows.
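    The batching pattern can be sketched as follows. This uses sqlite3 purely as a runnable stand-in; the orders table, its columns, and the fixed pause are assumptions. A real backfill would run against the production database driver and would ideally check replication lag before each batch instead of sleeping for a constant interval.

```python
import sqlite3
import time


def backfill_in_batches(conn, batch_size=1000, pause_s=0.5):
    """Fill orders.total_cached from orders.total_cents, one short transaction per batch."""
    updated = 0
    while True:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute(
                "UPDATE orders SET total_cached = total_cents "
                "WHERE id IN ("
                "  SELECT id FROM orders WHERE total_cached IS NULL LIMIT ?"
                ")",
                (batch_size,),
            )
        if cur.rowcount == 0:
            return updated  # nothing left to backfill
        updated += cur.rowcount
        time.sleep(pause_s)  # give replicas time to catch up
```

    Because each batch commits on its own, the job can be stopped and restarted at any point; the IS NULL filter picks up where it left off.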

Caching and Infrastructure

    Cache at the read-model boundary, not inside the ORM. Use cache-aside with a sensible TTL plus explicit invalidation on writes. The two hard problems are still cache invalidation and naming things — write down your key schema before shipping.
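    A minimal in-process sketch of cache-aside with a TTL and explicit invalidation is shown below. The CacheAside class, the product_key helper, and the "product:v1:" key schema are all hypothetical; a shared cache such as Redis would replace the dictionary, but the read and invalidate flow would look similar.

```python
import time


class CacheAside:
    """Read-through on miss, expire by TTL, invalidate explicitly on writes."""

    def __init__(self, loader, ttl_s=60.0, clock=time.monotonic):
        self._loader = loader  # function that reads from the source of truth
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # fresh hit
        value = self._loader(key)  # miss or expired: go to the source
        self._entries[key] = (self._clock() + self._ttl_s, value)
        return value

    def invalidate(self, key):
        # Call this from the write path, after the write commits.
        self._entries.pop(key, None)


def product_key(product_id):
    # Key schema written down in one place; the version segment lets a
    # format change ship without serving stale-shaped entries.
    return f"product:v1:{product_id}"
```

    The TTL is a safety net for invalidations that get missed; the explicit invalidate call on writes is what keeps reads fresh in the common case.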

    Set long Cache-Control: max-age values for fingerprinted assets, short or no-cache for HTML. Verify with curl -I that the second request is served from the edge: CloudFront and Fastly report a hit in the X-Cache header, Cloudflare in CF-Cache-Status.

    Default HPA at 80% CPU is often too high for latency-sensitive services — saturation creates queueing well before CPU pegs. Consider scaling on p95 latency or request concurrency instead. Ensure the cluster has burst headroom for sudden scale-ups.

    If the regression severity is moderate or severe, run a k6 or Locust load test in staging before shipping. For minor SLO-internal optimizations, canary in production is usually sufficient.

    Replay realistic traffic shapes from production logs — synthetic uniform load misses tail behaviors. Hold for at least 30 minutes at target RPS to surface memory leaks, GC pauses, and connection-pool exhaustion. Compare p95/p99 against the SLO defined earlier.

Frontend and Network

    Run Lighthouse against production from a throttled mobile profile, not your fiber-connected laptop. Track LCP, INP, and CLS — Google's thresholds are 2.5s / 200ms / 0.1. Field data from CrUX is more honest than lab data.

    Use dynamic import() in webpack/Vite/Next.js for non-critical routes and components. Defer below-the-fold images with native loading="lazy". Watch the bundle analyzer: an entire utility library such as lodash pulled into the main chunk for one or two functions is a common find.

    Most CDNs support HTTP/3 (QUIC) with a flag flip. The win is largest on lossy mobile networks. Confirm with curl --http3 or Chrome DevTools Network panel that the protocol is actually negotiated.

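    Enable OCSP stapling and TLS 1.3 session resumption. Allow 0-RTT only for idempotent requests, because early data can be replayed by an attacker. Use a low-latency DNS provider (Route53, NS1, Cloudflare) and short TTLs only where needed, since very short TTLs force extra resolver lookups and hurt more than they help.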

Validation and Ongoing Monitoring

    Watch error rate and p99 on the canary fleet for at least 30 minutes before progressing. Have the rollback path tested and a kill-switch feature flag ready — performance regressions sometimes only show up under real production cardinality.

    Pull the same dashboard captured at baseline. The optimization is only "done" if the SLI defined during baselining actually moved. Declaring victory off a green CI run is how regressions ship.

    Pin the optimization in place with a benchmark gate in GitHub Actions / GitLab CI. Bundle-size budgets (size-limit), Lighthouse CI, or a k6 smoke test on every PR — pick the layer that matched the win.

    Schedule a blameless review focused on why the regression went undetected for so long — alert tuning, missing dashboard, or a coverage gap in synthetic checks. Track action items to closure in Jira / Linear with named owners.
