Performance Optimization Checklist
Baseline and Profiling
Pull p50, p95, and p99 latency from Datadog / New Relic / Honeycomb for the last 14 days, broken out by the top 5 endpoints by traffic. Record traffic volume and error rate alongside latency: optimizing p99 on a low-traffic endpoint is rarely worth the effort.
Use pprof, py-spy, async-profiler, or the equivalent for your runtime. Capture a 60-second profile under representative load — not idle. Flame graphs from production traffic catch hot paths that synthetic benchmarks miss.
Rank by share of total wall-clock time, not by intuition. Common surprises: JSON serialization, ORM N+1 queries, synchronous logging in hot paths, gzip on already-compressed payloads.
Write the goal as a measurable SLI: "p95 checkout latency under 400ms over a 7-day window." Vague targets like "make it faster" produce vague results. Get sign-off from the product manager so the team isn't optimizing in the dark.
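A minimal sketch of checking such an SLI against raw latency samples, assuming the samples have already been exported as a list of milliseconds. The function names, the sample data, and the nearest-rank percentile method are illustrative assumptions; monitoring vendors may compute percentiles differently.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]

def meets_sli(samples_ms, pct=95, threshold_ms=400):
    """True when the chosen percentile is under the threshold."""
    return percentile(samples_ms, pct) < threshold_ms

# Hypothetical checkout latencies in milliseconds.
latencies = [120, 180, 210, 250, 300, 320, 350, 380, 390, 900]
print(percentile(latencies, 95), meets_sli(latencies))
```

With only ten samples, a single slow request decides the p95, which is one reason to evaluate the SLI over a full window rather than a short burst.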
Application Code Optimization
Use the ORM's eager loading (Rails includes, Django select_related/prefetch_related, SQLAlchemy joinedload) or write the join explicitly. Add Bullet (Rails) or nplusone (Python) to the test suite so CI fails builds that introduce N+1 regressions.
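The difference between an N+1 access pattern and a single join can be sketched without an ORM. The example below uses an in-memory SQLite database as a stand-in; the schema, the `run` helper, and the query counter are hypothetical and only illustrate what a query log would show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'G1');
""")

stats = {"queries": 0}  # stands in for an ORM query log

def run(sql, params=()):
    """Execute a query and count it."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

def titles_n_plus_one():
    # One query for the posts, then one more per post for its author.
    rows = run("SELECT author_id, title FROM posts ORDER BY id")
    return [
        (run("SELECT name FROM authors WHERE id = ?", (author_id,))[0][0], title)
        for author_id, title in rows
    ]

def titles_joined():
    # A single query, roughly what eager loading with a join emits.
    return run(
        "SELECT a.name, p.title FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    )
```

With three posts the first version issues four queries and the second issues one; the gap grows linearly with the number of rows.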
Loops that hit external APIs sequentially are a common culprit. Batch through DataLoader, asyncio.gather, or a background job (Sidekiq, Celery, BullMQ) when the work doesn't need to block the request.
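A minimal sketch of replacing a sequential loop with asyncio.gather. The `fetch_price` coroutine is a hypothetical external call simulated with a sleep, and the concurrency limit of 5 is an arbitrary assumption to keep the upstream API from being flooded.

```python
import asyncio
import time

async def fetch_price(sku: str) -> tuple[str, float]:
    """Hypothetical external call, simulated with a 0.1 s sleep."""
    await asyncio.sleep(0.1)
    return sku, float(len(sku))

async def fetch_sequential(skus):
    # Each call waits for the previous one: total time is about N * 0.1 s.
    return [await fetch_price(s) for s in skus]

async def fetch_concurrent(skus, limit=5):
    # The semaphore caps the number of in-flight calls.
    sem = asyncio.Semaphore(limit)

    async def bounded(sku):
        async with sem:
            return await fetch_price(sku)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(s) for s in skus))

def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

Calling `timed(fetch_sequential(skus))` and `timed(fetch_concurrent(skus))` on the same list returns the same results, with the concurrent version finishing in roughly the time of the slowest batch.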
Cache results of pure deterministic functions in-process (functools.lru_cache, lodash memoize) or in Redis for cross-process. Watch the cache key — including a mutable object as a key is a bug magnet.
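A sketch of in-process memoization with functools.lru_cache, plus one way to avoid the mutable-key trap. The `shipping_cost` function, its pricing constants, and the `cart_key` helper are hypothetical examples, not part of any library.

```python
from functools import lru_cache

calls = {"count": 0}  # counts real computations, for illustration only

@lru_cache(maxsize=1024)
def shipping_cost(weight_grams: int, zone: str) -> float:
    """Hypothetical pure function: the same inputs always give the same output."""
    calls["count"] += 1
    return round(2.5 + 0.004 * weight_grams + (1.5 if zone == "intl" else 0.0), 2)

def cart_key(items: list[dict]) -> tuple:
    """Turn a mutable cart into a hashable, order-independent cache key."""
    return tuple(sorted((item["sku"], item["qty"]) for item in items))
```

lru_cache rejects unhashable arguments such as lists and dicts with a TypeError, so converting to a sorted tuple first both satisfies the cache and guarantees that two carts with the same contents share one entry.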
Keep PRs under ~400 lines so reviewers actually read them. Include before/after profile snapshots in the PR description. Route through CODEOWNERS to engineers who own the touched modules.
Database Performance
In Postgres enable pg_stat_statements; in MySQL enable the slow query log with long_query_time=0.5. Sort by total time (calls × mean), not single-call time — the query running 100k times at 50ms is more important than the one running once at 8s.
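The ranking rule is plain arithmetic. The sketch below uses the two figures from the paragraph above; the `QueryStat` class and the query labels are hypothetical stand-ins for rows exported from pg_stat_statements or a slow query log.

```python
from dataclasses import dataclass

@dataclass
class QueryStat:
    label: str
    calls: int
    mean_ms: float

    @property
    def total_ms(self) -> float:
        # Cumulative time spent in this query across all calls.
        return self.calls * self.mean_ms

def rank_by_total_time(stats):
    """Order queries by cumulative time, highest first."""
    return sorted(stats, key=lambda s: s.total_ms, reverse=True)

stats = [
    QueryStat("one-off report", calls=1, mean_ms=8000),
    QueryStat("cart lookup", calls=100_000, mean_ms=50),
]
ranked = rank_by_total_time(stats)
for s in ranked:
    print(f"{s.label}: {s.total_ms / 1000:.0f} s total")
```

The cart lookup accounts for 5,000 seconds of database time against 8 seconds for the report, so it comes first despite being 160 times faster per call.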
In Postgres, run EXPLAIN (ANALYZE, BUFFERS) on the top offenders. Look for sequential scans on large tables, hash joins where a nested loop with an index would be faster, and rows-removed-by-filter numbers that dwarf rows returned. Paste plans into explain.depesz.com or pev2 for readability.
Use CREATE INDEX CONCURRENTLY in Postgres and ALGORITHM=INPLACE / pt-online-schema-change in MySQL. Plain CREATE INDEX in Postgres takes a SHARE lock on the table: reads continue, but inserts, updates, and deletes block for the duration of the build.
Verify with EXPLAIN that the planner actually picks up the new index — composite index column order matters.
For Postgres, use PgBouncer in transaction-pooling mode in front of the application. Set a server-side statement_timeout so a runaway query can't tie up a connection forever. Size the pool from a formula such as (core_count × 2) + effective_spindle_count, the starting point suggested on the PostgreSQL wiki, rather than picking an arbitrarily large number.
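The sizing heuristic above is simple arithmetic. A minimal sketch, assuming the second term counts the storage devices that can serve I/O in parallel (the PostgreSQL wiki calls it effective_spindle_count); the function name and the sample numbers are hypothetical.

```python
def pool_size(cores: int, effective_spindles: int) -> int:
    """Starting point for the number of server-side connections."""
    if cores < 1 or effective_spindles < 0:
        raise ValueError("cores must be >= 1 and effective_spindles >= 0")
    return cores * 2 + effective_spindles

# An 8-core database host with a single SSD volume.
print(pool_size(8, 1))
```

Treat the result as a first guess to validate under load, not a fixed rule: the right number also depends on query mix and how much of the working set fits in memory.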
If a query joins five tables on every request, a denormalized read model or materialized view may be the right fix. Backfill in batches with sleeps so replication lag stays under your alert threshold; do not run a single transaction over millions of rows.
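A sketch of a batched backfill, using in-memory SQLite as a stand-in for the production database. The `orders` table, its columns, the batch size, and the pause length are all hypothetical; in a real deployment the pause would be tuned against observed replication lag.

```python
import sqlite3
import time

def backfill_in_batches(conn, batch_size=1000, pause_s=0.0):
    """Copy orders.total_cents into orders.total_cached, one short transaction per batch."""
    batches = 0
    while True:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                """
                UPDATE orders SET total_cached = total_cents
                WHERE id IN (
                    SELECT id FROM orders
                    WHERE total_cached IS NULL
                    ORDER BY id LIMIT ?
                )
                """,
                (batch_size,),
            )
        if cur.rowcount == 0:
            return batches  # nothing left to backfill
        batches += 1
        time.sleep(pause_s)  # give replicas time to catch up

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER, total_cached INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (total_cents) VALUES (?)", [(i * 100,) for i in range(2500)]
)
conn.commit()
```

Because each batch selects only rows that are still NULL, the job can be stopped and restarted at any point without redoing finished work.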
Caching and Infrastructure
Cache at the read-model boundary, not inside the ORM. Use cache-aside with a sensible TTL plus explicit invalidation on writes. The two hard problems are still cache invalidation and naming things — write down your key schema before shipping.
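A minimal cache-aside sketch with a TTL and explicit invalidation. The `CacheAside` class is a hypothetical in-process stand-in for Redis or Memcached, and the injectable clock exists only to make expiry testable.

```python
import time

class CacheAside:
    """Maps key -> (expires_at, value); loads from the source of truth on a miss."""

    def __init__(self, loader, ttl_s=60.0, clock=time.monotonic):
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # hit: entry exists and has not expired
        value = self._loader(key)  # miss: read from the source of truth
        self._store[key] = (self._clock() + self._ttl_s, value)
        return value

    def invalidate(self, key):
        """Call after every write to the underlying record."""
        self._store.pop(key, None)
```

The write path updates the database first and then calls `invalidate`; the TTL acts as a safety net for any write path that forgets to.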
Set long Cache-Control: max-age values for fingerprinted assets, short or no-cache for HTML. Verify with curl -I against CloudFront / Cloudflare / Fastly that the second request is a cache hit: CloudFront and Fastly report it in the X-Cache header, Cloudflare in CF-Cache-Status.
Default HPA at 80% CPU is often too high for latency-sensitive services — saturation creates queueing well before CPU pegs. Consider scaling on p95 latency or request concurrency instead. Ensure the cluster has burst headroom for sudden scale-ups.
If the regression severity is moderate or severe, run a k6 or Locust load test in staging before shipping. For minor optimizations on a service that is already within its SLO, a production canary is usually sufficient.
Replay realistic traffic shapes from production logs — synthetic uniform load misses tail behaviors. Hold for at least 30 minutes at target RPS to surface memory leaks, GC pauses, and connection-pool exhaustion. Compare p95/p99 against the SLO defined earlier.
Frontend and Network
Run Lighthouse against production from a throttled mobile profile, not your fiber-connected laptop. Track LCP, INP, and CLS — Google's thresholds are 2.5s / 200ms / 0.1. Field data from CrUX is more honest than lab data.
Use dynamic import() in webpack/Vite/Next.js for non-critical routes and components. Defer below-the-fold images with native loading="lazy". Watch the bundle analyzer: a full lodash build pulled into the main chunk for one or two functions is a common find.
Most CDNs support HTTP/3 (QUIC) with a flag flip. The win is largest on lossy mobile networks. Confirm with curl --http3 or Chrome DevTools Network panel that the protocol is actually negotiated.
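Enable OCSP stapling and TLS 1.3. Turn on 0-RTT only for requests that are safe to replay, such as idempotent GETs, because early data is not protected against replay. Use a low-latency DNS provider (Route53, NS1, Cloudflare) and short TTLs only where needed: overly short TTLs cost more in extra lookups than they gain in failover speed.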
Validation and Ongoing Monitoring
Watch error rate and p99 on the canary fleet for at least 30 minutes before progressing. Have the rollback path tested and a kill-switch feature flag ready — performance regressions sometimes only show up under real production cardinality.
Pull the same dashboard captured at baseline. The optimization is only "done" if the SLI defined in the kickoff actually moved — declaring victory off a green CI run is how regressions ship.
Pin the optimization in place with a benchmark gate in GitHub Actions / GitLab CI. Bundle-size budgets (size-limit), Lighthouse CI, or a k6 smoke test on every PR — pick the layer that matched the win.
Schedule a blameless review focused on why the regression went undetected for so long — alert tuning, missing dashboard, or a coverage gap in synthetic checks. Track action items to closure in Jira / Linear with named owners.
