Performance Testing Checklist

Test Planning

    Pin concrete numbers: p95 latency under 400ms, error rate under 0.5%, target throughput in RPS, max concurrent users. Vague goals like 'should be fast' produce vague test reports. Pull current SLOs from the service catalog or last quarter's error budget review.

    Match the test environment to production: instance sizes, autoscaling configs, RDS class, cache tier, and network topology. A perf test against a single t3.medium tells you nothing about a production ASG of m6i.2xlarge. Document any deltas between staging and prod that the report will need to caveat.

    Pull last 30 days of production traffic from Datadog or CloudWatch. Identify the top 5-10 endpoints by volume and their relative weights. Peak hour matters more than daily average for capacity planning.
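
    The relative weights described above can be derived from raw request counts. The following is a minimal sketch, assuming the counts have already been exported into a dictionary keyed by endpoint; the endpoint names and numbers are hypothetical, and the weights are renormalized over the selected top endpoints only.

```python
from collections import Counter


def workload_mix(request_counts: dict[str, int], top_n: int = 10) -> dict[str, float]:
    """Return the top_n endpoints by volume with weights that sum to 1.0."""
    top = Counter(request_counts).most_common(top_n)
    total = sum(count for _, count in top)
    if total == 0:
        raise ValueError("no requests in the sample")
    # Weights are relative to the selected endpoints, not to all traffic.
    return {endpoint: count / total for endpoint, count in top}


# Hypothetical 30-day request counts; real values come from the monitoring export.
example_counts = {
    "/search": 600_000,
    "/login": 250_000,
    "/checkout": 100_000,
    "/health": 50_000,
}
example_mix = workload_mix(example_counts, top_n=3)
```

    Feed peak-hour counts rather than daily totals into the same function when the mix is used for capacity planning.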

    The test plan covers scope, in/out-of-scope endpoints, test types (load, stress, spike, endurance), tooling (k6, JMeter, Locust, Gatling), schedule, and rollback criteria. Circulate it to the engineering manager and SRE for sign-off before scripting starts.

Test Design and Scripting

    One script per workload scenario — checkout flow, search, login burst. Parameterize ramp-up, VU count, and duration so the same script runs as load, stress, or spike with different env vars. Commit scripts to the perf-tests repo, not someone's laptop.
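
    One way to parameterize a script through environment variables is sketched below. The variable names (PERF_VUS, PERF_RAMP_UP_S, PERF_DURATION_S) and the defaults are assumptions for illustration, not a convention of any particular tool; the defaults are deliberately small so that an unconfigured run behaves like a smoke test.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    virtual_users: int
    ramp_up_seconds: int
    duration_seconds: int


def load_run_config(env=None) -> RunConfig:
    """Build the run configuration from environment variables."""
    env = os.environ if env is None else env
    config = RunConfig(
        # Hypothetical variable names; defaults describe a small 5-minute run.
        virtual_users=int(env.get("PERF_VUS", "5")),
        ramp_up_seconds=int(env.get("PERF_RAMP_UP_S", "30")),
        duration_seconds=int(env.get("PERF_DURATION_S", "300")),
    )
    if config.virtual_users < 1 or config.duration_seconds < 1 or config.ramp_up_seconds < 0:
        raise ValueError(f"invalid run configuration: {config}")
    return config
```

    The same script then runs as a load or stress test by changing only the environment, for example PERF_VUS=200 PERF_DURATION_S=1800.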

    Never run perf tests against real customer PII — hash emails, randomize names, scrub payment data. If pulling a prod snapshot, run the anonymization job before loading into the perf environment. GDPR and SOC 2 auditors flag any prod-data leakage to non-prod.
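
    A minimal sketch of the scrubbing step follows, assuming records are plain dictionaries with hypothetical field names (email, name, card_number). It uses a keyed hash so the same email always maps to the same pseudonym, which keeps joins across tables intact. Whether keyed hashing alone satisfies a given compliance requirement is a question for the compliance owner, not something this sketch settles.

```python
import hashlib
import hmac
import random

# Synthetic name pools; any list of obviously fake names works.
FIRST_NAMES = ["Alex", "Sam", "Riley", "Jordan"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak"]


def pseudonymize_email(email: str, secret: bytes) -> str:
    """Replace an email with a keyed hash under a reserved, non-routable domain."""
    normalized = email.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return f"user-{digest[:16]}@example.invalid"


def scrub_record(record: dict, secret: bytes, rng: random.Random) -> dict:
    """Return a copy of the record with PII replaced and payment data removed."""
    cleaned = dict(record)
    if "email" in cleaned:
        cleaned["email"] = pseudonymize_email(cleaned["email"], secret)
    if "name" in cleaned:
        cleaned["name"] = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
    cleaned.pop("card_number", None)  # hypothetical payment field
    return cleaned
```

    Keep the secret out of the perf-tests repo; anyone holding it can confirm whether a known email appears in the data set.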

    Run each script at 5-10 virtual users for 5 minutes. Confirm assertions pass, response codes match expectations, and no scripts have hardcoded credentials or environment URLs. This catches most script bugs before the full run.
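
    The check for hardcoded credentials and URLs can be partly automated. The sketch below is a rough heuristic, assuming scripts are plain text files; the two patterns are illustrative and will miss some cases and flag some harmless ones, so treat findings as prompts for review rather than a verdict.

```python
import re

# Illustrative patterns only; extend them for the team's own conventions.
SUSPICIOUS_PATTERNS = {
    "hardcoded URL": re.compile(r"https?://[^\s'\"]+"),
    "credential assignment": re.compile(
        r"(?i)\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def scan_script(text: str) -> list[tuple[int, str]]:
    """Return (line_number, finding_label) pairs for suspicious lines."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings
```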

Test Execution

    Run a single VU or low concurrency for 30 minutes to capture per-request latency without contention. This baseline is the floor against which load and stress numbers get compared. Save the run ID for the report.

    Load = expected peak. Stress = ramp until failure to find the breaking point. Spike = instantaneous 5-10x burst (Black Friday pattern). Endurance = sustained load for 4-8 hours to surface memory leaks and connection-pool exhaustion. Don't skip endurance — it's where slow leaks hide.
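
    The four test types differ mainly in the shape of the load over time. The sketch below expresses each as a list of (duration in seconds, target VUs) stages. The spike multiplier and the endurance length follow the ranges given above; every other duration, the 10-second spike ramp, and the 50% stress step are assumptions to adjust per system.

```python
def build_stages(
    test_type: str,
    peak_vus: int,
    spike_factor: int = 5,
    endurance_hours: int = 4,
) -> list[tuple[int, int]]:
    """Return (duration_seconds, target_vus) stages for the given test type."""
    if peak_vus < 1:
        raise ValueError("peak_vus must be at least 1")
    if test_type == "load":
        # Ramp to expected peak, hold, ramp down.
        return [(300, peak_vus), (1800, peak_vus), (300, 0)]
    if test_type == "stress":
        # Step up by half of peak every 5 minutes, from 0.5x to 4x.
        # Stop the run once the breaking point is observed.
        return [(300, max(1, peak_vus * step // 2)) for step in range(1, 9)]
    if test_type == "spike":
        burst = peak_vus * spike_factor
        # Near-instant jump to the burst level, short hold, then recovery at peak.
        return [(300, peak_vus), (10, burst), (180, burst), (10, peak_vus), (300, peak_vus)]
    if test_type == "endurance":
        return [(300, peak_vus), (endurance_hours * 3600, peak_vus), (300, 0)]
    raise ValueError(f"unknown test type: {test_type}")
```

    Most load tools accept a ramp schedule in roughly this shape, so the list can be translated into the tool's own configuration.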

    Have the Datadog or Grafana dashboard open during execution — golden signals (latency, traffic, errors, saturation) plus DB connection pool, JVM heap or Go GC pauses, and network throughput. Saturation tells you which resource is the bottleneck before the latency numbers do.

    Export raw results from k6/JMeter, screenshot the relevant Grafana panels, and save APM trace samples for the slowest 1% of requests. Without artifacts the report is unreproducible.

Results Analysis

    Walk each measured metric (p50, p95, p99, error rate, throughput) against the targets from the planning step. Flag any metric outside threshold; identify whether the miss is a regression versus prior runs or a new behavior.
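
    The comparison against targets can be scripted so every run is judged the same way. This is a minimal sketch, assuming raw latencies in milliseconds are available from the tool's export; it uses the nearest-rank percentile definition, which may differ slightly from the interpolated percentiles some tools report. The two targets shown are the example figures from the planning step.

```python
import math

# Example targets from the planning step: p95 under 400 ms, error rate under 0.5%.
TARGETS = {"p95_ms": 400.0, "error_rate": 0.005}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(latencies_ms: list[float], error_count: int, request_count: int,
             targets: dict[str, float]) -> dict[str, tuple[float, float, bool]]:
    """Return {metric: (measured, limit, passed)} for every metric that has a target."""
    measured = {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": error_count / request_count,
    }
    return {
        name: (measured[name], limit, measured[name] < limit)
        for name, limit in targets.items()
    }
```

    A metric passes only when it is strictly under its limit, matching the "under 400ms" wording of the targets.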

    Pull the slowest traces from Datadog APM, New Relic, or Honeycomb. Common culprits: N+1 queries, missing index, connection pool sized too small, synchronous external API call without timeout, GC pauses on undersized heap.

    Report in Confluence or Notion: scope, environment, scenarios, results table vs. targets, identified bottlenecks, recommended actions, and link to artifacts. Tag the engineering manager and SRE on-call.

Optimization and Tuning

    One Jira or Linear ticket per bottleneck with the trace link, reproducer, and target metric. Avoid the trap of fixing 'performance' as a single epic — concrete tickets get done; vague initiatives don't.

    Common fixes: add a database index (use CREATE INDEX CONCURRENTLY in Postgres to build it without blocking writes to the table), bump connection pool size, add a Redis cache layer, batch external calls, raise JVM heap or tune GC. Land each fix as a separate PR so you can attribute the improvement to the change.

    Re-run with the same scripts, same environment, and same workload mix. The only variable should be the code under test. Capture the delta vs. the original run for each metric.
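
    Capturing the delta per metric is simple arithmetic, sketched here under the assumption that each run has been summarized as a dictionary of metric name to value. The metric names and figures in the usage note are hypothetical.

```python
def compare_runs(before: dict[str, float], after: dict[str, float]) -> dict[str, dict]:
    """Return absolute and percent change for every metric present in both runs."""
    deltas = {}
    for name in sorted(before.keys() & after.keys()):
        change = after[name] - before[name]
        # Percent change is undefined when the original value is zero.
        percent = change / before[name] * 100 if before[name] else None
        deltas[name] = {
            "before": before[name],
            "after": after[name],
            "change": change,
            "percent": percent,
        }
    return deltas
```

    For example, a p95 of 480 ms before a fix and 360 ms after gives a change of -120 ms, or -25%.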

    Record what config values changed, what code shipped, and what the before/after numbers are. SOC 2 change-management auditors ask for this; future engineers debugging perf in 6 months will thank you.

Final Validation and Sign-Off

    Full load + endurance run against the final candidate build. No code changes after this point — anything new restarts the validation cycle.

    Note new scenarios discovered, scripts that need maintenance, and threshold adjustments for next cycle. The next perf cycle starts from this updated plan, not from scratch.

    Short summary to engineering manager, product, and SRE on-call: pass/fail, residual risks, headroom at expected peak, recommended capacity for launch. Include a link to the full report.
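
    Headroom can be stated as a single number in the summary. The sketch below assumes one common definition: the fraction by which the breaking point found in the stress test exceeds the expected peak. Teams that define headroom differently (for example, against a saturation threshold rather than the breaking point) should substitute their own formula.

```python
def headroom(breaking_point_rps: float, expected_peak_rps: float) -> float:
    """Fraction of extra throughput available above expected peak (0.5 means 50%)."""
    if expected_peak_rps <= 0:
        raise ValueError("expected_peak_rps must be positive")
    return (breaking_point_rps - expected_peak_rps) / expected_peak_rps
```

    With a breaking point of 1500 RPS and an expected peak of 1000 RPS, headroom is 0.5, or 50%. A negative value means the system fails before reaching expected peak.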
