Performance Testing Checklist
Test Planning
Pin concrete numbers: p95 latency under 400ms, error rate under 0.5%, target throughput in RPS, max concurrent users. Vague goals like 'should be fast' produce vague test reports. Pull current SLOs from the service catalog or last quarter's error budget review.
Match the test environment to production: instance sizes, autoscaling configs, RDS class, cache tier, and network topology. A perf test against a single t3.medium tells you nothing about a production ASG of m6i.2xlarge. Document any deltas between staging and prod that the report will need to caveat.
Pull the last 30 days of production traffic from Datadog or CloudWatch. Identify the top 5-10 endpoints by volume and their relative weights. For capacity planning, the peak hour matters more than the daily average.
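As a rough illustration of turning endpoint volumes into relative weights, the sketch below normalizes request counts so the weights sum to 1.0. The endpoint names and counts are made-up placeholders, not real traffic; in practice they would come from a Datadog or CloudWatch export, and the function name is hypothetical.

```python
from collections import Counter


def workload_weights(request_counts, top_n=10):
    """Return the top_n endpoints by volume with weights that sum to 1.0."""
    top = Counter(request_counts).most_common(top_n)
    total = sum(count for _, count in top)
    if total == 0:
        raise ValueError("no traffic in the sample")
    return {endpoint: count / total for endpoint, count in top}


# Hypothetical peak-hour request counts (placeholder numbers).
peak_hour_counts = {
    "GET /search": 6000,
    "GET /product": 3000,
    "POST /checkout": 1000,
}
weights = workload_weights(peak_hour_counts)
```

The resulting weights can feed the scenario mix in whichever tool the plan selects; feeding it peak-hour counts rather than daily totals keeps the mix aligned with the capacity question.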
The test plan covers scope (in-scope and out-of-scope endpoints), test types (load, stress, spike, endurance), tooling (k6, JMeter, Locust, Gatling), schedule, and rollback criteria. Circulate it to the engineering manager and SRE for sign-off before scripting starts.
Test Design and Scripting
Write one script per workload scenario, such as checkout flow, search, or login burst. Parameterize ramp-up, VU count, and duration so the same script runs as load, stress, or spike with different env vars. Commit scripts to the perf-tests repo, not someone's laptop.
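One way to parameterize a script is to read the profile from environment variables with sensible defaults. This is a minimal sketch, not tied to any particular tool; the variable names (PERF_VUS, PERF_RAMP_UP_S, PERF_DURATION_S) and the default values are assumptions chosen for illustration.

```python
import os


def load_profile(env=None):
    """Build a test profile from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "vus": int(env.get("PERF_VUS", "50")),
        "ramp_up_s": int(env.get("PERF_RAMP_UP_S", "60")),
        "duration_s": int(env.get("PERF_DURATION_S", "600")),
    }


# Same script, different test types: only the variables change.
load = load_profile({"PERF_VUS": "200", "PERF_RAMP_UP_S": "300", "PERF_DURATION_S": "1800"})
spike = load_profile({"PERF_VUS": "1000", "PERF_RAMP_UP_S": "0", "PERF_DURATION_S": "300"})
```

Passing a plain dict in place of os.environ keeps the function easy to test; a real run would call load_profile() with no arguments.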
Never run perf tests against real customer PII — hash emails, randomize names, scrub payment data. If pulling a prod snapshot, run the anonymization job before loading into the perf environment. GDPR and SOC 2 auditors flag any prod-data leakage to non-prod.
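The anonymization step might look something like the sketch below. It swaps in a keyed hash (HMAC-SHA256) rather than a bare hash, because unkeyed hashes of email addresses can be reversed by guessing common addresses. The field names (email, name, card_number) and the example.invalid domain are assumptions, not a real schema.

```python
import hashlib
import hmac
import secrets


def anonymize_user(row, key):
    """Return a copy of row with PII replaced; other fields pass through."""
    email = row["email"].strip().lower().encode("utf-8")
    digest = hmac.new(key, email, hashlib.sha256).hexdigest()
    cleaned = dict(row)
    # Same input email maps to the same pseudonym, so joins still work.
    cleaned["email"] = f"user-{digest[:16]}@example.invalid"
    # Names are replaced with random values, not derived from the original.
    cleaned["name"] = f"Test User {secrets.token_hex(4)}"
    # Payment data is dropped entirely rather than masked.
    cleaned.pop("card_number", None)
    return cleaned


key = secrets.token_bytes(32)  # hypothetical key; keep it out of source control
row = {"id": 7, "email": "Jane.Doe@example.com", "name": "Jane Doe",
       "card_number": "4111111111111111"}
safe = anonymize_user(row, key)
```

Keeping the email pseudonym deterministic preserves referential integrity across tables, while discarding the key after the job makes the mapping hard to reverse.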
Run each script at 5-10 virtual users for 5 minutes. Confirm assertions pass, response codes match expectations, and no scripts have hardcoded credentials or environment URLs. This smoke run catches most script bugs before the full run.
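The check for hardcoded credentials and URLs can be partly automated. The sketch below scans script text with two regular expressions; the patterns are illustrative assumptions and will miss cases or raise false positives, so treat it as a first pass rather than a replacement for review.

```python
import re

# Illustrative patterns only; tune them to your own scripts.
SUSPECT_PATTERNS = {
    "hardcoded URL": re.compile(r"https?://[^\s'\"]+"),
    "hardcoded credential": re.compile(
        r"(?i)\b(password|passwd|secret|api_key|token)\b\s*[:=]\s*['\"][^'\"]+['\"]"
    ),
}


def scan_script(text):
    """Return (line_number, label) for each suspicious line in a script."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, label))
    return findings


# Hypothetical script contents used only to exercise the scanner.
sample = '''base_url = os.environ["BASE_URL"]
password = "hunter2"
resp = client.get("https://staging.example.com/search")
'''
findings = scan_script(sample)
```

Here the first line passes because the URL comes from the environment, while the second and third lines are flagged.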
Test Execution
Run a single VU or low concurrency for 30 minutes to capture per-request latency without contention. This baseline is the floor against which load and stress numbers get compared. Save the run ID for the report.
Load = expected peak. Stress = ramp until failure to find the breaking point. Spike = instantaneous 5-10x burst (Black Friday pattern). Endurance = sustained load for 4-8 hours to surface memory leaks and connection-pool exhaustion. Don't skip endurance — it's where slow leaks hide.
Have the Datadog or Grafana dashboard open during execution — golden signals (latency, traffic, errors, saturation) plus DB connection pool, JVM heap or Go GC pauses, and network throughput. Saturation tells you which resource is the bottleneck before the latency numbers do.
Export raw results from k6/JMeter, screenshot the relevant Grafana panels, and save APM trace samples for the slowest 1% of requests. Without artifacts the report is unreproducible.
Results Analysis
Compare each measured metric (p50, p95, p99, error rate, throughput) against the targets from the planning step. Flag any metric outside its threshold, and identify whether the miss is a regression against prior runs or behavior that has not been measured before.
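The comparison against targets can be sketched as follows, using the example thresholds from the planning step (p95 under 400 ms, error rate under 0.5%). The nearest-rank percentile is one of several common definitions and may differ slightly from what k6 or JMeter report; the sample data, the function names, and the "regression" / "existing miss" labels are assumptions for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(latencies_ms, errors, targets, previous=None):
    """Map each metric to (value, status), using the prior run to classify misses."""
    measured = {
        "p95_ms": percentile(latencies_ms, 95),
        "error_rate": errors / len(latencies_ms),
    }
    report = {}
    for metric, value in measured.items():
        status = "pass" if value < targets[metric] else "miss"
        if status == "miss" and previous is not None:
            # A miss counts as a regression only if the prior run met the target.
            status = "regression" if previous[metric] < targets[metric] else "existing miss"
        report[metric] = (value, status)
    return report


targets = {"p95_ms": 400, "error_rate": 0.005}
latencies = [i * 4 for i in range(1, 101)]  # placeholder samples: 4 ms to 400 ms
prior_run = {"p95_ms": 350, "error_rate": 0.002}  # placeholder prior results
report = evaluate(latencies, errors=1, targets=targets, previous=prior_run)
```

In this made-up run the p95 stays within target, while the error rate misses and is classified as a regression because the prior run was within threshold.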
Pull the slowest traces from Datadog APM, New Relic, or Honeycomb. Common culprits: N+1 queries, missing index, connection pool sized too small, synchronous external API call without timeout, GC pauses on undersized heap.
Write the report in Confluence or Notion, covering scope, environment, scenarios, results table vs. targets, identified bottlenecks, recommended actions, and a link to the artifacts. Tag the engineering manager and SRE on-call.
Optimization and Tuning
File one Jira or Linear ticket per bottleneck with the trace link, reproducer, and target metric. Avoid the trap of fixing 'performance' as a single epic: concrete tickets get done; vague initiatives don't.
Common fixes: add a database index (in Postgres, use CREATE INDEX CONCURRENTLY so the index build does not block writes to the table), increase the connection pool size, add a Redis cache layer, batch external calls, raise the JVM heap or tune GC. Land each fix as a separate PR so you can attribute the improvement to the change.
Re-run the tests with the same scripts, same environment, and same workload mix. The only variable should be the code under test. Capture the delta vs. the original run for each metric.
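Capturing the delta per metric is simple arithmetic; the sketch below reports percent change relative to the original run. The metric names and numbers are placeholders, not measured results.

```python
def metric_deltas(before, after):
    """Percent change per metric; negative means the value went down."""
    deltas = {}
    for metric, old in before.items():
        new = after[metric]
        # Percent change is undefined when the original value is zero.
        deltas[metric] = round((new - old) / old * 100, 1) if old else None
    return deltas


# Placeholder numbers for an original run and a re-run after one fix.
before = {"p95_ms": 500, "throughput_rps": 800}
after = {"p95_ms": 400, "throughput_rps": 1000}
deltas = metric_deltas(before, after)
```

Whether a negative delta is good depends on the metric: a lower p95 is an improvement, a lower throughput is not.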
Record what config values changed, what code shipped, and what the before/after numbers are. SOC 2 change-management auditors ask for this; future engineers debugging perf in 6 months will thank you.
Final Validation and Sign-Off
Run the full load and endurance tests against the final candidate build. No code changes after this point: anything new restarts the validation cycle.
Update the test plan with new scenarios discovered, scripts that need maintenance, and threshold adjustments for the next cycle. The next perf cycle starts from this updated plan, not from scratch.
Send a short summary to the engineering manager, product, and SRE on-call: pass/fail, residual risks, headroom at expected peak, recommended capacity for launch. Include a link to the full report.
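Headroom at expected peak can be stated as a single number. The sketch below assumes one common definition: the gap between the highest throughput the system sustained within thresholds (taken from the stress test) and the expected peak, as a percentage of the expected peak. Both the definition and the figures are assumptions for illustration.

```python
def headroom_pct(max_sustainable_rps, expected_peak_rps):
    """Spare capacity above expected peak, as a percentage of expected peak."""
    if expected_peak_rps <= 0:
        raise ValueError("expected peak must be positive")
    return (max_sustainable_rps - expected_peak_rps) / expected_peak_rps * 100


# Placeholder figures: the stress test held thresholds up to 1500 RPS,
# and the plan expects a peak of 1000 RPS.
headroom = headroom_pct(max_sustainable_rps=1500, expected_peak_rps=1000)
```

A negative result means the system did not reach the expected peak within thresholds, which belongs under residual risks in the summary.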
