Automated Testing Checklist
Steps a QA lead or SDET runs to plan, build, execute, and maintain an automated test suite for a SaaS application — from scoping critical paths through CI integration, flake triage, and ongoing coverage review.
Test Planning
-
Define scope and exit criteria for the suite
The QA lead writes down what the suite is responsible for proving (happy-path checkout, auth flows, billing webhooks) and what is explicitly out of scope (load tests, visual regression). State the exit criteria as a definition of done: for example, all P0 user paths covered, suite runs under 15 minutes, flake rate below 2%.
-
Select frameworks aligned to the stack
Match the runner to the language and surface: Playwright or Cypress for browser e2e, Jest/Vitest for JS unit, pytest for Python, RSpec for Ruby, k6 or Locust for load. Avoid mixing two e2e frameworks: pick one and commit, because debugging across frameworks is a common reason teams abandon automation.
-
Identify P0 user paths to automate first
Pull the top 10 user flows from product analytics (Amplitude, Mixpanel, or your warehouse). Sign-up, login, primary CRUD operation, payment, and any flow tied to a customer SLA belong in the first batch. Capture the list as the input to test-case writing.
-
Set the target flake-rate and runtime budget
Pick numbers the team will defend at retro: e.g., e2e suite under 15 min, unit suite under 3 min, flake rate under 2% measured over the last 100 runs. Without explicit budgets, suites drift to 45 minutes and engineers stop trusting them.
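A budget is easier to defend when it is checked mechanically. The sketch below is one possible shape for that check, using the e2e figures above (15 minutes, 2%, last 100 runs). The `SuiteRun` fields are hypothetical, and measuring runtime at p95 with the nearest-rank method is an assumption of this sketch rather than something the checklist prescribes.

```typescript
// Hypothetical record for one CI run of the suite.
interface SuiteRun {
  durationMinutes: number;
  // True when the run failed and then passed on retry with no code change.
  flaky: boolean;
}

interface Budget {
  maxRuntimeMinutes: number;
  maxFlakeRate: number; // fraction, e.g. 0.02 for 2%
  windowSize: number; // how many of the most recent runs to measure
}

interface BudgetReport {
  flakeRate: number;
  p95RuntimeMinutes: number;
  withinBudget: boolean;
}

// Example budget taken from the e2e numbers in the checklist text.
const e2eBudget: Budget = { maxRuntimeMinutes: 15, maxFlakeRate: 0.02, windowSize: 100 };

function checkBudget(runs: SuiteRun[], budget: Budget): BudgetReport {
  const recent = runs.slice(-budget.windowSize);
  if (recent.length === 0) {
    return { flakeRate: 0, p95RuntimeMinutes: 0, withinBudget: true };
  }
  const flakeRate = recent.filter((r) => r.flaky).length / recent.length;
  const sorted = recent.map((r) => r.durationMinutes).sort((a, b) => a - b);
  // Nearest-rank p95: the value at position ceil(0.95 * n), 1-based.
  const p95RuntimeMinutes = sorted[Math.ceil(0.95 * sorted.length) - 1];
  return {
    flakeRate,
    p95RuntimeMinutes,
    // "Under" the budget is read as strictly less than.
    withinBudget: flakeRate < budget.maxFlakeRate && p95RuntimeMinutes < budget.maxRuntimeMinutes,
  };
}
```

Passing the run history as an argument keeps the function independent of whichever CI analytics tool supplies the data.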
Test Environment Setup
-
Provision an isolated staging environment
The platform engineer stands up a staging tier mirroring production: same Postgres major version, same Redis topology, same feature flags defaulted off. Ephemeral per-PR environments (Vercel preview, Render preview, or k8s namespace per branch) are ideal; a single shared staging environment is acceptable, but concurrent test runs will collide on shared data.
-
Wire the suite into the CI pipeline
Add the test job to GitHub Actions, GitLab CI, CircleCI, or Buildkite. Make it a required status check on the main branch ruleset so PRs cannot merge with red CI; once engineers normalize red builds, the suite stops working as a gate.
-
Seed deterministic test fixtures
Use factories (FactoryBot, factory_boy, Fishery) or fixture files checked into the repo. Never depend on records that exist in shared staging — they get mutated, deleted, or rotated. Each test should create the data it needs and tear down after itself.
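As a rough illustration of what the factory libraries above provide, here is a hand-rolled factory. It is not the API of FactoryBot, factory_boy, or Fishery; the `User` shape and `createUserFactory` name are hypothetical. It only builds in-memory objects, so persisting and tearing down records is left to the test's own setup and teardown hooks.

```typescript
// Hypothetical record type for this sketch.
interface User {
  id: number;
  email: string;
  plan: "free" | "pro";
}

// Each factory owns its counter, so generated values are deterministic:
// no random data and no clock reads.
function createUserFactory() {
  let sequence = 0;
  return {
    build(overrides: Partial<User> = {}): User {
      sequence += 1;
      return {
        id: sequence,
        email: `user${sequence}@example.test`,
        plan: "free",
        // Fields the test cares about are stated explicitly by the caller.
        ...overrides,
      };
    },
  };
}
```

Creating a fresh factory in each test's setup restarts the sequence at 1, so a test sees the same data whether it runs alone or as part of the full suite.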
-
Configure secrets via CI secret store
Store API keys for Stripe test mode, Twilio test credentials, and OAuth client secrets in GitHub Actions secrets, AWS Secrets Manager, or Vault. Never commit a `.env.test` with real keys; running gitleaks or trufflehog as a pre-commit hook helps catch the slip.
Writing Test Cases
-
Adopt a naming convention for test files
Document the convention in CONTRIBUTING.md: e.g., `feature.behavior.spec.ts` for e2e, `module.test.ts` for unit. The test description should read as a sentence — `it('rejects expired JWTs at /api/v2/orders')` — so failures in CI are self-explanatory without opening the file.
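A convention tends to hold only when something checks it. The sketch below classifies file names against the two example patterns above. The regular expressions assume lowercase kebab-case segments, which the checklist does not specify, and `classifyTestFile` is a hypothetical helper name.

```typescript
// Assumed patterns: lowercase kebab-case segments separated by dots.
const E2E_PATTERN = /^[a-z0-9-]+\.[a-z0-9-]+\.spec\.ts$/; // feature.behavior.spec.ts
const UNIT_PATTERN = /^[a-z0-9-]+\.test\.ts$/; // module.test.ts

type TestKind = "e2e" | "unit" | "invalid";

function classifyTestFile(fileName: string): TestKind {
  if (E2E_PATTERN.test(fileName)) return "e2e";
  if (UNIT_PATTERN.test(fileName)) return "unit";
  return "invalid";
}
```

A lint step or pre-commit hook could run this over changed test files and reject any that come back `invalid`.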
-
Write tests as independent and idempotent
No test should depend on the order another ran in. Use `beforeEach` to set up state and `afterEach` (or DB transaction rollback) to tear down. Order-dependent suites break the moment you parallelize across runners.
-
Mock third-party APIs at the network boundary
Use msw, nock, or VCR cassettes for HTTP. Tests that hit real Stripe, SendGrid, or partner APIs are a frequent source of flakes: rate limits, network blips, and sandbox outages all show up as red CI. Reserve real-API tests for a nightly contract-test job, not the PR pipeline.
-
Add structured logging inside test helpers
When an e2e test fails in CI without a local repro, the screenshot, DOM snapshot, network HAR, and console log are what make triage tractable. Have test helpers log each step in a structured form so those artifacts can be lined up with what the test was doing. Playwright's trace viewer and Cypress's video output are the baseline; configure them to upload as CI artifacts on failure.
Test Execution
-
Run the suite on every pull request
Configure CI to trigger on `pull_request` and `push` to main. Split the matrix: unit tests on every PR, full e2e on PRs touching the relevant package, nightly cron for the whole suite against staging. Required status checks block merge until green.
-
Run UI tests headless with parallel sharding
Playwright supports `--shard=1/4` natively; Cypress has parallelization via Cypress Cloud or Sorry-Cypress. Headless Chromium can run roughly 2x faster than headed mode and behaves identically for the vast majority of tests. Reserve headed mode for local debugging only.
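To make the idea behind `1/4`-style sharding concrete, the sketch below splits a list of spec files across runners. It is an illustration only, not how Playwright or Cypress assign tests internally. The round-robin strategy and the `filesForShard` name are assumptions.

```typescript
// shardIndex is 1-based to mirror the "1/4" notation.
function filesForShard(files: string[], shardIndex: number, shardTotal: number): string[] {
  if (!Number.isInteger(shardTotal) || shardTotal < 1) {
    throw new Error("shardTotal must be a positive integer");
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardTotal) {
    throw new Error("shardIndex must be between 1 and shardTotal");
  }
  // Sort first so every runner derives the same assignment from the same file set.
  const sorted = [...files].sort();
  // Round-robin: file i goes to shard (i mod shardTotal) + 1.
  return sorted.filter((_, i) => i % shardTotal === shardIndex - 1);
}
```

Every file lands in exactly one shard, which is why the split is only safe when tests do not depend on one another's order or leftover state.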
-
Track per-test runtime and flake rate
Tools like Datadog CI Visibility, BuildPulse, or Trunk Flaky Tests ingest JUnit XML and surface the slowest 1% and the flakiest tests. Without this data, the suite degrades silently — engineers blame infra rather than spotting the one test that fails 1 in 30 runs.
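The tools above do this aggregation for you, but it helps to see what they compute. The sketch below assumes per-test results have already been parsed out of JUnit XML into a hypothetical `TestResult` shape. Treating a test as flaky when it both passed and failed on the same commit is one common working definition, adopted here as an assumption.

```typescript
// Hypothetical row parsed from a CI test report.
interface TestResult {
  testName: string;
  commitSha: string;
  passed: boolean;
  durationMs: number;
}

interface TestStats {
  testName: string;
  runs: number;
  failureRate: number;
  flaky: boolean; // passed and failed on at least one identical commit
  meanDurationMs: number;
}

function summarizeTests(results: TestResult[]): TestStats[] {
  // Group results by test name.
  const byTest = new Map<string, TestResult[]>();
  for (const r of results) {
    const bucket = byTest.get(r.testName) ?? [];
    bucket.push(r);
    byTest.set(r.testName, bucket);
  }

  const stats: TestStats[] = [];
  byTest.forEach((runs, testName) => {
    const passedCommits = new Set(runs.filter((r) => r.passed).map((r) => r.commitSha));
    // Same code, different outcome: the failure cannot be explained by the change itself.
    const flaky = runs.some((r) => !r.passed && passedCommits.has(r.commitSha));
    const failures = runs.filter((r) => !r.passed).length;
    const totalMs = runs.reduce((sum, r) => sum + r.durationMs, 0);
    stats.push({
      testName,
      runs: runs.length,
      failureRate: failures / runs.length,
      flaky,
      meanDurationMs: totalMs / runs.length,
    });
  });

  // Flaky tests first, then by failure rate, highest first.
  return stats.sort(
    (a, b) => Number(b.flaky) - Number(a.flaky) || b.failureRate - a.failureRate,
  );
}
```

A test that fails on every run of a commit is reported with a high failure rate but not as flaky, which separates genuine regressions from nondeterminism.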
Results Analysis and Reporting
-
Publish a pass-rate dashboard
Build a Datadog, Grafana, or Honeycomb dashboard showing pass rate, p95 runtime, and the top 10 flaky tests over a 14-day window. Link it from the team's README so it's reviewed at every retro, not just when CI is on fire.
-
Route failure alerts to the right Slack channel
Main-branch red builds page the on-call engineer; PR failures notify the PR author only. Avoid the anti-pattern of routing every failure to #engineering — within a week the channel becomes ignorable noise.
-
Decide whether the suite meets the flake budget
Compare the flake rate captured earlier against the budget set in planning. If over budget, trigger the remediation phase — the team must quarantine the worst offenders before adding new tests.
-
Quarantine the top flaky tests
Tag the offenders `@quarantine` so they run but don't block merge, file tickets to fix or delete within 2 sprints, and remove the tag once the underlying race condition or timing issue is resolved. A quarantine list with no expiration is just hidden tech debt.
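One way to give the quarantine list an expiration is to record when each test was tagged and fail a scheduled job once an entry is overdue. The sketch below assumes two-week sprints, so "2 sprints" becomes 28 days. The `QuarantineEntry` shape and ticket field are hypothetical.

```typescript
// Hypothetical entry kept alongside the @quarantine tag.
interface QuarantineEntry {
  testName: string;
  ticket: string; // tracker ID for the fix-or-delete ticket
  quarantinedOn: string; // ISO date, e.g. "2024-03-01"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumes two-week sprints, so the 2-sprint deadline defaults to 28 days.
function expiredEntries(
  entries: QuarantineEntry[],
  today: Date,
  maxAgeDays = 28,
): QuarantineEntry[] {
  return entries.filter((entry) => {
    const quarantinedAt = Date.parse(entry.quarantinedOn);
    // Surface unparsable dates instead of silently skipping them.
    if (Number.isNaN(quarantinedAt)) return true;
    const ageDays = (today.getTime() - quarantinedAt) / MS_PER_DAY;
    return ageDays > maxAgeDays;
  });
}
```

Taking `today` as a parameter rather than reading the clock inside the function keeps the check itself deterministic and easy to test.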
Maintenance and Optimization
-
Update tests when feature flags flip to GA
When a flag reaches 100% rollout, the assertions guarded by `if (flag)` should become unconditional and the off-path test should be deleted. Stale flag-gated tests are a major source of dead code, so call them out in the team's quarterly flag review.
-
Refactor duplicated page-object helpers
Sprint by sprint, page objects accrete copy-pasted login helpers and duplicated selectors. Schedule a quarterly refactor pass: consolidate selectors into one place per page, replace brittle CSS selectors with `data-testid` attributes, and delete dead helpers identified by ts-prune or similar.
-
Review coverage against P0 user paths
Use Codecov, Coveralls, or `nyc`/`coverage.py` reports to map coverage against the P0 path list captured in planning. Aim for 80%+ branch coverage on the critical-path modules; chasing 100% repo-wide buys little and incentivizes assertion-free tests.
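The review can be reduced to a list of P0 modules that fall short of the branch-coverage target. The sketch below assumes the coverage report has already been reduced to per-module branch counts. The `ModuleCoverage` shape and `p0CoverageGaps` name are hypothetical, and treating a P0 module with no report as a gap is a choice made for this sketch.

```typescript
// Hypothetical per-module summary extracted from a coverage report.
interface ModuleCoverage {
  module: string; // path or package name
  branchesCovered: number;
  branchesTotal: number;
}

// Returns the P0 modules that miss the target, in the order given.
function p0CoverageGaps(
  coverage: ModuleCoverage[],
  p0Modules: string[],
  minBranchPct = 80,
): string[] {
  const byModule = new Map(coverage.map((c) => [c.module, c] as const));
  return p0Modules.filter((name) => {
    const entry = byModule.get(name);
    // A P0 module with no coverage data at all counts as a gap.
    if (!entry) return true;
    // A module with no branches has nothing left to cover.
    if (entry.branchesTotal === 0) return false;
    return (entry.branchesCovered / entry.branchesTotal) * 100 < minBranchPct;
  });
}
```

Because only the modules on the P0 list are checked, low coverage elsewhere in the repo does not fail the review.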