Automated Testing Checklist

Steps a QA lead or SDET runs to plan, build, execute, and maintain an automated test suite for a SaaS application — from scoping critical paths through CI integration, flake triage, and ongoing coverage review.

1

Test Planning

  1. Define scope and exit criteria for the suite
    • The QA lead writes down what the suite is responsible for proving (happy-path checkout, auth flows, billing webhooks) and what is explicitly out of scope (load tests, visual regression). State the exit criteria as a definition of done: e.g., all P0 user paths covered, suite runs under 15 minutes, flake rate below 2%.

  2. Select frameworks aligned to the stack
    • Match the runner to the language and surface: Playwright or Cypress for browser e2e, Jest/Vitest for JS unit, pytest for Python, RSpec for Ruby, k6 or Locust for load. Avoid mixing two e2e frameworks: pick one and commit, because cross-framework debugging is a common reason teams abandon automation.

  3. Identify P0 user paths to automate first
    • Pull the top 10 user flows from product analytics (Amplitude, Mixpanel, or your warehouse). Sign-up, login, primary CRUD operation, payment, and any flow tied to a customer SLA belong in the first batch. Capture the list as the input to test-case writing.

  4. Set the target flake-rate and runtime budget
    • Pick numbers the team will defend at retro: e.g., e2e suite under 15 min, unit suite under 3 min, flake rate under 2% measured over the last 100 runs. Without explicit budgets, suites drift to 45 minutes and engineers stop trusting them.
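
A minimal sketch of how such a budget could be checked mechanically, assuming CI run history is available as a list of records. The `SuiteRun` and `Budget` shapes and the definition of a flaky run (red first, green on retry of the same commit) are assumptions for illustration, not a fixed standard.

```typescript
// Hypothetical record for one CI run of the suite; the field names are assumptions.
interface SuiteRun {
  failedThenPassedOnRetry: boolean; // same commit, red first, green on retry
  durationMinutes: number;
}

interface Budget {
  maxFlakeRate: number;      // 0.02 means 2%
  maxRuntimeMinutes: number; // e.g. 15 for the e2e suite
  windowSize: number;        // how many recent runs to measure over, e.g. 100
}

interface BudgetReport {
  flakeRate: number;
  p95RuntimeMinutes: number;
  withinBudget: boolean;
}

// Nearest-rank 95th percentile; returns 0 for an empty list.
function p95(values: number[]): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil(0.95 * sorted.length) - 1];
}

// Measures the most recent `windowSize` runs against the budget.
// "Under" is read strictly: a value equal to the limit is over budget.
function checkBudget(runs: SuiteRun[], budget: Budget): BudgetReport {
  const recent = runs.slice(-budget.windowSize);
  const flaky = recent.filter((run) => run.failedThenPassedOnRetry).length;
  const flakeRate = recent.length === 0 ? 0 : flaky / recent.length;
  const p95RuntimeMinutes = p95(recent.map((run) => run.durationMinutes));
  return {
    flakeRate,
    p95RuntimeMinutes,
    withinBudget:
      flakeRate < budget.maxFlakeRate &&
      p95RuntimeMinutes < budget.maxRuntimeMinutes,
  };
}
```

Using the 95th percentile rather than the mean for runtime is a design choice here; it keeps one unusually fast run from hiding a suite that is usually over budget.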

2

Test Environment Setup

  1. Provision an isolated staging environment
    • The platform engineer stands up a staging tier mirroring production: same Postgres major version, same Redis topology, same feature flags defaulted off. Ephemeral per-PR environments (Vercel preview, Render preview, or k8s namespace per branch) are ideal; a single shared staging environment is acceptable, but expect concurrent test runs to collide.

  2. Wire the suite into the CI pipeline
    • Add the test job to GitHub Actions, GitLab CI, CircleCI, or Buildkite. Make it a required status check on the main branch ruleset so PRs cannot merge with red CI — the most common automation failure mode is engineers normalizing red builds.

  3. Seed deterministic test fixtures
    • Use factories (FactoryBot, factory_boy, Fishery) or fixture files checked into the repo. Never depend on records that exist in shared staging — they get mutated, deleted, or rotated. Each test should create the data it needs and tear down after itself.
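
As an illustration of the factory idea without committing to a specific library, here is a hand-rolled sketch. The `User` shape, `buildUser`, and `resetSequence` are hypothetical names; Fishery or FactoryBot would provide the same pattern with more features.

```typescript
// Hypothetical domain type used only for this example.
interface User {
  id: number;
  email: string;
  plan: "free" | "pro";
}

// A counter instead of random or time-based values keeps fixtures deterministic.
let sequence = 0;

// Builds a valid user with predictable defaults; callers override only
// the fields the test actually cares about.
function buildUser(overrides: Partial<User> = {}): User {
  sequence += 1;
  return {
    id: sequence,
    email: `user${sequence}@example.test`,
    plan: "free",
    ...overrides,
  };
}

// Call from the suite's per-test setup so each test starts from the same state.
function resetSequence(): void {
  sequence = 0;
}
```

A test that needs a paying customer would call `buildUser({ plan: "pro" })` and leave every other field at its default.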

  4. Configure secrets via CI secret store
    • Store API keys for Stripe test mode, Twilio test credentials, and OAuth client secrets in GitHub Actions secrets, AWS Secrets Manager, or Vault. Never commit a `.env.test` with real keys; running gitleaks or trufflehog as a pre-commit hook catches the slip.
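
One way to make a missing or wrong secret fail loudly at suite start-up is a small loader like the sketch below. The variable names in `REQUIRED_SECRETS` are hypothetical; the `sk_test_` prefix check relies on Stripe's convention that test-mode secret keys start with `sk_test_`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical variable names; substitute whatever the CI secret store exposes.
const REQUIRED_SECRETS = [
  "STRIPE_TEST_SECRET_KEY",
  "TWILIO_TEST_AUTH_TOKEN",
  "OAUTH_CLIENT_SECRET",
] as const;

type SecretName = (typeof REQUIRED_SECRETS)[number];

// Validates that every secret is present and that the Stripe key is a
// test-mode key. Error messages list variable names only, never values.
function loadSecrets(env: Env): Record<SecretName, string> {
  const missing = REQUIRED_SECRETS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing CI secrets: ${missing.join(", ")}`);
  }
  const stripeKey = env.STRIPE_TEST_SECRET_KEY as string;
  if (!stripeKey.startsWith("sk_test_")) {
    throw new Error("STRIPE_TEST_SECRET_KEY is not a test-mode key");
  }
  const secrets = {} as Record<SecretName, string>;
  for (const name of REQUIRED_SECRETS) {
    secrets[name] = env[name] as string;
  }
  return secrets;
}
```

In a Node-based runner this would typically be called once as `loadSecrets(process.env)` in global setup.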

3

Writing Test Cases

  1. Adopt a naming convention for test files
    • Document the convention in CONTRIBUTING.md: e.g., `feature.behavior.spec.ts` for e2e, `module.test.ts` for unit. The test description should read as a sentence — `it('rejects expired JWTs at /api/v2/orders')` — so failures in CI are self-explanatory without opening the file.
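
A convention is easier to keep when a script enforces it. The sketch below classifies file names against the two example patterns above; the lowercase kebab-case segment rule is an added assumption, not something the convention itself requires.

```typescript
// feature.behavior.spec.ts, e.g. checkout.applies-coupon.spec.ts
const E2E_PATTERN = /^[a-z0-9-]+\.[a-z0-9-]+\.spec\.ts$/;
// module.test.ts, e.g. cart.test.ts
const UNIT_PATTERN = /^[a-z0-9-]+\.test\.ts$/;

type TestFileKind = "e2e" | "unit" | "invalid";

// Takes a bare file name (no directory) and reports which convention it matches.
function classifyTestFile(fileName: string): TestFileKind {
  if (E2E_PATTERN.test(fileName)) return "e2e";
  if (UNIT_PATTERN.test(fileName)) return "unit";
  return "invalid";
}

// Returns the offending names so a lint step can print them and exit non-zero.
function findMisnamedTestFiles(fileNames: string[]): string[] {
  return fileNames.filter((fileName) => classifyTestFile(fileName) === "invalid");
}
```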

  2. Write tests to be independent and idempotent
    • No test should depend on the order another ran in. Use `beforeEach` to set up state and `afterEach` (or DB transaction rollback) to tear down. Order-dependent suites break the moment you parallelize across runners.
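
The setup and teardown pattern can be sketched without a test runner. Here a hypothetical in-memory `FakeDb` stands in for a real database transaction, and `runIsolated` plays the role that `beforeEach` and `afterEach` would play in Jest or Vitest.

```typescript
interface Row {
  id: number;
  email: string;
}

// Hypothetical stand-in for a database that supports begin/rollback.
class FakeDb {
  private rows: Row[] = [];
  private savepoint: Row[] | null = null;

  // Remember the current state so it can be restored later.
  begin(): void {
    this.savepoint = this.rows.map((row) => ({ ...row }));
  }

  // Discard everything written since begin().
  rollback(): void {
    if (this.savepoint !== null) {
      this.rows = this.savepoint;
      this.savepoint = null;
    }
  }

  insert(row: Row): void {
    this.rows.push(row);
  }

  count(): number {
    return this.rows.length;
  }
}

type TestCase = (db: FakeDb) => void;

// Wraps every test in begin/rollback so no test sees another test's writes,
// even when a test throws.
function runIsolated(db: FakeDb, tests: TestCase[]): void {
  for (const test of tests) {
    db.begin();
    try {
      test(db);
    } finally {
      db.rollback();
    }
  }
}
```

Because each test starts from the same state, the list passed to `runIsolated` can be reordered or split across runners without changing any result.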

  3. Mock third-party APIs at the network boundary
    • Use msw, nock, or VCR cassettes for HTTP. Tests that hit real Stripe, SendGrid, or partner APIs are a leading source of flakes: rate limits, network blips, and sandbox outages all show up as red CI. Reserve real-API tests for a nightly contract-test job, not the PR pipeline.

  4. Add structured logging inside test helpers
    • When an e2e test fails in CI with no local repro, the screenshot, DOM snapshot, network HAR, and console log are what make triage tractable. Have test helpers log each step as structured output so the console log can be correlated with those artifacts. Playwright's trace viewer and Cypress's video output are the baseline; configure them to upload as CI artifacts on failure.

4

Test Execution

  1. Run the suite on every pull request
    • Configure CI to trigger on `pull_request` and `push` to main. Split the matrix: unit tests on every PR, full e2e on PRs touching the relevant package, nightly cron for the whole suite against staging. Required status checks block merge until green.

  2. Run UI tests headless with parallel sharding
    • Playwright supports sharding natively via `--shard=1/4`; Cypress parallelizes via Cypress Cloud or Sorry-Cypress. Headless Chromium typically runs faster than headed and behaves the same for the large majority of tests. Reserve headed mode for local debugging only.
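
To make the idea of sharding concrete, the sketch below splits a list of spec files across runners round-robin. This is an illustration only and is not a description of how Playwright or Cypress Cloud actually assign tests; `filesForShard` is a hypothetical helper.

```typescript
// Returns the files that shard `shardIndex` (1-based) of `shardTotal` should run.
// Sorting first makes the split deterministic regardless of input order.
function filesForShard(
  files: string[],
  shardIndex: number,
  shardTotal: number,
): string[] {
  if (shardTotal < 1 || shardIndex < 1 || shardIndex > shardTotal) {
    throw new RangeError(`invalid shard ${shardIndex}/${shardTotal}`);
  }
  const sorted = [...files].sort();
  return sorted.filter((_, position) => position % shardTotal === shardIndex - 1);
}
```

Every file lands in exactly one shard, which is why order-dependent tests break under sharding: two tests that used to run back to back may now run on different machines.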

  3. Track per-test runtime and flake rate
    • Tools like Datadog CI Visibility, BuildPulse, or Trunk Flaky Tests ingest JUnit XML and surface the slowest 1% and the flakiest tests. Without this data, the suite degrades silently — engineers blame infra rather than spotting the one test that fails 1 in 30 runs.
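
For teams without one of these tools yet, a rough version of the flake ranking can be computed from raw results. The sketch assumes results have already been parsed into `TestResult` records (a hypothetical shape) and treats a test as flaky on a commit when it both passed and failed on that same commit.

```typescript
// Hypothetical record for one execution of one test.
interface TestResult {
  testName: string;
  commitSha: string;
  passed: boolean;
}

interface FlakeStat {
  testName: string;
  commits: number;      // distinct commits the test ran on
  flakyCommits: number; // commits with both a pass and a fail
  flakeRate: number;    // flakyCommits / commits
}

// Groups outcomes by test and commit, then ranks tests by flake rate, worst first.
function flakeReport(results: TestResult[]): FlakeStat[] {
  const outcomes = new Map<string, Map<string, Set<boolean>>>();
  for (const result of results) {
    const byCommit = outcomes.get(result.testName) ?? new Map<string, Set<boolean>>();
    const seen = byCommit.get(result.commitSha) ?? new Set<boolean>();
    seen.add(result.passed);
    byCommit.set(result.commitSha, seen);
    outcomes.set(result.testName, byCommit);
  }

  const stats: FlakeStat[] = [];
  outcomes.forEach((byCommit, testName) => {
    let flakyCommits = 0;
    byCommit.forEach((seen) => {
      // Both true and false recorded means the outcome changed with no code change.
      if (seen.size === 2) flakyCommits += 1;
    });
    stats.push({
      testName,
      commits: byCommit.size,
      flakyCommits,
      flakeRate: flakyCommits / byCommit.size,
    });
  });
  return stats.sort((a, b) => b.flakeRate - a.flakeRate);
}
```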

5

Results Analysis and Reporting

  1. Publish a pass-rate dashboard
    • A Datadog, Grafana, or Honeycomb dashboard showing pass rate, p95 runtime, and top 10 flaky tests over a 14-day window. Link it from the team's README so it's reviewed at every retro, not just when CI is on fire.

  2. Route failure alerts to the right Slack channel
    • Main-branch red builds page the on-call engineer; PR failures notify the PR author only. Avoid the anti-pattern of routing every failure to #engineering — within a week the channel becomes ignorable noise.
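
The routing rule above is small enough to express as a pure function, which also makes it easy to unit test. The `BuildFailure` and `AlertTarget` shapes are hypothetical; the actual paging or Slack call would sit behind whatever the function returns.

```typescript
interface BuildFailure {
  branch: string;
  prAuthor?: string; // set only for pull-request builds
}

type AlertTarget =
  | { kind: "page-on-call" }
  | { kind: "notify-author"; user: string }
  | { kind: "log-only" };

// Main-branch failures page on-call; PR failures go to the author only.
// Anything else is logged rather than broadcast to a shared channel.
function routeFailure(failure: BuildFailure, mainBranch: string = "main"): AlertTarget {
  if (failure.branch === mainBranch) {
    return { kind: "page-on-call" };
  }
  if (failure.prAuthor) {
    return { kind: "notify-author", user: failure.prAuthor };
  }
  return { kind: "log-only" };
}
```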

  3. Decide whether the suite meets the flake budget
    • Compare the flake rate captured earlier against the budget set in planning. If over budget, trigger the remediation phase — the team must quarantine the worst offenders before adding new tests.

  4. Quarantine the top flaky tests
    • Tag the offenders `@quarantine` so they run but don't block merge, file tickets to fix or delete within 2 sprints, and remove the tag once the underlying race condition or timing issue is resolved. A quarantine list with no expiration is just hidden tech debt.
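
To keep the quarantine list from becoming permanent, a scheduled job can flag entries older than the deadline. The sketch assumes two-week sprints and a hypothetical `QuarantineEntry` record kept alongside the tag; neither is prescribed by any framework.

```typescript
// Hypothetical record kept for each quarantined test.
interface QuarantineEntry {
  testName: string;
  quarantinedOn: string; // ISO date, e.g. "2024-01-01"
  ticket: string;        // tracker reference for the fix-or-delete work
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns entries that have been quarantined longer than `maxSprints` sprints.
// `sprintDays` defaults to 14, which is an assumption about the team's cadence.
function overdueQuarantines(
  entries: QuarantineEntry[],
  today: Date,
  sprintDays: number = 14,
  maxSprints: number = 2,
): QuarantineEntry[] {
  const limitDays = sprintDays * maxSprints;
  return entries.filter((entry) => {
    const ageDays =
      (today.getTime() - new Date(entry.quarantinedOn).getTime()) / MS_PER_DAY;
    return ageDays > limitDays;
  });
}
```

Failing a nightly job when this list is non-empty is one way to force the fix-or-delete decision instead of letting the tag linger.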

6

Maintenance and Optimization

  1. Update tests when feature flags flip to GA
    • When a flag reaches 100% rollout, the assertions guarded by `if (flag)` should become unconditional and the off-path tests should be deleted. Stale flag-gated tests are a major source of dead-code accumulation and are worth calling out in the team's quarterly flag review.

  2. Refactor duplicated page-object helpers
    • Sprint-by-sprint, page objects accrete copy-pasted login helpers and selector duplication. Schedule a quarterly refactor pass: consolidate selectors into one place per page, replace CSS selectors with `data-testid`, delete dead helpers identified by ts-prune or similar.

  3. Review coverage against P0 user paths
    • Use Codecov, Coveralls, or `nyc`/`coverage.py` reports to map coverage onto the P0 path list captured in planning. Aim for 80%+ branch coverage on the critical-path modules; chasing 100% repo-wide buys little and incentivizes assertion-free tests.
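
A sketch of the threshold check for critical-path modules only, assuming the coverage tool can emit per-file branch counts. The `CoverageSummary` shape is modeled loosely on a per-file JSON summary and the file paths in the usage note are hypothetical.

```typescript
interface BranchCounts {
  total: number;
  covered: number;
}

// Per-file summary keyed by path; only branch counts are used here.
type CoverageSummary = Record<string, { branches: BranchCounts }>;

interface Shortfall {
  path: string;
  branchPct: number;
}

// Lists P0 modules whose branch coverage is below the threshold.
// A P0 module missing from the report counts as 0% so it cannot slip through.
function p0Shortfalls(
  summary: CoverageSummary,
  p0Paths: string[],
  thresholdPct: number = 80,
): Shortfall[] {
  const shortfalls: Shortfall[] = [];
  for (const path of p0Paths) {
    const entry = summary[path];
    let branchPct = 0;
    if (entry !== undefined) {
      // A file with no branches has nothing left to cover.
      branchPct =
        entry.branches.total === 0
          ? 100
          : (100 * entry.branches.covered) / entry.branches.total;
    }
    if (branchPct < thresholdPct) {
      shortfalls.push({ path, branchPct });
    }
  }
  return shortfalls;
}
```

Calling `p0Shortfalls(summary, ["src/billing/checkout.ts", "src/auth/login.ts"])` returns only the critical-path files that need attention, which keeps the review focused on P0 paths rather than a repo-wide percentage.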
