Testing Environment Setup Checklist

Steps a platform or QA lead runs to stand up a new testing environment for a SaaS engineering team — from infrastructure provisioning through tooling, security baseline, and team handover.

1

Infrastructure Provisioning

  1. Choose hosted vs. self-managed infrastructure
    • Decide whether the test environment runs on a managed cloud (EKS, GKE, AKS, Vercel, Fly.io) or self-managed infra. Self-managed adds backup, patching, and capacity planning to the team's plate; managed shifts those to the provider but locks the team into that provider's pricing and IAM model.

  2. Provision the VPC and subnets via Terraform
    • Use the team's IaC module (Terraform / OpenTofu / Pulumi) — never click-ops the test VPC. Match the prod CIDR layout so peering and reachability tests behave the same way. Tag every resource with env=test and owner= so cost reports and orphan-resource sweeps work.
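
A minimal sketch of the tag audit this step implies, assuming the resources have already been exported as dictionaries (for example from `terraform show -json` or a cloud inventory export). The `missing_tags` and `audit` helpers, the resource shape, and the sample IDs are hypothetical, not part of Terraform or any provider API.

```python
REQUIRED_TAGS = ("env", "owner")  # assumed policy: both tags present and non-empty


def missing_tags(resource: dict) -> list[str]:
    """Return the required tag keys that are absent or blank on one resource."""
    tags = resource.get("tags") or {}
    return [key for key in REQUIRED_TAGS if not str(tags.get(key) or "").strip()]


def audit(resources: list[dict]) -> dict[str, list[str]]:
    """Map resource id -> missing tag keys, only for resources that fail."""
    report = {}
    for res in resources:
        gaps = missing_tags(res)
        if gaps:
            report[res["id"]] = gaps
    return report


# Hypothetical inventory used only to exercise the check.
SAMPLE_RESOURCES = [
    {"id": "vpc-test", "tags": {"env": "test", "owner": "platform"}},
    {"id": "subnet-a", "tags": {"env": "test"}},
    {"id": "nat-gw", "tags": None},
]
```

Running `audit(SAMPLE_RESOURCES)` in CI and failing on a non-empty report is one way to keep untagged resources out of cost reports and orphan sweeps.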

  3. Validate hardware capacity for the workload
    • For self-managed: confirm CPU, RAM, and disk IOPS meet the load-test target. For managed: confirm instance types and autoscaling group min/max are sized for the heaviest expected suite (e.g., k6 / Locust runs against the canary).

  4. Configure backup and snapshot schedule
    • Enable RDS automated snapshots and daily EBS / volume snapshots. Document the restore procedure in the runbook — a backup that has never been restored is not a backup. Schedule a quarterly restore drill.

2

Software and Dependencies

  1. Install the base OS image and apply patches
    • Use the org's hardened AMI / golden image where one exists. Pin to a specific AMI ID or image digest so test runs are reproducible; tracking latest tends to bite around week three, when a kernel update changes container behavior.
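
For container images, pinning can be checked mechanically. The sketch below assumes "pinned" means the reference ends in an `@sha256:` digest; the `is_pinned` and `unpinned` helpers and the registry name are hypothetical.

```python
import re

# A digest-pinned reference ends in @sha256:<64 hex chars>; a tag like :latest can move.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref: str) -> bool:
    """True when the image reference is pinned to an immutable digest."""
    return bool(DIGEST_RE.search(image_ref))


def unpinned(image_refs: list[str]) -> list[str]:
    """Return the references that still rely on a mutable tag."""
    return [ref for ref in image_refs if not is_pinned(ref)]


# Hypothetical references; the digest is a placeholder, not a real image.
PINNED_REF = "registry.example.test/base@sha256:" + "a" * 64
FLOATING_REF = "registry.example.test/base:latest"
```

A check like `unpinned([...])` can run against Dockerfiles or manifests before the image build starts.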

  2. Install middleware, databases, and runtime dependencies
    • Install PostgreSQL / Redis / message broker / language runtimes at the same major versions used in production. Version mismatches (e.g., PG 14 vs 16 collation, Node 18 vs 20 fetch behavior) produce defects that exist only in test, or hide ones that exist only in production, and either kind wastes days of debugging.

  3. Audit OSS license compliance for new dependencies
    • Run an SCA scan (Snyk, FOSSA, GitHub Advanced Security) and flag GPL/AGPL or SSPL packages before they ship. Generate an SBOM (CycloneDX or SPDX) for the test image — federal contracts under EO 14028 increasingly require it, and you don't want to retrofit later.
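
Once an SBOM exists, the license flagging can be scripted. This is a rough sketch, assuming a CycloneDX JSON document already loaded into a dict, with licenses recorded under each component's `licenses` array as `license.id`, `license.name`, or `expression`. The helper names and sample components are hypothetical, and a real SCA tool does far more (transitive resolution, policy exceptions).

```python
import re

# License families named in the step above. LGPL is deliberately not matched.
FLAGGED_PREFIXES = ("GPL", "AGPL", "SSPL")


def component_licenses(component: dict) -> list[str]:
    """Collect license ids, names, and expressions from one component."""
    found = []
    for entry in component.get("licenses", []):
        lic = entry.get("license") or {}
        value = lic.get("id") or lic.get("name") or entry.get("expression")
        if value:
            found.append(value)
    return found


def flag_components(sbom: dict) -> list[tuple[str, str]]:
    """Return (component name, license) pairs that need legal review."""
    flagged = []
    for comp in sbom.get("components", []):
        for lic in component_licenses(comp):
            # Split expressions such as "MIT OR AGPL-3.0-only" into tokens.
            tokens = re.split(r"[\s()]+", lic.upper())
            if any(tok.startswith(FLAGGED_PREFIXES) for tok in tokens):
                flagged.append((comp.get("name", "?"), lic))
    return flagged


# Hypothetical SBOM fragment.
SAMPLE_SBOM = {
    "bomFormat": "CycloneDX",
    "components": [
        {"name": "lib-permissive", "licenses": [{"license": {"id": "MIT"}}]},
        {"name": "lib-copyleft", "licenses": [{"license": {"id": "GPL-3.0-only"}}]},
        {"name": "lib-dual", "licenses": [{"expression": "MIT OR AGPL-3.0-only"}]},
        {"name": "lib-lesser", "licenses": [{"license": {"id": "LGPL-2.1-only"}}]},
    ],
}
```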

3

Source Control and CI/CD

  1. Connect the repo and configure branch protection
    • On the GitHub / GitLab repo, enable branch protection on main: required reviews via CODEOWNERS, required status checks, no force-push, no deletion. Without this, a flaky-CI culture takes hold and red merges become routine.
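
For GitHub, the settings above can be captured as a payload and applied by script rather than by hand. The sketch below only builds the request body. The field names follow GitHub's REST "update branch protection" endpoint as best understood here and should be verified against the current API reference before use. The function name and the check names are hypothetical.

```python
import json


def branch_protection_payload(status_checks: list[str], approvals: int = 1) -> dict:
    """Build a branch-protection body: reviews, status checks, no force-push, no deletion."""
    return {
        "required_status_checks": {"strict": True, "contexts": status_checks},
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "require_code_owner_reviews": True,  # routes review through CODEOWNERS
            "required_approving_review_count": approvals,
        },
        "restrictions": None,  # no per-user push restrictions in this sketch
        "allow_force_pushes": False,
        "allow_deletions": False,
    }


# Hypothetical CI check names.
PAYLOAD = branch_protection_payload(["build", "unit-tests"])
BODY = json.dumps(PAYLOAD)  # what would be sent as the request body
```

Keeping the payload in the repo gives branch protection the same review trail as the rest of the infrastructure code.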

  2. Build the CI pipeline for the test environment
    • Wire GitHub Actions / GitLab CI / Buildkite to build, test, and push container images on every PR. Cache dependencies, parallelize the suite, and target a sub-15-minute total run — anything longer and engineers will start merging without waiting.
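
Parallelizing only helps if the shards are balanced. One common heuristic, shown here as a sketch rather than what any particular CI tool does, is longest-processing-time-first: sort tests by recorded duration and hand each to the currently lightest shard. The durations and test names are hypothetical.

```python
import heapq


def shard_tests(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Assign tests to shards, longest first, always filling the lightest shard."""
    heap = [(0.0, i) for i in range(workers)]  # (total seconds so far, shard index)
    heapq.heapify(heap)
    shards: list[list[str]] = [[] for _ in range(workers)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return shards


def wall_clock(durations: dict[str, float], shards: list[list[str]]) -> float:
    """Wall-clock time of the run is the slowest shard's total."""
    return max(sum(durations[t] for t in shard) for shard in shards)


# Hypothetical per-suite durations in seconds (30 minutes if run serially).
DURATIONS = {"api": 600, "billing": 400, "auth": 300, "search": 300, "ui": 200}
```

With these sample numbers, two shards land at 900 seconds, right at the 15-minute ceiling, and three shards bring the run down to 600 seconds.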

  3. Configure the deployment pipeline to test
    • Use ArgoCD / Spinnaker / Octopus or the team's existing CD tool. Deploy on every merge to main, surface the deployed SHA in a status endpoint, and post the result to #engineering so failures aren't silent.

  4. Seed the test database with anonymized fixtures
    • Never copy raw production data into test — that's a HIPAA / GDPR breach waiting to happen. Use a sanitized snapshot or generated fixtures (Faker, Factory Bot) that exercise the same edge cases.
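
A minimal sketch of generated fixtures, using only the standard library in place of Faker so it runs anywhere. The generator is seeded so every fixture refresh produces the same rows, and the edge-case names come first so they are always present. The field names, the edge-case list, and the `example.test` domain are assumptions for illustration.

```python
import random
import string

# Hypothetical edge cases: apostrophe, diacritics, non-Latin script, maximum length.
EDGE_CASE_NAMES = ["O'Brien", "Zoë Ångström", "李小龙", "a" * 255]


def make_users(count: int, seed: int = 42) -> list[dict]:
    """Generate deterministic fake users; no production data involved."""
    rng = random.Random(seed)  # seeded so fixtures are reproducible across refreshes
    users = []
    for i in range(count):
        if i < len(EDGE_CASE_NAMES):
            name = EDGE_CASE_NAMES[i]
        else:
            name = "user-" + "".join(rng.choices(string.ascii_lowercase, k=8))
        # .test is a reserved TLD, so these addresses can never reach a real inbox.
        users.append({"id": i + 1, "name": name, "email": f"user{i + 1}@example.test"})
    return users
```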

4

Security Baseline

  1. Configure security groups and VPN access
    • Default-deny on inbound; only allow office VPN / Tailscale / Cloudflare Access CIDRs. A publicly exposed test environment is how one ends up indexed on shodan.io with default credentials.
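
The default-deny rule can be checked against exported security-group rules. This sketch assumes each rule is a dict with a `cidr` key and that the allowed ranges are known up front. The ranges, ports, and helper name below are hypothetical placeholders for the team's real VPN and access-proxy CIDRs.

```python
import ipaddress

# Hypothetical allow-list: an office VPN range and a Tailscale-style CGNAT range.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.8.0.0/16"),
    ipaddress.ip_network("100.64.0.0/10"),
]


def ingress_violations(rules: list[dict]) -> list[dict]:
    """Return inbound rules whose source CIDR is not inside any allowed network."""
    bad = []
    for rule in rules:
        net = ipaddress.ip_network(rule["cidr"])
        inside = any(
            net.version == allowed.version and net.subnet_of(allowed)
            for allowed in ALLOWED_NETWORKS
        )
        if not inside:
            bad.append(rule)
    return bad


# Hypothetical exported rules.
SAMPLE_RULES = [
    {"port": 443, "cidr": "0.0.0.0/0"},       # world-open: should be flagged
    {"port": 22, "cidr": "10.8.4.0/24"},      # inside the VPN range
    {"port": 5432, "cidr": "100.100.0.0/16"}, # inside the CGNAT range
]
```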

  2. Set up SSO and RBAC for environment access
    • Wire Okta / Google Workspace SSO via SAML or OIDC. Define k8s RBAC and AWS IAM roles that map to engineering groups, not individuals — SOC 2 access reviews are painful when permissions are granted per-person.

  3. Move secrets into the secrets manager
    • Use Vault, AWS Secrets Manager, or 1Password Connect. Enable gitleaks or trufflehog as a pre-commit hook so secrets never reach git history. Once a secret is committed, rotate it anyway, and remember that rotation does not remove it from history; scrubbing requires git-filter-repo or BFG.
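
To show what a pre-commit scanner looks for, here is a toy version with two patterns: the `AKIA` prefix format of AWS access key IDs and PEM private-key headers. It is an illustration only. gitleaks and trufflehog ship far larger rule sets plus entropy checks, and this sketch is not a substitute for either.

```python
import re

PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern name) for every suspicious line in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


# Synthetic sample; the key is a placeholder built to match the format, not a real credential.
SAMPLE_DIFF = "\n".join([
    "region = us-east-1",
    "key = " + "AKIA" + "A" * 16,
    "-----BEGIN RSA PRIVATE KEY-----",
])
```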

  4. Run a baseline vulnerability scan
    • Run Snyk / Trivy against container images, and Dependabot (or the same scanners) against dependency manifests. Triage criticals before opening the environment to the team; fixing them later means rebuilding images and fixtures.
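
Triage is easier with the criticals pulled into one list. The sketch below assumes a Trivy JSON report (as produced with `--format json`) already loaded into a dict, reading the `Results`, `Vulnerabilities`, `Severity`, `PkgName`, and `VulnerabilityID` fields. The sample report, including its placeholder CVE IDs, is made up.

```python
def critical_findings(report: dict) -> list[tuple[str, str, str]]:
    """Return (target, package, vulnerability id) for every CRITICAL finding."""
    findings = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" can be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") == "CRITICAL":
                findings.append((
                    result.get("Target", "?"),
                    vuln.get("PkgName", "?"),
                    vuln.get("VulnerabilityID", "?"),
                ))
    return findings


# Hypothetical report fragment with placeholder identifiers.
SAMPLE_REPORT = {
    "Results": [
        {"Target": "app:test (debian)", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl", "Severity": "CRITICAL"},
            {"VulnerabilityID": "CVE-0000-0002", "PkgName": "zlib", "Severity": "MEDIUM"},
        ]},
        {"Target": "package-lock.json", "Vulnerabilities": None},
    ]
}
```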

5

Test Tooling

  1. Install unit and integration test runners
    • Pin Jest / Vitest / pytest / RSpec / JUnit versions in the lockfile. Confirm the runner reports JUnit-XML so CI surfaces failures inline rather than buried in logs.
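
A quick way to confirm the runner's JUnit-XML output is usable is to parse it the way CI would. This sketch assumes the common `testsuite` / `testcase` layout, with `failure` or `error` child elements marking failed cases. The sample report and the helper name are hypothetical.

```python
import xml.etree.ElementTree as ET


def failed_cases(junit_xml: str) -> list[str]:
    """Return 'classname::name' for every test case with a failure or error child."""
    root = ET.fromstring(junit_xml)
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f'{case.get("classname", "")}::{case.get("name", "")}')
    return failed


# Hypothetical report: one pass, one failure, one error.
SAMPLE_JUNIT = (
    '<testsuite name="unit" tests="3" failures="1" errors="1">'
    '<testcase classname="cart" name="adds_item"/>'
    '<testcase classname="cart" name="applies_discount">'
    '<failure message="expected 90"/></testcase>'
    '<testcase classname="auth" name="refreshes_token">'
    '<error message="timeout"/></testcase>'
    "</testsuite>"
)
```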

  2. Wire up the e2e suite (Playwright or Cypress)
    • Run e2e against the deployed test environment, not localhost. Tag flaky specs and quarantine them with an owner and a deadline — flakes ignored long-term mask real regressions.
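
The owner-and-deadline rule is easy to enforce if the quarantine list lives in code. A minimal sketch, assuming a hand-maintained list of entries; the `Quarantine` record, spec paths, team names, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Quarantine:
    spec: str       # path of the quarantined e2e spec
    owner: str      # person or team responsible for the fix
    deadline: date  # date by which the spec is fixed or deleted


def overdue(entries: list[Quarantine], today: date) -> list[Quarantine]:
    """Return quarantined specs whose deadline has already passed."""
    return [q for q in entries if q.deadline < today]


# Hypothetical quarantine list.
QUARANTINED = [
    Quarantine("e2e/checkout.spec.ts", "payments-team", date(2024, 3, 1)),
    Quarantine("e2e/search.spec.ts", "discovery-team", date(2024, 6, 1)),
]
```

Failing the pipeline when `overdue(...)` is non-empty keeps quarantine from turning into a permanent skip list.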

  3. Configure load-test tooling
    • Stand up k6 or Locust scripts that exercise the top 5 API endpoints by prod traffic. Establish baseline p50/p95/p99 numbers now so you can detect regressions rather than argue about whether something got slower.
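
Baselines are only useful if the comparison is mechanical. The sketch below computes nearest-rank percentiles from raw latency samples and flags any percentile that exceeds its baseline by more than a tolerance. The 10% tolerance, the helper names, and the sample numbers are assumptions. k6 and Locust report percentiles themselves, possibly with a different interpolation method.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Compute the three percentiles named in the step above."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}


def regressed(baseline: dict, current: dict, tolerance: float = 0.10) -> list[str]:
    """Return the percentiles where current latency exceeds baseline by more than tolerance."""
    return [k for k in baseline if current[k] > baseline[k] * (1 + tolerance)]


# Hypothetical latencies in milliseconds: 1, 2, ..., 100.
BASELINE = summarize(list(range(1, 101)))
```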

6

Observability

  1. Connect logs, metrics, and traces
    • Pipe to Datadog / New Relic / Grafana+Loki+Tempo or the team's existing stack. Tag everything env=test so test traffic never pollutes production dashboards or burns the prod alert budget.
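
On the application side, one way to guarantee the tag is to stamp it onto every log record at the handler. A minimal sketch using Python's standard `logging` module; the `EnvTagFilter` class, the logger name, and the key=value format are assumptions, and a vendor agent may offer its own tagging mechanism.

```python
import io
import logging


class EnvTagFilter(logging.Filter):
    """Attach an env attribute to every record so the formatter can emit it."""

    def __init__(self, env: str):
        super().__init__()
        self.env = env

    def filter(self, record: logging.LogRecord) -> bool:
        record.env = self.env
        return True  # never drop records, only annotate them


def build_logger(stream, env: str = "test") -> logging.Logger:
    """Return a logger whose every line carries env=<value>."""
    logger = logging.getLogger(f"app.{env}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("env=%(env)s level=%(levelname)s msg=%(message)s"))
    handler.addFilter(EnvTagFilter(env))
    logger.handlers = [handler]
    return logger


BUFFER = io.StringIO()
build_logger(BUFFER).info("checkout ok")
```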

  2. Build dashboards for the golden signals
    • Latency, traffic, errors, saturation — one dashboard per service. Link each dashboard from the service catalog (Backstage) so on-call doesn't hunt during an incident.

  3. Wire alerts to PagerDuty or Opsgenie
    • Test-environment alerts go to a low-severity channel, not the prod page. Otherwise the team starts ignoring pages within a week and the signal-to-noise collapses for real incidents.

7

Handover and Documentation

  1. Write the environment runbook
    • Document in Confluence / Notion / Backstage: how to deploy, how to roll back, where logs live, how to refresh fixtures, and the on-call escalation path. The runbook is what makes the environment usable by someone who wasn't on the build team.

  2. Run a walkthrough for the engineering team
    • 30-minute Loom or live demo: deploying a PR, reading dashboards, triggering a load test, restoring from a snapshot. Record it so new hires get it on day one.

  3. Sign off the environment for team use
    • Capture the completed checklist, a short sign-off note, and the approver's signature.