Testing Environment Setup Checklist
Steps a platform or QA lead runs to stand up a new testing environment for a SaaS engineering team — from infrastructure provisioning through tooling, security baseline, and team handover.
Infrastructure Provisioning
-
Choose hosted vs. self-managed infrastructure
Decide whether the test environment runs on managed infrastructure (EKS, GKE, AKS, Vercel, Fly.io) or self-managed infra. Self-managed adds backup, patching, and capacity planning to the team's plate; managed shifts those to the provider but locks you into its pricing and IAM model.
-
Provision the VPC and subnets via Terraform
Use the team's IaC module (Terraform / OpenTofu / Pulumi); never click-ops the test VPC. Mirror the prod subnet layout (same tiers and AZ spread) but with non-overlapping CIDR ranges, so peering works and reachability tests behave the same way. Tag every resource with env=test and owner= so cost reports and orphan-resource sweeps work.
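A minimal preflight sketch, assuming the planned resources have been exported to a plain dict of resource name to tags (for example, after parsing `terraform show -json`; that export step and the sample names are hypothetical). It flags resources missing `env=test` or a non-empty `owner` tag, and checks that the test VPC range does not overlap prod, since AWS VPC peering rejects overlapping CIDR blocks.

```python
import ipaddress

def missing_tags(resources: dict[str, dict[str, str]]) -> list[str]:
    """Return names of resources lacking env=test or a non-empty owner tag."""
    bad = []
    for name, tags in resources.items():
        if tags.get("env") != "test" or not tags.get("owner"):
            bad.append(name)
    return sorted(bad)

def cidrs_overlap(test_cidr: str, prod_cidr: str) -> bool:
    """True if the two VPC ranges overlap (a peering request would be rejected)."""
    return ipaddress.ip_network(test_cidr).overlaps(ipaddress.ip_network(prod_cidr))

# Hypothetical inventory; resource names and tag values are illustrative only.
inventory = {
    "aws_vpc.test": {"env": "test", "owner": "platform"},
    "aws_subnet.private_a": {"env": "test"},  # missing owner tag
}
print(missing_tags(inventory))                       # ['aws_subnet.private_a']
print(cidrs_overlap("10.20.0.0/16", "10.0.0.0/16"))  # False
```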
-
Validate hardware capacity for the workload
For self-managed: confirm CPU, RAM, and disk IOPS meet the load-test target. For managed: confirm instance types and autoscaling group min/max are sized for the heaviest expected suite (e.g., k6 / Locust runs against the canary).
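For the managed case, a back-of-the-envelope sizing sketch. The measured per-instance throughput, the target utilization, and the headroom factor are hypothetical inputs you would take from your own load-test runs; the arithmetic just divides the padded peak load by the usable capacity of one instance and rounds up.

```python
import math

def asg_bounds(peak_rps: float, rps_per_instance: float,
               target_utilization: float = 0.6, headroom: float = 1.5,
               baseline_rps: float = 0.0) -> tuple[int, int]:
    """Return (min, max) instance counts for an autoscaling group.

    max: enough instances to serve peak_rps * headroom while each instance
         runs at target_utilization of its measured capacity.
    min: enough for the baseline load, never below 1.
    """
    usable = rps_per_instance * target_utilization
    max_n = math.ceil(peak_rps * headroom / usable)
    min_n = max(1, math.ceil(baseline_rps / usable))
    return min_n, max_n

# Hypothetical numbers: a k6 run peaking at 1,200 RPS, 250 RPS per instance.
print(asg_bounds(1200, 250, baseline_rps=100))  # (1, 12)
```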
-
Configure backup and snapshot schedule
Enable RDS automated snapshots and daily EBS / volume snapshots. Document the restore procedure in the runbook — a backup that has never been restored is not a backup. Schedule a quarterly restore drill.
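A small freshness check, sketched under the assumption that you can list snapshot creation times and record the date of the last successful restore drill (fetching them from the provider's API is left out). It treats a latest snapshot older than 24 hours, or a drill older than roughly one quarter (92 days here, an arbitrary choice), as a finding.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def backup_findings(snapshot_times: list[datetime],
                    last_restore_drill: Optional[datetime],
                    now: datetime) -> list[str]:
    """Return human-readable findings; an empty list means the schedule looks healthy."""
    findings = []
    if not snapshot_times:
        findings.append("no snapshots found")
    elif now - max(snapshot_times) > timedelta(hours=24):
        findings.append("latest snapshot is older than 24h")
    if last_restore_drill is None or now - last_restore_drill > timedelta(days=92):
        findings.append("restore drill overdue")
    return findings

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
snaps = [now - timedelta(hours=30), now - timedelta(hours=6)]
print(backup_findings(snaps, now - timedelta(days=120), now))  # ['restore drill overdue']
```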
Software and Dependencies
-
Install the base OS image and apply patches
Use the org's hardened AMI / golden image where one exists. Pin to a specific image ID or digest so test runs are reproducible; floating on latest bites in week three, when a kernel update changes container behavior.
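One way to enforce pinning in CI, as a sketch: reject any container image reference that is not pinned by an `@sha256:` digest. The function name and sample references are illustrative; AMIs are pinned by AMI ID instead, which this check does not cover.

```python
import re

# A pinned reference ends in @sha256: followed by 64 lowercase hex characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")

def unpinned(image_refs: list[str]) -> list[str]:
    """Return references that float on a tag (e.g. :latest) instead of a digest."""
    return [ref for ref in image_refs if not DIGEST_RE.search(ref)]

refs = [
    "ghcr.io/example/api:latest",
    "ghcr.io/example/worker@sha256:" + "a" * 64,
]
print(unpinned(refs))  # ['ghcr.io/example/api:latest']
```

A tag plus digest (`name:1.2@sha256:…`) also passes, since the digest is what the runtime resolves.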
-
Install middleware, databases, and runtime dependencies
Install PostgreSQL / Redis / message broker / language runtimes at the same major versions used in production. Version mismatches (e.g., PG 14 vs 16 collation, Node 18 vs 20 fetch behavior) cause test failures that never reproduce in prod and hide ones that will; either way they waste days of debugging.
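A sketch of a parity gate, assuming you can collect `component -> version string` maps from both environments (the collection step and sample versions are hypothetical). It compares only major versions, since that is the level the parity requirement targets.

```python
def major(version: str) -> int:
    """Major version from strings like '16.2', 'v20.11.1', or '7'."""
    return int(version.lstrip("v").split(".")[0])

def major_mismatches(test: dict[str, str],
                     prod: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Components whose major version differs from prod, or that are missing in test."""
    out = {}
    for name, prod_ver in prod.items():
        test_ver = test.get(name)
        if test_ver is None or major(test_ver) != major(prod_ver):
            out[name] = (test_ver or "missing", prod_ver)
    return out

prod = {"postgres": "16.2", "node": "v20.11.1", "redis": "7.2.4"}
test = {"postgres": "14.11", "node": "v20.10.0", "redis": "7.2.4"}
print(major_mismatches(test, prod))  # {'postgres': ('14.11', '16.2')}
```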
-
Audit OSS license compliance for new dependencies
Run an SCA scan (Snyk, FOSSA, GitHub Advanced Security) and flag GPL/AGPL or SSPL packages before they ship. Generate an SBOM (CycloneDX or SPDX) for the test image — federal contracts under EO 14028 increasingly require it, and you don't want to retrofit later.
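The SCA tools do this for you, but the check itself is simple. A sketch that reads a CycloneDX JSON SBOM, assuming the `components[].licenses[]` layout where each entry holds either a `license` object with an SPDX `id` or an `expression` string, and flags copyleft or source-available identifiers. The prefix list is a policy assumption (LGPL is deliberately not matched), and it flags any mention, even inside an `OR` expression, so a human still reviews the hit.

```python
import json

FLAGGED_PREFIXES = ("GPL-", "AGPL-", "SSPL-")  # policy assumption; adjust per legal review

def flagged_components(sbom_json: str) -> list[str]:
    """Return 'name@version' for components whose license id or expression
    mentions a flagged SPDX identifier."""
    sbom = json.loads(sbom_json)
    hits = []
    for comp in sbom.get("components", []):
        for entry in comp.get("licenses", []):
            ids = []
            if "license" in entry:
                ids.append(entry["license"].get("id", ""))
            if "expression" in entry:
                # crude tokenization of SPDX expressions like "MIT OR GPL-2.0-only"
                expr = entry["expression"].replace("(", " ").replace(")", " ")
                ids.extend(expr.split())
            if any(i.startswith(FLAGGED_PREFIXES) for i in ids):
                hits.append(f"{comp.get('name')}@{comp.get('version')}")
                break
    return hits

# Made-up SBOM fragment for illustration.
sample = json.dumps({"bomFormat": "CycloneDX", "specVersion": "1.5", "components": [
    {"name": "left-pad", "version": "1.3.0", "licenses": [{"license": {"id": "MIT"}}]},
    {"name": "somedb-driver", "version": "2.0.0", "licenses": [{"expression": "MIT OR AGPL-3.0-only"}]},
    {"name": "libfoo", "version": "0.9", "licenses": [{"license": {"id": "LGPL-2.1-only"}}]},
]})
print(flagged_components(sample))  # ['somedb-driver@2.0.0']
```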
Source Control and CI/CD
-
Connect the repo and configure branch protection
On the GitHub / GitLab repo, enable branch protection on main: required reviews via CODEOWNERS, required status checks, no force-push, no deletion. Without this, a flaky-CI culture takes hold and red merges become routine.
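If you would rather script this than click through repo settings, GitHub's REST endpoint `PUT /repos/{owner}/{repo}/branches/{branch}/protection` accepts a JSON body along these lines. The sketch only builds the request; the owner, repo, status-check names, and token are placeholders, and the field names should be confirmed against the current REST docs before relying on it.

```python
import json
import urllib.request

def protection_payload(checks: list[str], approvals: int = 1) -> dict:
    return {
        "required_status_checks": {"strict": True, "contexts": checks},
        "enforce_admins": True,  # a choice: apply the rules to admins too
        "required_pull_request_reviews": {
            "require_code_owner_reviews": True,  # CODEOWNERS must approve
            "required_approving_review_count": approvals,
        },
        "restrictions": None,  # required key; null means no push restrictions
        "allow_force_pushes": False,
        "allow_deletions": False,
    }

def build_request(owner: str, repo: str, token: str, payload: dict,
                  branch: str = "main") -> urllib.request.Request:
    url = f"https://api.github.com/repos/{owner}/{repo}/branches/{branch}/protection"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Authorization": f"Bearer {token}",
                 "Accept": "application/vnd.github+json"},
    )

req = build_request("example-org", "example-repo", "TOKEN",
                    protection_payload(["ci/build", "ci/test"]))
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```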
-
Build the CI pipeline for the test environment
Wire GitHub Actions / GitLab CI / Buildkite to build, test, and push container images on every PR. Cache dependencies, parallelize the suite, and target a sub-15-minute total run — anything longer and engineers will start merging without waiting.
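Parallelizing helps most when shards are balanced by runtime rather than file count. A sketch of greedy longest-processing-time assignment, assuming per-file durations from previous JUnit reports (the file names and timings are made up). Most CI providers and runners offer their own splitting; this only shows the idea.

```python
import heapq

def shard_by_duration(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Assign each test file, longest first, to the currently least-loaded shard."""
    heap = [(0.0, i) for i in range(shards)]  # (total seconds, shard index)
    buckets: list[list[str]] = [[] for _ in range(shards)]
    for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + secs, idx))
    return buckets

timings = {"checkout.spec": 300, "auth.spec": 240, "search.spec": 180,
           "billing.spec": 120, "profile.spec": 60}
print(shard_by_duration(timings, 2))
# [['checkout.spec', 'billing.spec', 'profile.spec'], ['auth.spec', 'search.spec']]
```

The two shards total 480 s and 420 s, so the wall-clock time is set by the slower one.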
-
Configure the deployment pipeline to test
Use ArgoCD / Spinnaker / Octopus or the team's existing CD tool. Deploy on every merge to main, surface the deployed SHA in a status endpoint, and post the result to #engineering so failures aren't silent.
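A minimal status-endpoint sketch using only the standard library, assuming the CD tool injects the commit SHA into a `DEPLOY_SHA` environment variable at deploy time (the variable name and `/status` path are hypothetical). Your service framework almost certainly has a nicer way to add a route; the point is that the deployed SHA is one curl away.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

def status_body() -> bytes:
    """JSON body reporting the deployed commit; 'unknown' if the CD tool didn't set it."""
    return json.dumps({"sha": os.environ.get("DEPLOY_SHA", "unknown"),
                       "env": "test"}).encode()

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/status":
            self.send_error(404)
            return
        body = status_body()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, run `HTTPServer(("0.0.0.0", 8080), StatusHandler).serve_forever()`; the post to #engineering can then quote whatever `/status` returns.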
-
Seed the test database with anonymized fixtures
Never copy raw production data into test — that's a HIPAA / GDPR breach waiting to happen. Use a sanitized snapshot or generated fixtures (Faker, Factory Bot) that exercise the same edge cases.
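When you start from a production-shaped snapshot, keyed hashing keeps anonymized values consistent across tables, so joins and uniqueness constraints still hold. A sketch with HMAC-SHA256; the key handling and the `user_<hash>@example.invalid` format are assumptions. Hashing identifiers does not make free-text columns or rare quasi-identifiers safe, so treat it as one layer of sanitization, not all of it.

```python
import hashlib
import hmac

def pseudonymize_email(email: str, key: bytes) -> str:
    """Map a real email to a stable fake one: same input and key give the same output."""
    normalized = email.strip().lower().encode()
    digest = hmac.new(key, normalized, hashlib.sha256).hexdigest()
    return f"user_{digest[:16]}@example.invalid"

KEY = b"load-from-secrets-manager"  # placeholder; never hard-code a real key
a = pseudonymize_email("Jane.Doe@corp.com", KEY)
b = pseudonymize_email("jane.doe@corp.com ", KEY)
print(a == b)  # True: normalization keeps the mapping stable
```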
Security Baseline
-
Configure security groups and VPN access
Default-deny on inbound; only allow office VPN / Tailscale / Cloudflare Access CIDRs. Publicly exposed test environments are how default credentials end up indexed on shodan.io.
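A sketch of an audit over exported security-group rules, assuming a simplified rule shape of `{"port": int, "cidr": str}` (real AWS responses are more nested) and hypothetical allowed ranges for the VPN and Tailscale. It flags any inbound rule whose source is not inside an allowed range, which catches `0.0.0.0/0` and `::/0` as special cases.

```python
import ipaddress

# Hypothetical allow-list: an office VPN range and Tailscale's 100.64.0.0/10 CGNAT block.
ALLOWED_SOURCES = [ipaddress.ip_network(c) for c in ("10.8.0.0/16", "100.64.0.0/10")]

def open_rules(inbound: list[dict]) -> list[dict]:
    """Return inbound rules whose source CIDR falls outside every allowed range."""
    bad = []
    for rule in inbound:
        src = ipaddress.ip_network(rule["cidr"])
        # subnet_of raises TypeError across IPv4/IPv6, so compare versions first
        if not any(src.version == a.version and src.subnet_of(a) for a in ALLOWED_SOURCES):
            bad.append(rule)
    return bad

rules = [{"port": 443, "cidr": "10.8.4.0/24"},
         {"port": 22, "cidr": "0.0.0.0/0"},
         {"port": 5432, "cidr": "::/0"}]
print(open_rules(rules))  # [{'port': 22, 'cidr': '0.0.0.0/0'}, {'port': 5432, 'cidr': '::/0'}]
```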
-
Set up SSO and RBAC for environment access
Wire Okta / Google Workspace SSO via SAML or OIDC. Define k8s RBAC and AWS IAM roles that map to engineering groups, not individuals — SOC 2 access reviews are painful when permissions are granted per-person.
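To keep Kubernetes bindings group-based, a sketch of a check over RoleBinding / ClusterRoleBinding lists as dicts (for example, the output of `kubectl get rolebindings -A -o json`). It reports every subject of kind `User`; `Group` and `ServiceAccount` subjects pass. The sample bindings are made up.

```python
def user_subjects(binding_list: dict) -> list[str]:
    """Return 'namespace/binding -> user' for every per-person subject."""
    found = []
    for item in binding_list.get("items", []):
        meta = item.get("metadata", {})
        where = f"{meta.get('namespace', '<cluster>')}/{meta.get('name')}"
        for subj in item.get("subjects") or []:  # subjects may be absent
            if subj.get("kind") == "User":
                found.append(f"{where} -> {subj.get('name')}")
    return found

sample = {"items": [
    {"metadata": {"name": "devs-edit", "namespace": "app"},
     "subjects": [{"kind": "Group", "name": "eng-backend"}]},
    {"metadata": {"name": "alice-admin", "namespace": "app"},
     "subjects": [{"kind": "User", "name": "alice@example.com"}]},
]}
print(user_subjects(sample))  # ['app/alice-admin -> alice@example.com']
```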
-
Move secrets into the secrets manager
Use Vault, AWS Secrets Manager, or 1Password Connect. Run gitleaks or trufflehog as a pre-commit hook so secrets never reach git history. If one does, rotate it immediately: rotation invalidates the secret but doesn't remove it from history, which takes git-filter-repo or BFG.
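gitleaks and trufflehog ship large sets of tuned rules; the sketch below only illustrates the mechanism with one well-known pattern, the `AKIA` prefix of AWS access key IDs, applied to added lines of a diff. It is a teaching aid, not a substitute for either tool.

```python
import re

# AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def find_secrets(diff_text: str) -> list[tuple[int, str]]:
    """Return (diff line number, match) for added lines containing a key-ID lookalike."""
    hits = []
    for n, line in enumerate(diff_text.splitlines(), start=1):
        if line.startswith("+"):  # only lines being added, not removed
            hits.extend((n, m.group()) for m in AWS_KEY_ID.finditer(line))
    return hits

# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
diff = "+aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n-old = AKIAIOSFODNN7EXAMPLE\n context"
print(find_secrets(diff))  # [(1, 'AKIAIOSFODNN7EXAMPLE')]
```

In a hook you would feed it the output of `git diff --cached` and exit non-zero on any hit.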
-
Run a baseline vulnerability scan
Run Snyk / Trivy / Dependabot against container images and dependencies. Triage criticals before opening the environment to the team — fixing them later means rebuilding fixtures.
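A sketch of a gate over a Trivy JSON report (`trivy image --format json …`), assuming the `Results[].Vulnerabilities[]` layout with `VulnerabilityID`, `PkgName`, and `Severity` fields; check it against the Trivy version you run. It returns the criticals to triage, and a CI step would fail when the list is non-empty. The sample report and CVE IDs are made up.

```python
import json

def criticals(report_json: str) -> list[str]:
    """List CRITICAL findings as 'ID in package (target)'."""
    report = json.loads(report_json)
    out = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" is absent or null for targets with no findings
        for v in result.get("Vulnerabilities") or []:
            if v.get("Severity") == "CRITICAL":
                out.append(f"{v['VulnerabilityID']} in {v['PkgName']} ({result.get('Target')})")
    return out

sample = json.dumps({"Results": [
    {"Target": "app (debian 12)", "Vulnerabilities": [
        {"VulnerabilityID": "CVE-0000-0001", "PkgName": "libexample", "Severity": "CRITICAL"},
        {"VulnerabilityID": "CVE-0000-0002", "PkgName": "libother", "Severity": "LOW"}]},
    {"Target": "requirements.txt", "Vulnerabilities": None},
]})
print(criticals(sample))  # ['CVE-0000-0001 in libexample (app (debian 12))']
```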
Test Tooling
-
Install unit and integration test runners
Pin Jest / Vitest / pytest / RSpec / JUnit versions in the lockfile. Confirm the runner emits JUnit XML reports so CI surfaces failures inline instead of burying them in logs.
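To see what CI does with those reports, a sketch that parses a JUnit-style XML file with the standard library and lists failing test cases. JUnit XML has no single formal spec and tools vary; this assumes the common `<testsuite>`/`<testcase>` shape with `<failure>` or `<error>` children.

```python
import xml.etree.ElementTree as ET

def failing_tests(junit_xml: str) -> list[str]:
    """Return 'classname::name' for every test case with a failure or error."""
    root = ET.fromstring(junit_xml)
    failed = []
    # iter() handles both a <testsuites> wrapper and a bare <testsuite> root
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(f"{case.get('classname')}::{case.get('name')}")
    return failed

report = """<testsuites>
  <testsuite name="api" tests="2" failures="1">
    <testcase classname="tests.test_auth" name="test_login"/>
    <testcase classname="tests.test_auth" name="test_logout">
      <failure message="expected 204, got 500"/>
    </testcase>
  </testsuite>
</testsuites>"""
print(failing_tests(report))  # ['tests.test_auth::test_logout']
```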
-
Wire up the e2e suite (Playwright or Cypress)
Run e2e against the deployed test environment, not localhost. Tag flaky specs and quarantine them with an owner and a deadline — flakes ignored long-term mask real regressions.
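Quarantine only works if it expires. A sketch of a checked-in quarantine registry with an owner and deadline per entry, plus a check that reports entries past their deadline; the fields and spec names are hypothetical, and this is not a built-in Playwright or Cypress feature.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Quarantined:
    spec: str
    owner: str
    deadline: date

def overdue(entries: list[Quarantined], today: date) -> list[str]:
    """Specs still quarantined after their deadline, with the owner to chase."""
    return [f"{e.spec} (owner: {e.owner})" for e in entries if today > e.deadline]

registry = [
    Quarantined("checkout.spec.ts > applies coupon", "maria", date(2024, 5, 1)),
    Quarantined("search.spec.ts > paginates", "dev", date(2024, 7, 1)),
]
print(overdue(registry, date(2024, 6, 1)))
# ['checkout.spec.ts > applies coupon (owner: maria)']
```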
-
Configure load-test tooling
Stand up k6 or Locust scripts that exercise the top 5 API endpoints by prod traffic. Establish baseline p50/p95/p99 numbers now so you can detect regressions rather than argue about whether something got slower.
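Once you have latency samples from a k6 or Locust run (exported however you like), comparing them against the baseline is simple arithmetic. A sketch using the nearest-rank percentile definition and a hypothetical 10% tolerance; k6 and Locust compute their own percentiles, possibly with different interpolation, so compare like with like.

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of the data at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def regressions(samples: list[float], baseline: dict[str, float],
                tolerance: float = 0.10) -> dict[str, tuple[float, float]]:
    """Percentiles exceeding the baseline by more than `tolerance` (a fraction)."""
    out = {}
    for label, p in (("p50", 50), ("p95", 95), ("p99", 99)):
        current = percentile(samples, p)
        if current > baseline[label] * (1 + tolerance):
            out[label] = (baseline[label], current)
    return out

latencies_ms = [float(ms) for ms in range(1, 101)]  # synthetic: 1..100 ms
print(percentile(latencies_ms, 95))                  # 95.0
print(regressions(latencies_ms, {"p50": 50, "p95": 80, "p99": 100}))  # {'p95': (80, 95.0)}
```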
Observability
-
Connect logs, metrics, and traces
Pipe to Datadog / New Relic / Grafana+Loki+Tempo or the team's existing stack. Tag everything env=test so test traffic never pollutes production dashboards or burns the prod alert budget.
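For application logs emitted through Python's `logging` module, one way to guarantee the tag is a filter that stamps every record before the handler formats it. A sketch; the `env` field name and the JSON-looking format (no real escaping) are assumptions, and in practice the log shipper or agent can often add environment tags for you.

```python
import io
import logging

class EnvTagFilter(logging.Filter):
    """Attach env=<value> to every record passing through the handler."""
    def __init__(self, env: str) -> None:
        super().__init__()
        self.env = env

    def filter(self, record: logging.LogRecord) -> bool:
        record.env = self.env
        return True  # never drop records, only annotate them

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(EnvTagFilter("test"))
handler.setFormatter(logging.Formatter('{"env": "%(env)s", "msg": "%(message)s"}'))

log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("order placed")
print(stream.getvalue().strip())  # {"env": "test", "msg": "order placed"}
```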
-
Build dashboards for the golden signals
Latency, traffic, errors, saturation — one dashboard per service. Link each dashboard from the service catalog (Backstage) so on-call doesn't hunt during an incident.
-
Wire alerts to PagerDuty or Opsgenie
Test-environment alerts go to a low-severity channel, not the prod page. Otherwise the team learns to ignore pages within a week and the signal-to-noise ratio collapses for real incidents.
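One way to enforce the split is where alerts are built. A sketch producing a PagerDuty Events API v2 trigger body (`routing_key`, `event_action`, and `payload` with `summary`, `source`, `severity`, `custom_details`), which would be POSTed to the Events API enqueue endpoint. Both routing keys are placeholders for two separate PagerDuty integrations, with the test one attached to a low-urgency service; Opsgenie's API is different.

```python
PROD_ROUTING_KEY = "PROD_INTEGRATION_KEY"  # placeholder
TEST_ROUTING_KEY = "TEST_LOWSEV_KEY"       # placeholder for a low-urgency service

def pagerduty_event(summary: str, source: str, severity: str, env: str) -> dict:
    """Build an Events API v2 'trigger' body; test alerts never page as critical or error."""
    if env == "test":
        routing_key = TEST_ROUTING_KEY
        severity = "warning" if severity in ("critical", "error") else severity
    else:
        routing_key = PROD_ROUTING_KEY
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity,
                    "custom_details": {"env": env}},
    }

evt = pagerduty_event("p95 latency > 2s", "checkout-api", "critical", "test")
print(evt["routing_key"], evt["payload"]["severity"])  # TEST_LOWSEV_KEY warning
```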
Handover and Documentation
-
Write the environment runbook
Document in Confluence / Notion / Backstage: how to deploy, how to roll back, where logs live, how to refresh fixtures, and the on-call escalation path. The runbook is what makes the environment usable by someone who wasn't on the build team.
-
Run a walkthrough for the engineering team
30-minute Loom or live demo: deploying a PR, reading dashboards, triggering a load test, restoring from a snapshot. Record it so new hires get it on day one.
-
Sign off the environment for team use