Infrastructure as Code (IaC) Checklist

Steps a platform or DevOps team runs on every Terraform / OpenTofu / Pulumi change — from branch through plan, security scan, review, and apply — to keep infra changes safe, reviewable, and reversible.

8 sections, 25 steps
1

Version Control and Branching

  1. Pin the Terraform version in .terraform-version
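    • Use tfenv (or asdf, which reads .tool-versions unless legacy version files are enabled) and commit a .terraform-version file at the repo root so every contributor and CI runner uses the same Terraform release. Version skew, for example between 1.5 and 1.6, is a common gotcha when one teammate upgrades locally: language features and backend options differ between minor releases, so the same code can plan differently or fail on another machine.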

  2. Open a feature branch off main
    • Branch name should reference the ticket — e.g. infra/PLAT-482-rds-encryption. Branch protection on main requires PR + passing checks + 1 CODEOWNERS approval; do not push directly.

  3. Write conventional-commit messages
    • Prefix with feat:, fix:, chore:, or refactor:. Mark breaking changes with ! after the type (feat!:), a BREAKING CHANGE: footer, or both; release-please reads these markers to bump the module's semver tag.
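
The naming and commit conventions above can be checked mechanically. The following is a minimal sketch, not release-please's actual logic: the branch pattern is an assumption derived from the infra/PLAT-482-rds-encryption example, and the bump mapping is the usual Conventional Commits reading (breaking = major, feat = minor, fix = patch). The function names are hypothetical.

```python
import re

# Assumed branch convention: infra/<TICKET>-<short-description>,
# modelled on the example infra/PLAT-482-rds-encryption.
BRANCH_RE = re.compile(r"^infra/[A-Z]+-\d+-[a-z0-9]+(?:-[a-z0-9]+)*$")

# Commit header: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(r"^(feat|fix|chore|refactor)(\([\w./-]+\))?(!)?: \S.*$")


def branch_ok(name: str) -> bool:
    """True when the branch name follows the assumed ticket-based pattern."""
    return BRANCH_RE.match(name) is not None


def semver_bump(message: str) -> str:
    """Return 'major', 'minor', 'patch', 'none', or 'invalid' for one commit message."""
    lines = message.strip().splitlines()
    if not lines:
        return "invalid"
    header = HEADER_RE.match(lines[0])
    if header is None:
        return "invalid"
    # A breaking change is signalled by "!" in the header or by a footer line.
    breaking_footer = any(line.startswith("BREAKING CHANGE:") for line in lines[1:])
    if header.group(3) or breaking_footer:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(header.group(1), "none")
```

A check like this could run as a commit-msg hook or an early CI step, so malformed messages are rejected before the release tooling ever sees them.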

2

Testing and Continuous Integration

  1. Run terraform plan against the staging workspace
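    • Run terraform plan -out=tfplan against the staging workspace and attach the plan to the PR. Watch for unintended destroys, especially of stateful resources such as RDS instances or S3 buckets. A changed name attribute often forces a replacement: the attribute line shows only ~ with a "# forces replacement" note, while the resource itself is marked -/+ (destroy, then create).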

  2. Execute Terratest unit and integration suites
    • Run the module's Go-based Terratest suite plus any terraform validate and terraform fmt -check gates. Integration tests that spin real AWS resources should target the sandbox account, not staging.

  3. Trigger the IaC pipeline on push
    • GitHub Actions / GitLab CI runs fmt, validate, plan, tfsec, and Terratest. Required status checks must all be green before merge — never merge with a flaky check waved through.
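
Spotting destroys by eye gets unreliable on long plans, so a pipeline step can flag them. The sketch below assumes the saved plan has been rendered with terraform show -json tfplan and reads the resource_changes list from that output. The set of "stateful" resource types and the function name are illustrative assumptions; adjust them to your own modules.

```python
import json

# Illustrative assumption: resource types whose loss means data loss.
STATEFUL_TYPES = {"aws_db_instance", "aws_rds_cluster", "aws_s3_bucket"}


def destructive_changes(plan_json: str) -> list:
    """List resources the plan would delete or replace.

    Expects the text produced by `terraform show -json tfplan`.
    """
    plan = json.loads(plan_json)
    findings = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "delete" not in actions:
            continue  # create, update, no-op and read are not destructive
        findings.append({
            "address": rc["address"],
            # A replacement lists both "delete" and "create".
            "kind": "replace" if "create" in actions else "destroy",
            "stateful": rc["type"] in STATEFUL_TYPES,
        })
    return findings
```

One reasonable policy is to fail the job when any finding has stateful set to True, unless the PR carries an explicit override label.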

3

Security and Compliance Scanning

  1. Run tfsec and Checkov against the module
    • The two tools have overlapping but distinct rule sets: tfsec is fast and Terraform-native, while Checkov ships broader CIS / SOC 2 / HIPAA policy packs. Configure both to fail the build on Critical and High findings by default; every suppression needs an inline comment that cites the ticket justifying the exception.

  2. Record scan severity findings
    • Summarize the scanner output. Auditors collecting SOC 2 evidence look for the scan-result artifact attached to the PR; the ticket should also link to the SARIF upload in GitHub Advanced Security.

  3. Triage Critical and High findings before merge
    • Fix, accept with a suppression (after security review), or escalate. Common Critical hits: S3 buckets without encryption, security groups open to 0.0.0.0/0 on port 22, and IAM policies with * as the resource. Document any accepted risk in the security register.
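
The merge gate described in this section reduces to a small rule: Critical and High findings block the merge unless they carry a ticketed suppression. The sketch below works on a hypothetical normalized Finding record. It is not the native output format of tfsec or Checkov, so a real pipeline would first map scanner output into this shape. The rule IDs and ticket number in the usage are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

BLOCKING = {"CRITICAL", "HIGH"}


@dataclass
class Finding:
    rule_id: str                                # scanner rule identifier
    severity: str                               # e.g. "CRITICAL", "HIGH", "MEDIUM"
    resource: str                               # Terraform resource address
    suppression_ticket: Optional[str] = None    # ticket cited in the inline suppression


def gate(findings):
    """Return (passed, blocking): blocking holds unsuppressed Critical/High findings."""
    blocking = [
        f for f in findings
        if f.severity.upper() in BLOCKING and not f.suppression_ticket
    ]
    return (not blocking, blocking)


# Usage with placeholder rule IDs and a placeholder ticket.
findings = [
    Finding("EXAMPLE-S3-ENCRYPTION", "CRITICAL", "aws_s3_bucket.logs"),
    Finding("EXAMPLE-SG-OPEN-SSH", "HIGH", "aws_security_group.bastion", "SEC-0000"),
    Finding("EXAMPLE-TAGS", "LOW", "aws_instance.app"),
]
passed, blocking = gate(findings)
```

Keeping the ticket on the record makes the summary for auditors easy to produce: every suppressed finding maps to one entry in the security register.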

4

Documentation and Knowledge Sharing

  1. Update the module README with input and output changes
    • Document new variables, default values, and outputs. If a variable changed type or default, call it out as a breaking change in the changelog — module consumers will hit it on their next terraform init -upgrade.

  2. Regenerate the terraform-docs reference
    • Run terraform-docs markdown table --output-file README.md . (or run it via pre-commit). The CI pipeline fails if the generated section drifts from the committed README; that gate keeps the docs honest.

  3. Post the change summary in #infra-changes
    • One-paragraph summary: what changed, blast radius (which environments, which services), rollback approach, and a link to the PR. Application-team leads watch this channel to know when shared infra moved underneath them.

5

Configuration Management

  1. Confirm resources are declarative and idempotent
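    • After the change is applied in a sandbox or staging workspace, run terraform plan again against the same workspace; this second run should show No changes. A diff that keeps reappearing usually means a non-deterministic input, such as a local-exec provisioner re-triggered on every run (for example by a timestamp() trigger) or a data source that returns different values on each read; refactor those out.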

  2. Move new secrets into Vault or AWS Secrets Manager
    • Never put secret values in .tfvars or commit them to git — even rotating later doesn't remove the value from history. Reference secrets via vault_generic_secret or aws_secretsmanager_secret_version data sources, and confirm gitleaks runs in pre-commit.

  3. Parameterize environment differences via tfvars
    • One module, multiple env/*.tfvars files (dev, staging, prod). Do not fork the module per environment — divergence is the #1 source of "works in staging, breaks in prod" infra incidents.
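
The "No changes" check in step 1 can be automated with terraform plan -detailed-exitcode, which exits 0 when the diff is empty, 2 when changes are pending, and 1 on error. The sketch below is one way to wrap that for a post-apply job, passing the environment's var file from step 3. The function names and the env/staging.tfvars path in the comment are assumptions.

```python
import subprocess


def interpret_plan_exit(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to an outcome."""
    if code == 0:
        return "no-changes"
    if code == 2:
        return "changes-pending"
    return "error"


def plan_outcome(workdir: str, var_file: str) -> str:
    """Run a plan in workdir with one environment's var file and classify it.

    Requires the terraform binary on PATH and an initialized working directory.
    """
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false",
         f"-var-file={var_file}"],
        cwd=workdir,
    )
    return interpret_plan_exit(proc.returncode)


# Example call (path is hypothetical): plan_outcome(".", "env/staging.tfvars")
```

Run after an apply, anything other than "no-changes" points at drift or a non-idempotent resource and is worth failing the job on.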

6

Monitoring and Performance

  1. Add Datadog or CloudWatch monitors for new resources
    • Define monitors as code in the same module, using datadog_monitor or aws_cloudwatch_metric_alarm. Cover the four golden signals (latency, traffic, errors, saturation) at minimum; resources that ship without alerts are how outages slip past on-call.

  2. Wire alerts to the existing PagerDuty service
    • Critical alerts go to a PagerDuty service that maps to the on-call schedule for the owning team. Warning-level alerts should route to Slack, not PagerDuty — unactionable pages erode response discipline within weeks.

  3. Verify dashboards reflect the new resource set
    • Open the service dashboard and confirm new resources appear in the resource-list widgets and SLO panels. Also check that the auto-generated AWS service quota dashboard hasn't gone red (e.g. EIPs per region, RDS instances per account).
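
Coverage and routing can both be stated as small rules. The sketch below assumes each monitor has been reduced to a dict with resource and signal keys, a hypothetical shape rather than a Datadog or CloudWatch API response. It reports which golden signals each resource still lacks and picks a destination by severity.

```python
GOLDEN_SIGNALS = {"latency", "traffic", "errors", "saturation"}


def missing_signals(monitors):
    """Map each resource to the golden signals it has no monitor for."""
    covered = {}
    for m in monitors:
        covered.setdefault(m["resource"], set()).add(m["signal"])
    return {
        resource: sorted(GOLDEN_SIGNALS - signals)
        for resource, signals in covered.items()
        if GOLDEN_SIGNALS - signals
    }


def route(severity: str) -> str:
    """Critical alerts page the on-call via PagerDuty; the rest go to Slack."""
    return "pagerduty" if severity.lower() == "critical" else "slack"
```

Note that a resource with no monitors at all never appears in the input, so a complete check would also compare against the list of resources in the plan.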

7

Dependency Management

  1. Pin provider and module versions in versions.tf
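    • Use pessimistic constraints such as ~> 5.40 (any 5.x release from 5.40 up, but not 6.0), not open-ended ones such as >= 5.0. Commit .terraform.lock.hcl: without it, terraform init on different machines can select different provider versions, and CI plans diverge from local plans. The lock file covers providers only, so pin module versions explicitly in each module block.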

  2. Review open Renovate or Dependabot upgrade PRs
    • Don't let upgrade PRs pile up to 80+. Auto-merge passing patch and minor bumps for vetted providers (hashicorp/aws, hashicorp/random, hashicorp/null); major bumps need a human to read the upstream changelog because they often change resource schemas.

  3. Test pinned upgrades in the sandbox workspace
    • Run terraform plan after the upgrade and look for unexpected diffs; provider major versions sometimes rename attributes or change their defaults. Roll the upgrade through sandbox → staging → prod, never straight to prod.
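
To make the ~> operator from step 1 concrete: it lets only the rightmost component given in the constraint increase. The sketch below re-implements that rule for plain numeric versions so the boundaries are easy to see. It is an illustration, not Terraform's own resolver; it ignores pre-release suffixes and only handles constraints with at least two components.

```python
def parse(version: str) -> tuple:
    """Turn "5.40.1" into (5, 40, 1)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a version against a constraint such as "~> 5.40" or "~> 5.40.1"."""
    base = parse(constraint.replace("~>", "").strip())
    if len(base) < 2:
        raise ValueError("this sketch needs at least major.minor in the constraint")
    candidate = parse(version)
    width = max(len(candidate), len(base))
    candidate += (0,) * (width - len(candidate))
    lower = base + (0,) * (width - len(base))
    # Upper bound: bump the second-to-last given component and zero the rest.
    upper = base[:-2] + (base[-2] + 1,)
    upper += (0,) * (width - len(upper))
    return lower <= candidate < upper
```

With this rule, "~> 5.40" accepts 5.62.0 but rejects 6.0.0, while the tighter "~> 5.40.1" accepts 5.40.7 but rejects 5.41.0. That difference is what decides whether an automated minor bump stays inside the constraint.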

8

Code Review and Apply

  1. Request review from the CODEOWNERS infra team
    • The CODEOWNERS file routes review to the team that owns the module. PR description should include: blast radius, plan output link, scan results, rollback steps. "LGTM" on a 1,200-line plan is a red flag — break large changes into reviewable PRs under ~400 lines.

  2. Classify the change risk level
    • Low = additive, non-prod, easily reversible (new tag, new monitor). Medium = prod-touching but reversible (new resource, parameter change). High = stateful resource changes, IAM scope expansions, breaking module changes, or anything touching shared networking. High requires a second approver and a deploy window outside Friday afternoon.

  3. Obtain a second approver for High-risk changes
    • Second approver should be a staff engineer or platform lead outside the original author's immediate sub-team. SOC 2 segregation-of-duties evidence pulls directly from this approval; auditors will sample PRs and check for two distinct reviewers.

  4. Apply the plan and tag the release
    • Apply via the CI runner (Atlantis, Terraform Cloud, or a protected GitHub Actions workflow) — never terraform apply from a laptop against prod. Tag the merged commit (e.g. v2024.45.0), push the changelog, and confirm the post-apply plan shows No changes.
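
The risk levels and approval rules in steps 2 and 3 can be encoded so the apply workflow enforces them instead of relying on memory. The sketch below is a simplification under stated assumptions: the Change fields are hypothetical flags a PR template or bot would fill in, reversibility is not modelled, and any non-prod change without a High trigger counts as Low.

```python
from dataclasses import dataclass


@dataclass
class Change:
    touches_prod: bool
    stateful_resource_change: bool = False
    iam_scope_expansion: bool = False
    breaking_module_change: bool = False
    shared_networking: bool = False


def classify(change: Change) -> str:
    """Return 'high', 'medium', or 'low' following the checklist's definitions."""
    if (change.stateful_resource_change or change.iam_scope_expansion
            or change.breaking_module_change or change.shared_networking):
        return "high"
    return "medium" if change.touches_prod else "low"


def ready_to_apply(change: Change, author: str, approvers: list) -> bool:
    """High needs two distinct approvers other than the author; otherwise one."""
    distinct = set(approvers) - {author}
    needed = 2 if classify(change) == "high" else 1
    return len(distinct) >= needed
```

Excluding the author and de-duplicating approvers matches the segregation-of-duties evidence auditors sample for: two different people, neither of whom wrote the change.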
