Infrastructure as Code Checklist
A weekly operations checklist for the IaC pipeline that provisions and maintains infrastructure — covers module hygiene, repository discipline, security scanning, state and secret handling, plan review, and post-apply verification. Run by the platform/sysadmin team before merg...
Module and Code Hygiene
-
Run terraform fmt and tflint on the changed modules
Format with terraform fmt -recursive and lint with tflint (or the equivalent for your stack, such as az bicep lint for Bicep or the language's own linter for a Pulumi program). CI should block on lint failures; running the tools locally first catches noise before the PR.
-
Validate provider and module version pins
Confirm required_providers and module sources are pinned to exact versions or narrow ranges. A constraint such as ~> 5.0 on the AWS provider accepts every 5.x minor release, and teams have been burned when a minor release changed resource defaults.
-
Confirm README and variable docs are current
Regenerate docs with terraform-docs so input variables, outputs, and examples match the code. Stale README entries are a common reason a downstream consumer passes the wrong inputs.
-
Verify no secrets are committed in plaintext
Run gitleaks or trufflehog against the diff. Secrets belong in Vault, AWS Secrets Manager, Azure Key Vault, or 1Password references — never inline in .tf, .tfvars, or pipeline YAML.
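For the version-pin item in this section, the check can be partly automated. The following is a minimal sketch, not a replacement for reviewing the constraints by hand: the function names are hypothetical, the constraint strings are assumed to have already been extracted from required_providers, and any compound range (for example ">= 5.1.0, < 5.2.0") is conservatively reported as loose.

```python
import re

# Exact pins look like "5.31.0" or "= 5.31.0".
EXACT = re.compile(r"^\s*=?\s*\d+\.\d+\.\d+\s*$")
# "~> X.Y.Z" floats only the patch component; "~> X.Y" floats the minor.
PESSIMISTIC_PATCH = re.compile(r"^\s*~>\s*\d+\.\d+\.\d+\s*$")


def classify_constraint(constraint: str) -> str:
    """Return 'exact', 'narrow', or 'loose' for a Terraform version constraint."""
    if EXACT.match(constraint):
        return "exact"
    if PESSIMISTIC_PATCH.match(constraint):
        return "narrow"
    # Anything else (">= 4.0", "~> 5.0", compound ranges) is treated as loose.
    return "loose"


def loose_pins(pins: dict[str, str]) -> list[str]:
    """Names of providers or modules whose constraint is classified as loose."""
    return sorted(name for name, c in pins.items() if classify_constraint(c) == "loose")
```

A call such as loose_pins({"aws": "~> 5.0", "random": "3.6.0"}) would report only the aws entry, which is the case the checklist warns about.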
Repository and Branch Discipline
-
Open the change on a feature branch
Branch off main using the team's naming convention (e.g., iac/JIRA-1234-vpc-peering). Direct commits to main should be blocked by branch protection.
-
Write a descriptive commit message and PR body
Reference the ticket, summarize the blast radius, and paste the terraform plan output (or link to the CI job). Future incident responders read this first.
-
Request peer review from a second engineer
CODEOWNERS should enforce a platform-team approver on the module path. Self-merge of infrastructure changes is a frequently cited SOC 2 change-management finding.
-
Tag the release after merge
Cut a semver tag (e.g., v1.4.2) so downstream stacks can pin to a known-good version. Untagged modules referenced by ref=main drift silently and break reproducibility.
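The ref=main problem described in the tagging item can be detected mechanically. This is a rough sketch under stated assumptions: the function name is hypothetical, the module source strings are assumed to have been collected already, and only git:: sources are inspected. A ref that points at a commit SHA is also reproducible, but this sketch flags it because it only accepts semver tags.

```python
import re

SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")
REF_PARAM = re.compile(r"[?&]ref=([^&]+)")


def unpinned_sources(sources: list[str]) -> list[str]:
    """Return git module sources whose ref is missing or is not a semver tag."""
    flagged = []
    for source in sources:
        if not source.startswith("git::"):
            continue  # registry and local sources are out of scope for this sketch
        match = REF_PARAM.search(source)
        if match is None or not SEMVER_TAG.match(match.group(1)):
            flagged.append(source)
    return flagged
```

Running this over the sources in a root module before review gives the reviewer a short list of references to question, rather than relying on someone spotting ref=main in the diff.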
Security and Compliance Scanning
-
Run tfsec or Checkov against the changed code
Checkov, tfsec, and Trivy catch open S3 buckets, unrestricted security groups, and unencrypted RDS instances. Treat HIGH/CRITICAL findings as merge blockers unless explicitly waived.
-
Confirm encryption at rest and in transit is set
EBS volumes, RDS, S3, and managed disks must declare KMS / CMK encryption explicitly. Account-level default-encryption settings do not cover every resource type, so set encryption in the resource block itself.
-
Verify IAM follows least-privilege
No statement should allow Action: "*" on Resource: "*". Scope IAM policies to the specific actions and resource ARNs the workload needs. Standing administrator privilege is a recurring root cause in cloud breaches.
-
File a documented exception with the security team
If a scanner finding cannot be remediated this release, open a ticket in the security backlog with the CVE / rule ID, business justification, compensating control, and expiry date. Auditors will ask.
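The least-privilege item in this section names one pattern that is easy to test for directly. The sketch below is illustrative only and assumes the policy is available as a rendered JSON document; the function names are hypothetical. It catches only the literal Action "*" on Resource "*" case and does not detect service-wide wildcards such as s3:* or NotAction statements, which scanners like Checkov cover more thoroughly.

```python
import json


def _as_list(value):
    """IAM allows a single value or a list for Statement, Action, and Resource."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_statements(policy_json: str) -> list[dict]:
    """Return Allow statements that grant Action "*" on Resource "*"."""
    policy = json.loads(policy_json)
    flagged = []
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if "*" in actions and "*" in resources:
            flagged.append(stmt)
    return flagged
```

An empty result does not prove the policy is least-privilege; it only rules out the full-administrator statement.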
State and Secrets Handling
-
Confirm remote state backend is locked
Whether the backend is S3 with a DynamoDB lock table, a Terraform Cloud workspace, or Azure Storage with a blob lease, verify that another engineer cannot apply concurrently. Two simultaneous applies can corrupt state, and recovery requires manual state surgery.
-
Rotate any exposed credentials before merge
Even if the commit is force-pushed away, treat anything that touched a public CI log or remote as compromised. Rewriting history does not un-leak a key.
-
Rotate the exposed secret and audit access logs
Rotate at the source of truth (IAM, Vault, Entra ID app registration), then pull CloudTrail / Entra audit logs for the window the secret was exposed. File an incident ticket regardless of whether use was observed.
Plan Review and Apply
-
Generate and attach the terraform plan output
Save the speculative plan from Atlantis, Terraform Cloud, or Spacelift. Reviewers gate on the plan, not the source diff — a one-line variable change can cascade into hundreds of resource replacements.
-
Review destroy and replace actions line by line
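The - destroy and -/+ replace lines are where outages live. Replacing an RDS instance destroys the existing database and creates a new, empty one. A resource renamed in code reads as destroy + create to Terraform unless a moved block (or terraform state mv) records the rename.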
-
Confirm data-bearing replacements have a backup or snapshot
For RDS, EBS, managed disks, and stateful workloads: take a manual snapshot or final backup before apply. Verify the snapshot is restorable, not just that it was created.
-
Apply during the approved change window
Apply via the pipeline runner (not from a workstation) so the action is logged and uses the service principal's scoped credentials. Workstation applies bypass change advisory board (CAB) approval and break the audit trail.
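The line-by-line review of destroy and replace actions in this section can be backed by a script that reads the machine-readable plan. This is a minimal sketch, assuming the plan was saved with terraform plan -out and rendered with terraform show -json; the function name is hypothetical. It relies on the resource_changes list in that JSON, where each entry carries an address and a change.actions list.

```python
import json


def risky_changes(plan_json: str) -> list[tuple[str, str]]:
    """Return (address, kind) for every destroy or replace in a JSON plan."""
    plan = json.loads(plan_json)
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "delete" not in actions:
            continue  # create, update, read, and no-op are not destructive
        # A replace is reported as delete+create (in either order).
        kind = "replace" if "create" in actions else "destroy"
        risky.append((rc["address"], kind))
    return risky
```

A pipeline step could print this list at the top of the PR comment so reviewers see data-bearing resources such as an RDS instance before reading the rest of the plan. It is a summary aid, not a substitute for reading the plan itself.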
Post-Apply Verification
-
Verify monitoring and alerts fire on the new resources
Confirm Datadog, CloudWatch, or Azure Monitor is collecting metrics from the new resources and that alerting rules cover them. New resources without monitoring drift into production blind spots.
-
Confirm backup policy applies to new stateful resources
The AWS Backup plan, Azure Backup vault, or Veeam job must pick up the new RDS instance, disk, or database. Tag-based backup selection silently skips a resource whose tag is mistyped, so confirm the resource actually appears in the backup scope.
-
Run a drift detection check
Run terraform plan again post-apply with no code changes. The output should be a no-op (exit code 0 when run with -detailed-exitcode). Any pending change means a resource was modified out of band or the configuration does not converge; investigate before closing the change.
-
Close the change ticket with the apply log
Attach the full apply output to the ServiceNow / Jira ticket and mark the CR complete. The apply log is the SOX/SOC 2 evidence that the approved change matches what executed.
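The drift detection check in this section lends itself to a scripted gate. The sketch below is one possible shape, with hypothetical function names, and assumes the terraform binary is on the PATH and the working directory is already initialized. It uses the -detailed-exitcode flag of terraform plan, which exits 0 when there are no changes, 1 on error, and 2 when changes are pending.

```python
import subprocess


def interpret_plan_exit(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a status."""
    if code == 0:
        return "clean"  # plan succeeded and found nothing to change
    if code == 2:
        return "drift"  # plan succeeded and changes are pending
    return "error"      # 1 (or anything unexpected): the plan itself failed


def check_drift(workdir: str) -> str:
    """Run a post-apply plan in workdir and report clean, drift, or error."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit(result.returncode)
```

Treating "error" separately from "drift" matters here: a failed plan says nothing about whether the infrastructure matches the code, so the change should stay open in both cases.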