Change Management Checklist

Request Intake and Scoping

    Use the CHG project in Jira (or your equivalent in Linear / ServiceNow). Link the originating PR, incident, or feature epic. Include the target environment, customer-visible impact, and the engineer accountable for the change.

    Standard = pre-approved low-risk (dependency patch, doc update). Normal = scheduled change requiring CAB review. Emergency = break-glass for a SEV1 or an expiring TLS cert. Misclassifying a normal change as standard or emergency, which bypasses CAB review, is a common SOC 2 audit finding.
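
    A minimal sketch of the three-way classification rule, assuming the CR ticket exposes a few boolean fields. The `ChangeRequest` fields and the `classify` helper are hypothetical names for illustration, not part of Jira, Linear, or ServiceNow.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeClass(Enum):
    STANDARD = "standard"
    NORMAL = "normal"
    EMERGENCY = "emergency"


@dataclass
class ChangeRequest:
    # Hypothetical fields; map them to whatever your CR ticket records.
    pre_approved_low_risk: bool = False    # e.g. dependency patch, doc update
    fixes_sev1: bool = False
    fixes_expiring_tls_cert: bool = False


def classify(cr: ChangeRequest) -> ChangeClass:
    """Emergency criteria win, then pre-approved low-risk, else normal."""
    if cr.fixes_sev1 or cr.fixes_expiring_tls_cert:
        return ChangeClass.EMERGENCY
    if cr.pre_approved_low_risk:
        return ChangeClass.STANDARD
    # Anything else goes through scheduled CAB review.
    return ChangeClass.NORMAL
```

    Defaulting to normal means an unlabelled change is reviewed by CAB rather than slipping through as standard.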

    Pull the service from Backstage or your service catalog. Note upstream callers, downstream dependencies, shared databases, and any feature flags this change depends on. Database schema changes always require a separate migration plan.
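
    One way to turn the upstream-caller notes into a blast-radius list is a breadth-first walk over the caller graph. This is a sketch only: the `CALLERS` mapping is invented sample data standing in for a service-catalog export, and `blast_radius` is a hypothetical helper, not a Backstage API.

```python
from collections import deque


def blast_radius(callers: dict[str, list[str]], service: str) -> set[str]:
    """Return every service that directly or transitively calls `service`."""
    seen: set[str] = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            # Skip already-visited callers and cycles back to the start.
            if caller not in seen and caller != service:
                seen.add(caller)
                queue.append(caller)
    return seen


# Hypothetical catalog export: service -> services that call it.
CALLERS = {
    "billing-db": ["billing-api"],
    "billing-api": ["checkout", "admin-ui"],
    "checkout": ["web-frontend"],
}
```

    With this sample data, `blast_radius(CALLERS, "billing-db")` returns all four services above the shared database, which is the list of owners to notify.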

    Post in #eng-releases and tag the on-call for each dependent service. Don't rely on CODEOWNERS auto-review alone: an async Slack notice gives owners time to flag conflicts before CAB.

Change Plan and Rollback

    Write the runbook as commands a different engineer could execute. Include Terraform plan output, kubectl commands, migration scripts, and the order they run in. Database migrations always deploy before the backend that depends on them.

    Confirm the previous container image is still in the registry and not pruned. For DB migrations, write a reverse migration or document the forward-compatible path. "Restore from backup" is not a rollback plan — that's a disaster recovery plan.

    Name the Datadog or Grafana dashboards to watch post-deploy. Document the p99 latency and error-rate thresholds that constitute a rollback trigger. If this change consumes error budget, note it in the SLO ticket.
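
    Writing the rollback trigger down as data removes ambiguity during the deploy. A minimal sketch, assuming the thresholds are a p99 latency ceiling and an error-rate ceiling; the class, function, and example numbers are hypothetical, and the metric values would come from whichever dashboard you named.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackThresholds:
    p99_latency_ms: float
    error_rate: float  # fraction of requests, e.g. 0.01 == 1%


def should_roll_back(p99_latency_ms: float, error_rate: float,
                     limits: RollbackThresholds) -> bool:
    """True when either documented threshold is exceeded."""
    return (p99_latency_ms > limits.p99_latency_ms
            or error_rate > limits.error_rate)


# Example values only; take real limits from the service's SLO.
limits = RollbackThresholds(p99_latency_ms=800.0, error_rate=0.01)
```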

CAB Review and Approval

    Attach the runbook, rollback plan, blast-radius summary, and risk classification. CAB meets weekly; submitting after the cutoff pushes the change a full week.
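
    To see which meeting a submission lands in, the cutoff rule can be sketched as below. The meeting weekday, hour, and cutoff length are assumptions passed in as parameters; the checklist only states that CAB meets weekly and has a cutoff.

```python
from datetime import datetime, timedelta


def next_cab_meeting(submitted: datetime, meeting_weekday: int,
                     meeting_hour: int, cutoff_hours: int) -> datetime:
    """First weekly meeting whose cutoff has not passed at `submitted`.

    meeting_weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    days_ahead = (meeting_weekday - submitted.weekday()) % 7
    meeting = (submitted + timedelta(days=days_ahead)).replace(
        hour=meeting_hour, minute=0, second=0, microsecond=0)
    # Past the cutoff: the change slips to the following week's meeting.
    if submitted > meeting - timedelta(hours=cutoff_hours):
        meeting += timedelta(days=7)
    return meeting
```

    For a hypothetical Thursday 14:00 meeting with a 24-hour cutoff, a request filed Wednesday at 15:00 is reviewed eight days later, not the next day.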

    Cover scope, blast radius, rollback, and the deploy window. Be explicit about which dependent services have signed off. CAB members will challenge the rollback plan — have it tested or expect pushback.

    Record the outcome in the CR ticket. Approved-with-conditions means the conditions are blocking — don't deploy until they're resolved and re-confirmed in writing. SOC 2 auditors trace approvals back to this artifact.

    Common conditions: load test in staging, security review for IAM changes, cross-team sign-off, expanded canary window. Re-attach evidence to the CR before re-requesting approval.
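
    The "conditions are blocking" rule can be expressed as a small deploy gate. A sketch under assumptions: the outcome strings and the `can_deploy` helper are hypothetical, and `conditions` maps each CAB condition to whether it has been resolved and re-confirmed in writing.

```python
def can_deploy(outcome: str, conditions: dict[str, bool]) -> bool:
    """Gate a deploy on the recorded CAB outcome."""
    if outcome == "approved":
        return True
    if outcome == "approved-with-conditions":
        # An empty list is treated as "conditions not yet recorded",
        # so it blocks rather than passing vacuously.
        return bool(conditions) and all(conditions.values())
    return False
```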

Deploy Window Execution

    Check PagerDuty and the #incidents channel. An active incident on a dependent service means the deploy waits — even a green build doesn't justify shipping into a degraded environment.
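
    A sketch of the pre-deploy check, assuming active incidents have already been fetched as a list of dicts with a `service` key. That shape and the `blocking_incidents` helper are hypothetical; this does not call the PagerDuty API.

```python
def blocking_incidents(active_incidents: list[dict],
                       dependent_services: set[str]) -> list[dict]:
    """Active incidents that touch a service this change depends on."""
    return [inc for inc in active_incidents
            if inc["service"] in dependent_services]
```

    A non-empty result means the deploy waits, regardless of build status.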

    Set branch protection to require release-captain approval, or post the freeze notice in #eng-releases. Concurrent merges during a canary muddy the rollback decision.

    Watch replication lag in RDS during the migration. For Postgres, use CONCURRENTLY for index creation; for column adds with defaults on large tables, batch the backfill rather than locking the table. Skip this step if the change includes no schema change.
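
    A batched backfill can be sketched as a generator of id ranges plus one UPDATE per range. This assumes an integer primary key named `id` and a nullable target column; the table and column names are placeholders, and each statement is meant to run in its own short transaction.

```python
from typing import Iterator


def backfill_batches(min_id: int, max_id: int,
                     batch_size: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) id ranges covering min_id..max_id."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        yield start, end
        start = end + 1


def backfill_sql(table: str, column: str, value_sql: str,
                 start: int, end: int) -> str:
    """Build one batch UPDATE. Inputs are trusted identifiers only;
    do not interpolate user-supplied strings this way."""
    return (f"UPDATE {table} SET {column} = {value_sql} "
            f"WHERE id BETWEEN {start} AND {end} AND {column} IS NULL;")
```

    Pausing between batches while watching replication lag keeps replicas from falling behind. The `IS NULL` guard makes a re-run after an interruption safe.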

    Hold the canary for at least 10 minutes. Watch error rate, p99 latency, and saturation on the dashboard named in the success-metrics step. If any threshold is breached, abort and roll back instead of continuing the rollout.
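
    The canary decision reduces to three outcomes: roll back on any breach, promote once the hold time has elapsed cleanly, otherwise keep holding. A minimal sketch; the `Sample` shape and `canary_verdict` helper are hypothetical, and the samples would be read from your own dashboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    minute: int            # minutes since the canary started
    error_rate: float      # fraction of requests
    p99_latency_ms: float
    saturation: float      # 0.0 to 1.0


def canary_verdict(samples: list[Sample], max_error_rate: float,
                   max_p99_ms: float, max_saturation: float,
                   hold_minutes: int = 10) -> str:
    """Return "rollback", "promote", or "hold"."""
    for s in samples:
        if (s.error_rate > max_error_rate
                or s.p99_latency_ms > max_p99_ms
                or s.saturation > max_saturation):
            return "rollback"
    # No breach so far: promote only once the full hold has elapsed.
    if samples and max(s.minute for s in samples) >= hold_minutes:
        return "promote"
    return "hold"
```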

    Frontend ships last because the backend is backward-compatible: the new backend still serves the old frontend. Reversing the order means the frontend calls API endpoints the backend doesn't yet serve. CDN cache purge is part of this step for Cloudflare or CloudFront-fronted assets.

Rollback Path

    Trigger a SEV2 in PagerDuty, open the war-room Zoom, and post in #incidents. The release captain is not the IC — split the roles so one person drives the rollback while the other coordinates comms.

    Follow the runbook from the Change Plan section. Redeploy the previous container image tag; if the migration was non-reversible, run the documented forward-fix instead. Don't improvise — improvised rollbacks are how outages double in length.

    Watch the same dashboards used during the deploy. Confirm error rate and p99 latency return to the pre-deploy baseline. Update the status page to resolved only after 30 minutes of clean signal.

Post-Deploy Verification

    Trigger the Playwright or Cypress synthetic checks against production. Cover the critical user journeys: login, primary CRUD path, billing webhook. A green CI build does not substitute for a production smoke test.

    New unique error fingerprints in the first hour are the tell — even at low volume they often grow. Triage each new signature; assign a ticket or roll back depending on severity.
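
    Spotting new signatures is a set difference between the pre-deploy baseline and the first post-deploy hour. A sketch, assuming fingerprints have been exported as strings with counts; the `new_fingerprints` helper is hypothetical and not tied to any error tracker's API.

```python
def new_fingerprints(baseline: set[str],
                     post_deploy: dict[str, int]) -> list[tuple[str, int]]:
    """Fingerprints seen after the deploy but absent from the baseline,
    highest count first (ties broken alphabetically)."""
    fresh = [(fp, count) for fp, count in post_deploy.items()
             if fp not in baseline]
    return sorted(fresh, key=lambda item: (-item[1], item[0]))
```

    Low counts are kept on purpose: a signature at two events still gets triaged, because it was at zero before the deploy.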

    If the change ships dark behind a LaunchDarkly or Unleash flag, schedule the rollout per the launch plan. Note the flag's owner and cleanup ticket — stale flags accumulate fast.

Closure and SOC 2 Evidence

    Use a consistent version scheme, such as the calendar-based v2024.45.0 (CalVer rather than strict semver), and push the annotated tag with the deployed SHA. The tag is the artifact auditors map back to the CAB approval.
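
    A sketch of building the tag name, assuming the example v2024.45.0 encodes ISO year, ISO week number, and a patch counter. That reading is an interpretation, not something the checklist states, and `release_tag` is a hypothetical helper.

```python
from datetime import date


def release_tag(deploy_date: date, patch: int = 0) -> str:
    """Build a vYEAR.WEEK.PATCH tag from the ISO calendar week."""
    iso_year, iso_week, _ = deploy_date.isocalendar()
    return f"v{iso_year}.{iso_week}.{patch}"
```

    The resulting name is what you would pass to `git tag -a` along with the deployed commit, followed by pushing the tag to the remote.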

    Link the CR ticket, CAB approval, deploy log, and post-deploy verification screenshots in Vanta or Drata. SOC 2 Type II auditors sample CRs across the audit window — missing evidence on one sampled change becomes a control exception.
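
    Because one missing artifact on a sampled change becomes an exception, a completeness check before closing the CR is cheap insurance. A sketch; the key names and the `missing_evidence` helper are hypothetical and do not reflect Vanta's or Drata's data model.

```python
# The four artifacts the checklist asks you to link.
REQUIRED_EVIDENCE = (
    "cr_ticket",
    "cab_approval",
    "deploy_log",
    "verification_screenshots",
)


def missing_evidence(links: dict[str, str]) -> list[str]:
    """Required artifacts that are absent or have an empty link."""
    return [key for key in REQUIRED_EVIDENCE if not links.get(key)]
```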

    Rolled-back or partial deploys get a blameless PIR within 5 business days. Capture contributing factors, not just the surface cause — alert tuning gaps, missing runbook steps, and review-process misses are the durable lessons.
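
    The 5-business-day deadline can be computed as below. This sketch counts Monday to Friday only and ignores public holidays, which is an assumption; `pir_due_date` is a hypothetical helper.

```python
from datetime import date, timedelta


def pir_due_date(rollback_date: date, business_days: int = 5) -> date:
    """Date that falls `business_days` weekdays after the rollback."""
    due = rollback_date
    remaining = business_days
    while remaining > 0:
        due += timedelta(days=1)
        if due.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return due
```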
