Release Management Checklist

The change-management workflow IT operations runs to take a release from RFC through CAB approval, the deployment window, and post-release closure. It is built around standard, normal, and emergency change classification and enforces rollback discipline throughout.

1

Change Planning and RFC

  1. Document the change scope and business driver
    • Capture the RFC summary in ServiceNow / Jira Service Management / ConnectWise: what is changing, the business reason, and which application or infrastructure components are touched. Vague RFCs ("upgrade middleware") get bounced at CAB — name the version, the host, and the customer-visible behavior.

  2. Map impacted systems and downstream dependencies
    • Pull the CMDB record and trace upstream / downstream dependencies — load balancers, scheduled jobs, integrations, monitoring agents. Common gotcha: a "single app" upgrade silently breaks the nightly batch job that pulls from its API.

  3. Attach the rollback runbook
    • Document the exact rollback procedure: snapshot revert commands, package downgrade syntax, config restore steps, and the named decision point at which the engineer aborts and rolls back. "We can roll back" is not a plan — the runbook is the plan.

  4. Classify the change type
    • Standard = pre-approved, repeatable, low-risk (e.g., routine WSUS patch ring). Normal = requires CAB review. Emergency = bypasses normal CAB cadence with expedited approval. Misclassifying a normal change as standard is the most common audit finding in SOX ITGC reviews.

  5. Schedule the maintenance window
    • Check the change calendar for blackout periods (month-end close, retail freeze, fiscal cutover). Pick a window that leaves the on-call team daylight to roll back if smoke tests fail; a Friday 5pm window is the runbook author's enemy.
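
The dependency trace in step 2 amounts to a graph walk. A minimal sketch, assuming the CMDB relationships have already been exported into a dictionary that maps each configuration item (CI) to the CIs that consume it; the CI names are hypothetical, and a real ServiceNow or IT Glue export would need its own parsing:

```python
from collections import deque

# Hypothetical CMDB export: each CI maps to the CIs that depend on it
# (its downstream consumers). Names are illustrative only.
DOWNSTREAM = {
    "app-server-01": ["api-gateway", "lb-prod-vip"],
    "api-gateway": ["nightly-batch-export", "monitoring-agent"],
    "lb-prod-vip": [],
    "nightly-batch-export": ["finance-reporting"],
    "monitoring-agent": [],
    "finance-reporting": [],
}


def blast_radius(ci: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every CI reachable downstream of `ci`, in breadth-first order."""
    seen = {ci}  # guards against cycles in the relationship data
    order: list[str] = []
    queue = deque([ci])
    while queue:
        current = queue.popleft()
        for dependent in graph.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


if __name__ == "__main__":
    # Surfaces the batch job and the reporting system it feeds,
    # not just the CIs attached directly to the app server.
    for dependent in blast_radius("app-server-01", DOWNSTREAM):
        print(dependent)
```

Breadth-first order lists direct dependents before indirect ones, which matches how the blast radius is usually walked at CAB: first-hop impact, then knock-on impact.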

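
The calendar check in step 5 reduces to an interval-overlap test. A minimal sketch, assuming blackout periods are stored as half-open `[start, end)` datetime ranges; the blackout dates and the Friday cutoff hour are hypothetical values, not policy from this checklist:

```python
from datetime import datetime

# Hypothetical blackout calendar: (label, start, end), end exclusive.
BLACKOUTS = [
    ("month-end close", datetime(2024, 6, 28, 0, 0), datetime(2024, 7, 3, 0, 0)),
    ("retail freeze", datetime(2024, 11, 20, 0, 0), datetime(2025, 1, 2, 0, 0)),
]


def overlaps(a_start: datetime, a_end: datetime, b_start: datetime, b_end: datetime) -> bool:
    """Half-open ranges [start, end) overlap when each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def window_conflicts(start: datetime, end: datetime, blackouts=BLACKOUTS) -> list[str]:
    """Return the labels of every blackout period the proposed window touches."""
    if end <= start:
        raise ValueError("window must end after it starts")
    return [label for label, b_start, b_end in blackouts if overlaps(start, end, b_start, b_end)]


def starts_late_friday(start: datetime, cutoff_hour: int = 15) -> bool:
    """Flag windows opening Friday at or after cutoff_hour (an assumed policy value)."""
    return start.weekday() == 4 and start.hour >= cutoff_hour  # Monday == 0, Friday == 4


if __name__ == "__main__":
    proposed_start = datetime(2024, 7, 1, 22, 0)
    proposed_end = datetime(2024, 7, 2, 2, 0)
    print(window_conflicts(proposed_start, proposed_end))  # ['month-end close']
    print(starts_late_friday(datetime(2024, 7, 12, 17, 0)))  # True
```

Treating the end as exclusive means a window that finishes exactly when a blackout begins is not reported as a conflict.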

2

Pre-Deployment Validation

  1. Deploy the build to the test ring
    • Use the staging or pilot OU / VLAN / cluster that mirrors production topology. Test rings that diverge from prod (different OS build, different agents installed) provide false confidence. Patch Tuesday horror stories almost always trace back to a staging environment that wasn't really staging.

  2. Run the smoke test suite on staging
    • Hit the named user-facing checks: SSO login, primary report runs, scheduled job fires, monitoring agent reports in. The smoke-test list lives in the runbook — if it's not written down, it's not a test, it's vibes.

  3. Verify the most recent backup is restorable
    • Confirm Veeam / Datto / Rubrik shows a successful job within the past 24 hours and that the restore point is mountable — "green dashboard" is not the same as "restorable." If the change touches a database, take a fresh backup before the window starts; do not rely on last night's job.

  4. Dry-run the rollback procedure on staging
    • Walk the rollback runbook step by step on the staging system after the change has been applied there. You do not want to discover at 2am, on production, that the rollback depends on a credential that has since rotated.

  5. Log defects and known issues for the change record
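
The freshness requirement in step 3 can be enforced as a go/no-go gate. A minimal sketch, assuming the job history has been exported from the backup console into simple records; the `BackupJob` fields, the status strings, and the `restore_tested` flag are hypothetical and are not the Veeam, Datto, or Rubrik API. The flag exists to force someone to record that the restore point actually mounted:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class BackupJob:
    # Hypothetical record shape exported from the backup console.
    host: str
    finished_at: datetime
    status: str           # assumed values: "success", "warning", "failed"
    restore_tested: bool  # True only if the restore point was actually mounted


def backup_gate(jobs, host, now, max_age=timedelta(hours=24)):
    """Return (ok, reason) based on the newest successful job for `host`."""
    successes = [j for j in jobs if j.host == host and j.status == "success"]
    if not successes:
        return False, "no successful backup on record"
    latest = max(successes, key=lambda j: j.finished_at)
    age = now - latest.finished_at
    if age > max_age:
        return False, f"latest successful backup is {age} old"
    if not latest.restore_tested:
        # A green job is not proof of a restorable point.
        return False, "restore point not verified as mountable"
    return True, "ok"


if __name__ == "__main__":
    now = datetime(2024, 7, 10, 20, 0, tzinfo=timezone.utc)
    jobs = [
        BackupJob("db-prod-01", now - timedelta(hours=30), "success", True),
        BackupJob("db-prod-01", now - timedelta(hours=6), "success", False),
    ]
    print(backup_gate(jobs, "db-prod-01", now))
```

The gate judges only the newest successful job, so an older job that was mount-tested does not cover for a newer one that was not.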

3

CAB Review and Communication

  1. Submit the RFC to the CAB queue
    • Most CABs require RFCs in the queue 48-72 hours before the meeting. Late-submitted RFCs get deferred to the next cycle by default, which usually means slipping the deployment window by a week.

  2. Present the change at CAB
    • Walk the board through scope, blast radius, rollback plan, and test evidence. Be ready for the question "who owns the rollback decision during the window?" — name the engineer and their backup.

  3. Send maintenance notice to affected users
    • Notify via the standing channel — status page, Teams / Slack #announcements, email to affected DLs. Include start time, expected duration, services impacted, and where to file tickets if something is broken after the window closes.

  4. Confirm on-call coverage for the window
    • Page the primary, secondary, and the named escalation owner via PagerDuty / Opsgenie before the window opens. For MSP work, confirm the client's after-hours contact is reachable in case decisions need their sign-off mid-window.
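
The submission cutoff in step 1 determines which CAB meeting an RFC actually lands in. A minimal sketch, assuming a 48-hour lead time (the low end of the 48-72 hour range above) and a hypothetical list of weekly CAB meeting times:

```python
from datetime import datetime, timedelta
from typing import Optional


def first_eligible_cab(
    submitted: datetime,
    meetings: list[datetime],
    lead: timedelta = timedelta(hours=48),
) -> Optional[datetime]:
    """Return the earliest CAB meeting at least `lead` after submission, or None."""
    for meeting in sorted(meetings):
        if meeting - submitted >= lead:
            return meeting
    return None


if __name__ == "__main__":
    # Hypothetical weekly CAB on Tuesdays at 10:00.
    cab_meetings = [datetime(2024, 7, 9, 10, 0), datetime(2024, 7, 16, 10, 0)]
    # Submitted Monday morning: only 25 hours before the 9 July meeting,
    # so the RFC slips a full week.
    print(first_eligible_cab(datetime(2024, 7, 8, 9, 0), cab_meetings))  # prints 2024-07-16 10:00:00
```

Raising `lead` to 72 hours models the stricter end of the range; the slip to the following week falls out of the same comparison.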

4

Deployment Execution

  1. Snapshot production hosts before cutover
    • Take VM-level snapshots in vCenter / Hyper-V / Proxmox immediately before the deployment runbook starts. Note: snapshots are not backups. They depend on the base disk they were taken from, and their delta files grow with every write, ballooning storage the longer they are kept. Tag the snapshot with the change number and a 7-day expiry, and delete it no later than that date.

  2. Execute the deployment runbook
    • Follow the runbook exactly — do not improvise during the window. If a step fails or surfaces unexpected behavior, stop and call the change owner before continuing. Off-script execution is the most common cause of post-mortem findings.

  3. Run post-deploy smoke tests against production
    • Execute the same smoke-test list used on staging. Verify monitoring (PRTG, Datadog, LogicMonitor) shows agents reporting and no new alerts. Customer-facing endpoints should be checked from outside the corporate network.

  4. If smoke tests fail or the rollback decision point is reached, execute the rollback procedure
    • Follow the rollback runbook attached to the RFC. Restore from snapshot or run the documented downgrade steps, re-run smoke tests, and re-open the change as failed. Notify users on the same channel where the maintenance notice was posted.
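
The tagging convention in step 1 is easiest to enforce when the expiry date is machine-readable. A minimal sketch, assuming snapshot names are free text you control; the `CHG0012345` change number and the naming pattern are hypothetical, and the sketch does not call any vCenter, Hyper-V, or Proxmox API:

```python
from datetime import date, timedelta

EXPIRY_MARKER = "_expires-"


def snapshot_label(change_number: str, taken: date, keep_days: int = 7) -> str:
    """Build a snapshot name that carries the change number and an expiry date."""
    expires = taken + timedelta(days=keep_days)
    return f"{change_number}_pre-deploy{EXPIRY_MARKER}{expires.isoformat()}"


def expired_snapshots(labels: list[str], today: date) -> list[str]:
    """Return labels whose expiry date has passed; skips names without the marker."""
    stale = []
    for label in labels:
        if EXPIRY_MARKER not in label:
            continue  # not created under this convention; review by hand
        expiry = date.fromisoformat(label.rsplit(EXPIRY_MARKER, 1)[1])
        if expiry < today:
            stale.append(label)
    return stale


if __name__ == "__main__":
    label = snapshot_label("CHG0012345", date(2024, 7, 10))
    print(label)  # CHG0012345_pre-deploy_expires-2024-07-17
    print(expired_snapshots([label, "manual-before-reboot"], date(2024, 7, 18)))
```

A name on its own deletes nothing; the expiry only has teeth if something routinely runs a check like `expired_snapshots` against the hypervisor's snapshot inventory.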

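Steps 3 and 4 form a single go/no-go decision: run the smoke list, and roll back if anything fails. A minimal sketch, assuming each smoke check is wrapped as a zero-argument function returning `True` on success; the check names and the example URL are hypothetical, and real checks (SSO login, report runs, agent check-in) would be written against your own systems:

```python
import urllib.request
from typing import Callable

SmokeCheck = Callable[[], bool]


def http_ok(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical check: True if the endpoint answers with HTTP 200."""
    # urlopen raises HTTPError for 4xx/5xx; run_smoke_tests records that as a failure.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status == 200


def run_smoke_tests(checks: dict[str, SmokeCheck]) -> list[str]:
    """Run every check, even after a failure, and return the names that failed."""
    failures = []
    for name, check in checks.items():
        try:
            if not check():
                failures.append(name)
        except Exception as exc:  # a check that crashes counts as a failure
            failures.append(f"{name} ({type(exc).__name__})")
    return failures


def go_no_go(failures: list[str]) -> str:
    """Any failed smoke test sends the change down the rollback runbook."""
    return "ROLLBACK" if failures else "PROCEED"


if __name__ == "__main__":
    checks = {
        "public-login-page": lambda: http_ok("https://portal.example.com/login"),
        "monitoring-agent-reporting": lambda: True,  # placeholder for a real check
    }
    failed = run_smoke_tests(checks)
    print(go_no_go(failed), failed)
```

Running every check rather than stopping at the first failure gives the change owner the full picture before deciding, and the same runner can be re-run after the rollback. The HTTP checks belong on a host outside the corporate network, per step 3.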
5

Post-Release Closure

  1. Monitor systems through the stabilization period
    • Watch SIEM, EDR, and APM dashboards for 48-72 hours post-cutover. Track any new tickets tagged to the change in the PSA / ITSM queue. A spike in helpdesk volume on day 2 is often the first signal of a partial regression.

  2. Hold the post-implementation review
    • Walk through what went per plan, what deviated, and which runbook steps need correction. Capture action items with owners and due dates — a PIR with no follow-through is theater.

  3. Update the CMDB and runbook documentation
    • Push CI updates into ServiceNow / IT Glue / Hudu — version, host, dependencies, owner. Stale CMDB entries are how the next change owner trips on the same dependency you just discovered.

  4. Close the change record with outcome and artifacts
    • Set final status, attach evidence (smoke test output, monitoring screenshots, communication artifacts), and link the PIR notes. SOX, SOC 2, and HIPAA audits sample closed change records — a half-closed ticket is an audit finding.
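
The day-2 helpdesk spike described in step 1 can be flagged automatically. A minimal sketch, assuming daily ticket counts can be pulled from the PSA / ITSM queue; the three-standard-deviation threshold and the one-ticket floor on the spread are assumed tuning values, not a standard:

```python
from statistics import mean, stdev


def is_ticket_spike(baseline: list[int], today: int, k: float = 3.0) -> bool:
    """Flag `today` if it exceeds the pre-change mean by more than k standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline days to estimate spread")
    mu = mean(baseline)
    # Floor the spread at 1 ticket so a perfectly flat baseline doesn't
    # turn a single extra ticket into an alert (assumed tuning choice).
    sigma = max(stdev(baseline), 1.0)
    return today > mu + k * sigma


if __name__ == "__main__":
    pre_change = [40, 42, 38, 41, 39]  # hypothetical daily ticket counts
    print(is_ticket_spike(pre_change, 44))  # False: within normal variation
    print(is_ticket_spike(pre_change, 60))  # True: investigate the change
```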

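
The closure requirements in step 4 can be checked before anyone clicks Close. A minimal sketch, assuming the change record has been pulled into a plain dictionary; the field names and attachment tags are hypothetical and would map onto whatever your ServiceNow or other ITSM form actually uses:

```python
# Hypothetical attachment tags mirroring the evidence listed in step 4.
REQUIRED_ATTACHMENTS = {
    "smoke_test_output",
    "monitoring_screenshots",
    "communication_artifacts",
}


def closure_gaps(record: dict) -> list[str]:
    """Return what is still missing before the change record can be closed."""
    gaps = []
    if not record.get("final_status"):
        gaps.append("final_status")
    missing = REQUIRED_ATTACHMENTS - set(record.get("attachments", []))
    gaps.extend(sorted(missing))
    if not record.get("pir_link"):
        gaps.append("pir_link")
    return gaps


if __name__ == "__main__":
    half_closed = {"final_status": "successful", "attachments": ["smoke_test_output"]}
    print(closure_gaps(half_closed))
    # ['communication_artifacts', 'monitoring_screenshots', 'pir_link']
```

An empty result means the record carries everything an auditor sampling closed changes would look for, as listed in step 4.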