Rollback Plan Checklist

Pre-Rollback Preparation

    Pin the exact build, KB number, firmware version, or configuration baseline you intend to restore. Reference the change ticket (RFC) and the CAB-approved rollback plan. For OS patches, capture the KB IDs being uninstalled; for firmware, the prior version string from iLO/iDRAC; for app deployments, the package version in SCCM/Intune.

    Confirm the most recent Veeam / Datto / Rubrik job completed without warnings, and that VM-level snapshots were taken before the original change. A green dashboard is not enough — open the job log and check for skipped objects. If the rollback may need a restore, validate the restore path before you start, not during the incident.
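
    The log check above can be partly automated. The following is a minimal sketch, assuming the job log has been exported as plain text; the sample log format and the keyword list are hypothetical and are not taken from Veeam, Datto, or Rubrik. Adjust both to whatever your product actually writes.

```python
import re

# Hypothetical markers; real backup products word these differently.
PROBLEM_PATTERN = re.compile(r"\b(skipped|warning|failed)\b", re.IGNORECASE)


def find_problem_lines(log_text: str) -> list[str]:
    """Return every log line that mentions a skipped object, warning, or failure."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if PROBLEM_PATTERN.search(line)
    ]


# Illustrative log excerpt (invented format, for demonstration only).
sample_log = """\
10:00:01 Info    Job started
10:04:12 Info    Processing FS01 ... Success
10:05:40 Warning Processing SQL02 ... skipped (VSS writer timeout)
10:09:03 Info    Job finished
"""

problems = find_problem_lines(sample_log)
for line in problems:
    print(line)
```

    An empty result only means none of the assumed keywords appeared; it does not replace reading the log or validating the restore path.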

    Send the rollback notification through the change comms channel — affected business owners, helpdesk lead, NOC, and (for MSPs) the client primary contact. Include start time, expected duration, services impacted, and the bridge / PagerDuty incident link. Keep the tone factual; this is operational, not catastrophic.

    Record what failed and what was observed: error messages, monitoring alerts, user reports, ticket numbers. This becomes the post-change review record and feeds the next CAB submission. A rollback without a documented trigger reads as operator panic in an audit.

    Name the change executor, the verifier, and the comms lead before kicking off. For Tier 0 systems (DCs, hypervisors, core firewalls) the executor and verifier should be different people. Open the bridge in Teams/Zoom and pin the runbook in the channel.
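
    The role rule above is easy to check before the bridge opens. This is a minimal sketch; the RollbackRoles type, the role_problems function, and the numeric tier argument are hypothetical names introduced for illustration, not part of any ticketing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackRoles:
    executor: str
    verifier: str
    comms_lead: str


def role_problems(roles: RollbackRoles, tier: int) -> list[str]:
    """Return a list of role-assignment problems; an empty list means none were found."""
    problems = []
    for field_name in ("executor", "verifier", "comms_lead"):
        if not getattr(roles, field_name).strip():
            problems.append(f"{field_name} is not assigned")

    executor = roles.executor.strip().lower()
    verifier = roles.verifier.strip().lower()
    # Tier 0 (DCs, hypervisors, core firewalls): executor and verifier should differ.
    if tier == 0 and executor and executor == verifier:
        problems.append("Tier 0 change: executor and verifier should be different people")
    return problems


print(role_problems(RollbackRoles("alice", "alice", "bob"), tier=0))
```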

Execution of Rollback

    Check out the required Tier 0 / Tier 1 credentials from CyberArk, BeyondTrust, or Hudu Vault using just-in-time elevation. Verify that the break-glass account is still sealed and that the team knows where it is held. Do not run the rollback from a daily-driver workstation; use a Privileged Access Workstation (PAW).

    Suppress alerts in PagerDuty / Opsgenie and place affected hosts in maintenance mode in PRTG / SolarWinds / Datadog. Disable scheduled tasks, GPO refresh on the pilot OU, and any RMM scripts that might re-deploy the failed change. Note the exact time you suppressed alerts so you can reverse it.
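
    One way to make the suppression reversible is to log each paused item with a timestamp as you go. The sketch below is an assumption about how such a log could look; SuppressionLog and its methods are hypothetical and do not call PagerDuty, Opsgenie, or any monitoring API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SuppressionLog:
    """Records what was paused and when, so it can be reversed after the rollback."""
    entries: list = field(default_factory=list)

    def record(self, item: str, when: Optional[datetime] = None) -> None:
        # Default to the current UTC time if no timestamp is supplied.
        self.entries.append((when or datetime.now(timezone.utc), item))

    def reversal_checklist(self) -> list:
        # One line per paused item, in the order they were recorded.
        return [
            f"Re-enable: {item} (paused {ts:%Y-%m-%d %H:%M} UTC)"
            for ts, item in self.entries
        ]


log = SuppressionLog()
paused_at = datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc)  # example timestamp
log.record("PagerDuty alerts for affected service", paused_at)
log.record("RMM script that re-deploys the change", paused_at)

for line in log.reversal_checklist():
    print(line)
```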

    Follow the rollback runbook exactly as approved by CAB — uninstall KB, revert GPO link, restore VM from snapshot, push prior package via SCCM/Intune, or re-flash firmware. Do not improvise; deviations are the most common cause of a rollback that itself fails. If you must deviate, stop and escalate before continuing.

    Record whether the rollback completed cleanly. If it failed mid-flight, the next phase is escalation to vendor support and potential restore-from-backup, not a retry of the same script.

    If the rollback failed, open a Sev-1 case with the vendor (Microsoft, VMware, Cisco, Veeam, etc.) and start the bare-metal or VM-level restore from the verified backup. Engage the on-call engineering manager. Keep the bridge open and update stakeholders every 30 minutes until service is restored.

Post-Rollback Verification

    Walk the documented smoke tests for the affected service — auth via SSO, mail flow via message trace, file-share access, VPN tunnel up, line-of-business app login. For DCs, confirm replication with repadmin /replsummary. Capture screenshots or log excerpts for the change record.

    For database rollbacks, run the application's integrity check (DBCC CHECKDB, vendor-provided consistency tool) and compare row counts against the pre-change baseline. For file shares, confirm ACLs survived the snapshot revert. Data drift after a rollback is a silent killer — catch it now, not in next month's reconciliation.
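
    The row-count comparison above can be sketched as a simple dictionary diff. This assumes you captured per-table counts before the change and can collect them again now; the table names and numbers are invented for illustration, and how the counts are gathered is left to your database tooling.

```python
def row_count_drift(baseline: dict, current: dict) -> dict:
    """Return {table: (baseline_count, current_count)} for every table that differs.

    A table missing from one side is reported with None for that side.
    """
    drift = {}
    for table in sorted(set(baseline) | set(current)):
        before = baseline.get(table)
        after = current.get(table)
        if before != after:
            drift[table] = (before, after)
    return drift


# Hypothetical counts captured before the change and after the rollback.
baseline = {"orders": 1200, "customers": 340, "invoices": 980}
current = {"orders": 1200, "customers": 338, "invoices": 980}

drift = row_count_drift(baseline, current)
print(drift)
```

    Matching row counts do not prove the data is identical; treat this as a quick screen alongside the integrity check, not a substitute for it.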

    Take hosts out of maintenance mode in PRTG / SolarWinds / Datadog, lift PagerDuty suppression, and re-enable the scheduled tasks and RMM scripts paused earlier. Watch the dashboard for 15 minutes for false alarms before standing down the bridge.

    Send the all-clear to the same distribution as the kickoff notice. Include actual end time, services restored, any residual issues, and a pointer to the post-change review meeting time. For MSP clients, log the incident summary in the PSA (ConnectWise, Autotask, Halo) against the affected tickets.

    Update the RFC in ServiceNow / Jira Service Management / Freshservice with rollback outcome, attach evidence, and close. Schedule the post-incident review (PIR) within 5 business days while details are fresh. Capture the root cause and the corrective action that will go into the next CAB submission for the original change.
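
    The 5-business-day PIR window can be computed rather than counted on a calendar. This is a minimal sketch that skips Saturdays and Sundays only; public holidays and regional working weeks are not handled, and add_business_days is a hypothetical helper name.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start` (weekends skipped)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


# Example: rollback closed on Wednesday 2024-03-06.
pir_deadline = add_business_days(date(2024, 3, 6), 5)
print(pir_deadline.isoformat())
```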