Rollback Plan Checklist

Pre-Rollback Preparation

    Pin the exact build, KB number, firmware version, or configuration baseline you intend to restore. Reference the change ticket (RFC) and the CAB-approved rollback plan. For OS patches, capture the KB IDs being uninstalled; for firmware, the prior version string from iLO/iDRAC; for app deployments, the package version in SCCM/Intune.
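
    A minimal sketch of how the pinned rollback target could be recorded so that nothing is left implicit. The class name, field names, and the three `kind` labels are hypothetical and do not come from any ticketing tool; the RFC and KB values in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from typing import List

# Which identifier each kind of change has to carry (labels are illustrative).
REQUIRED_FIELD = {
    "os_patch": "kb_ids",            # KB IDs being uninstalled
    "firmware": "prior_firmware",    # prior version string from iLO/iDRAC
    "app": "package_version",        # package version in SCCM/Intune
}


@dataclass
class RollbackTarget:
    rfc: str                                   # change ticket reference
    kind: str                                  # one of REQUIRED_FIELD's keys
    kb_ids: List[str] = field(default_factory=list)
    prior_firmware: str = ""
    package_version: str = ""

    def missing(self) -> List[str]:
        """Return the names of fields that still need to be filled in."""
        gaps = []
        if not self.rfc.strip():
            gaps.append("rfc")
        required = REQUIRED_FIELD.get(self.kind)
        if required is None:
            gaps.append("kind")                # unknown change type
        elif not getattr(self, required):
            gaps.append(required)
        return gaps
```

    For example, `RollbackTarget(rfc="RFC-EXAMPLE", kind="firmware").missing()` reports `prior_firmware`, which is the cue to look up the version string before starting.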

    Confirm the most recent Veeam / Datto / Rubrik backup job completed without warnings, and that VM-level snapshots were taken before the original change. A green dashboard is not enough: open the job log and check for skipped objects. If the rollback may need a restore, validate the restore path before you start, not during the incident.
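
    One way to make the job-log check repeatable is a coarse keyword scan over an exported log. This is only a sketch: the keyword list is an assumption, real Veeam / Datto / Rubrik logs use their own wording, and a line such as "Warnings: 0" would be flagged too, so treat hits as lines to read rather than as a verdict.

```python
import re
from typing import List, Tuple

# Assumed keywords; adjust to the wording your backup product actually emits.
SUSPECT = re.compile(r"\b(?:skip|warn|fail|error)\w*", re.IGNORECASE)


def suspect_lines(log_text: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for every log line that needs a human look."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if SUSPECT.search(line):
            hits.append((number, line.strip()))
    return hits
```

    An empty result does not prove the job was clean; it only means none of the assumed keywords appeared.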

    Send the rollback notification through the change comms channel — affected business owners, helpdesk lead, NOC, and (for MSPs) the client primary contact. Include start time, expected duration, services impacted, and the bridge / PagerDuty incident link. Keep the tone factual; this is operational, not catastrophic.

    Record what failed and what was observed — error messages, monitoring alerts, user reports, ticket numbers. This becomes the post-change review record and feeds the next CAB submission. A rollback without a documented trigger looks like operator panic on audit.

    Name the change executor, the verifier, and the comms lead before kicking off. For Tier 0 systems (DCs, hypervisors, core firewalls) the executor and verifier should be different people. Open the bridge in Teams/Zoom and pin the runbook in the channel.
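
    The role assignment can be sanity-checked before kickoff. The sketch below assumes a hypothetical `RollbackRoles` record and a numeric tier, neither of which comes from a specific tool; it only encodes the rule that all three roles are named and that Tier 0 changes have a separate executor and verifier.

```python
from dataclasses import dataclass
from typing import List

TIER_0 = 0  # DCs, hypervisors, core firewalls


@dataclass(frozen=True)
class RollbackRoles:
    executor: str
    verifier: str
    comms_lead: str


def role_problems(roles: RollbackRoles, tier: int) -> List[str]:
    """Return a list of problems; an empty list means the assignment is acceptable."""
    problems = []
    for name in ("executor", "verifier", "comms_lead"):
        if not getattr(roles, name).strip():
            problems.append(f"{name} is not assigned")
    same_person = roles.executor.strip().lower() == roles.verifier.strip().lower()
    if tier == TIER_0 and roles.executor.strip() and same_person:
        problems.append("Tier 0 change: executor and verifier should be different people")
    return problems
```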

Execution of Rollback

    Check out the required Tier 0 / Tier 1 credentials from CyberArk, BeyondTrust, or Hudu Vault using just-in-time elevation. Verify that the break-glass account is still sealed and that the team knows where its credentials are held. Do not run the rollback from a daily-driver workstation; use a Privileged Access Workstation (PAW).

    Suppress alerts in PagerDuty / Opsgenie and place affected hosts in maintenance mode in PRTG / SolarWinds / Datadog. Disable scheduled tasks, GPO refresh on the pilot OU, and any RMM scripts that might re-deploy the failed change. Note the exact time you suppressed alerts so you can reverse it.
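
    Noting what was suppressed, and when, is easier to reverse if it is kept as a small ledger rather than in chat scrollback. The sketch below is an in-memory illustration only; the class names are hypothetical and it does not call any PagerDuty, PRTG, or RMM API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Suppression:
    system: str                      # e.g. "PagerDuty", "PRTG", "RMM script"
    target: str                      # host, service, or task name
    suppressed_at: datetime
    restored_at: Optional[datetime] = None


@dataclass
class SuppressionLog:
    entries: List[Suppression] = field(default_factory=list)

    def record(self, system: str, target: str,
               when: Optional[datetime] = None) -> Suppression:
        """Note a suppression, stamped with the given time or the current UTC time."""
        entry = Suppression(system, target, when or datetime.now(timezone.utc))
        self.entries.append(entry)
        return entry

    def restore(self, system: str, target: str,
                when: Optional[datetime] = None) -> None:
        """Mark the first matching open suppression as reversed."""
        for entry in self.entries:
            if (entry.system, entry.target) == (system, target) and entry.restored_at is None:
                entry.restored_at = when or datetime.now(timezone.utc)
                return
        raise KeyError(f"no open suppression for {system}/{target}")

    def outstanding(self) -> List[Suppression]:
        """Everything still suppressed; this should be empty before stand-down."""
        return [e for e in self.entries if e.restored_at is None]
```

    Checking `outstanding()` before standing down the bridge gives a concrete list of anything still muted or disabled.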

    Follow the rollback runbook exactly as approved by CAB — uninstall KB, revert GPO link, restore VM from snapshot, push prior package via SCCM/Intune, or re-flash firmware. Do not improvise; deviations are the most common cause of a rollback that itself fails. If you must deviate, stop and escalate before continuing.

    Record whether the rollback completed cleanly. If it failed mid-flight, the next phase is escalation to vendor support and potential restore-from-backup, not a retry of the same script.

    If the rollback failed, open a Sev-1 case with the vendor (Microsoft, VMware, Cisco, Veeam, etc.) and start the bare-metal or VM-level restore from the verified backup. Engage the on-call engineering manager. Keep the bridge open and update stakeholders every 30 minutes until service is restored.

Post-Rollback Verification

    Walk the documented smoke tests for the affected service — auth via SSO, mail flow via message trace, file-share access, VPN tunnel up, line-of-business app login. For DCs, confirm replication with repadmin /replsummary. Capture screenshots or log excerpts for the change record.
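
    If the smoke tests are scripted, a small runner can collect a pass/fail line per check for the change record. This is a sketch under the assumption that each check is a zero-argument callable returning True or False; the check names in the usage note are placeholders, and the real checks (SSO login, message trace, `repadmin /replsummary`) still have to be written for your environment.

```python
from typing import Callable, Dict, List, Tuple


def run_smoke_tests(checks: Dict[str, Callable[[], bool]]) -> List[Tuple[str, str]]:
    """Run every check, never stopping early, and return (name, outcome) pairs."""
    results = []
    for name, check in checks.items():
        try:
            outcome = "PASS" if check() else "FAIL"
        except Exception as exc:     # a crashing check is recorded, not hidden
            outcome = f"ERROR: {exc}"
        results.append((name, outcome))
    return results
```

    For example, `run_smoke_tests({"sso": check_sso, "vpn": check_vpn})` returns one line per service that can be pasted into the RFC as evidence.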

    For database rollbacks, run the application's integrity check (DBCC CHECKDB, vendor-provided consistency tool) and compare row counts against the pre-change baseline. For file shares, confirm ACLs survived the snapshot revert. Data drift after a rollback is a silent killer — catch it now, not in next month's reconciliation.
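
    The row-count comparison can be sketched as a plain dictionary diff. The function name and the table names in the usage note are illustrative; how the counts are collected (a query per table, a vendor report) is left to the environment.

```python
from typing import Dict, List


def row_count_drift(baseline: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Compare per-table row counts and describe every difference found."""
    findings = []
    for table in sorted(set(baseline) | set(current)):
        before = baseline.get(table)
        after = current.get(table)
        if before is None:
            findings.append(f"{table}: not in baseline (now {after} rows)")
        elif after is None:
            findings.append(f"{table}: missing after rollback (baseline {before} rows)")
        elif before != after:
            findings.append(f"{table}: {before} -> {after} ({after - before:+d})")
    return findings
```

    With a baseline of `{"customers": 340}` and a current count of `{"customers": 338}`, the function returns `["customers: 340 -> 338 (-2)"]`. Matching counts do not prove the data is identical, so this complements the integrity check rather than replacing it.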

    Take hosts out of maintenance mode in PRTG / SolarWinds / Datadog, lift PagerDuty suppression, and re-enable the scheduled tasks and RMM scripts paused earlier. Watch the dashboard for 15 minutes for false alarms before standing down the bridge.

    Send the all-clear to the same distribution as the kickoff notice. Include actual end time, services restored, any residual issues, and a pointer to the post-change review meeting time. For MSP clients, log the incident summary in the PSA (ConnectWise, Autotask, Halo) against the affected tickets.

    Update the RFC in ServiceNow / Jira Service Management / Freshservice with rollback outcome, attach evidence, and close. Schedule the post-incident review (PIR) within 5 business days while details are fresh. Capture the root cause and the corrective action that will go into the next CAB submission for the original change.
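
    The latest acceptable PIR date can be worked out mechanically. The sketch below counts Monday to Friday as business days and ignores public holidays, which is an assumption; adjust it if your change calendar says otherwise.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Return the date that falls `days` business days after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:    # Monday=0 .. Friday=4
            remaining -= 1
    return current
```

    For a rollback closed on Wednesday 2024-01-10, `add_business_days(date(2024, 1, 10), 5)` gives 2024-01-17, the following Wednesday.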
