Data Recovery Checklist
Backup Job Verification
Reconcile the Veeam / Datto / Rubrik job list against the CMDB or RMM asset list. Flag any production VM, file share, M365 tenant, or SQL instance not covered. Newly-provisioned hosts that never got tagged into a backup job are the most common gap.
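The reconciliation above amounts to a set difference between the asset inventory and the workloads named in backup jobs. A minimal sketch, assuming both lists have already been exported as plain names; the sample host names and the `find_uncovered_assets` helper are hypothetical and not part of any vendor API:

```python
def find_uncovered_assets(assets, protected):
    """Return asset names that appear in the inventory but in no backup job."""
    # Normalise case and whitespace so "SQL01 " and "sql01" compare equal.
    protected_names = {name.strip().lower() for name in protected}
    return sorted(
        name for name in assets
        if name.strip().lower() not in protected_names
    )


# Hypothetical exports: names from the CMDB/RMM and members of all backup jobs.
cmdb_assets = ["SQL01", "FS01", "APP02", "m365-tenant"]
backup_job_members = ["sql01", "fs01 ", "M365-Tenant"]

uncovered = find_uncovered_assets(cmdb_assets, backup_job_members)
print(uncovered)  # ['APP02']
```

Matching on names alone can miss renamed hosts; where the inventory exposes a stable identifier such as a VM UUID or serial number, comparing on that is likely more reliable.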
Open the backup console and filter for warnings and failures. A streak of "success with warnings" usually means VSS snapshot timeouts or skipped open files — investigate before assuming the job is healthy.
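One way to surface a warning streak is to scan the job's result history for consecutive warnings. The sketch below assumes the history has been exported as a chronological list of status strings; the status labels and the threshold of three are illustrative assumptions, not values taken from any backup product:

```python
def longest_warning_streak(statuses):
    """Return the longest run of consecutive 'Warning' results in a job history."""
    longest = current = 0
    for status in statuses:
        if status.strip().lower() == "warning":
            current += 1
            longest = max(longest, current)
        else:
            # Any other result (success or failure) breaks the streak.
            current = 0
    return longest


# Hypothetical job history, oldest result first.
history = ["Success", "Warning", "Warning", "Warning", "Success", "Warning"]

streak = longest_warning_streak(history)
needs_review = streak >= 3  # assumed threshold; tune to the job's schedule
print(streak, needs_review)  # 3 True
```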
File a ticket in ConnectWise PSA / Autotask / ServiceNow against the affected job, link the log excerpt, and assign to the backup engineer. Do not proceed with the restore drill until known job failures are triaged — restoring from a broken chain wastes the drill window.
Confirm 3-2-1 is intact: three copies of the data (production, a secondary local copy, and an offsite copy) on at least two media types, with the offsite copy protected by object lock or an air gap (S3 Object Lock, Wasabi immutability, LTO tape, separate cloud account). If ransomware encrypts the primary backup repository, the immutable offsite copy is the only copy left to restore from.
Confirm the BitLocker recovery keys, backup repository encryption passphrase, and KMS keys are present in the password vault (Keeper, Bitwarden, IT Glue, Hudu) and accessible to at least two named operators. Backups encrypted with a key nobody can find are the same as no backups.
Restore Plan and Scope
Pick one scenario per drill and stick to it: file-level restore, full VM restore, SQL point-in-time, M365 mailbox, Entra ID object recovery, or full site failover. Combining scenarios in one drill muddies the timing data, so rotate scenarios across drills to keep coverage broad.
Pick a recovery point that exercises the chain — typically a synthetic full plus several incrementals, not the most recent point alone. Document the timestamp; this is what RPO is measured against.
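The documented timestamp can be turned into a measured data-loss window by subtracting it from the moment the drill treats as the incident. A minimal sketch, assuming a simulated incident time and an RPO target chosen by the operator; the dates and the 24-hour target are hypothetical:

```python
from datetime import datetime, timedelta


def achieved_rpo(recovery_point, simulated_incident):
    """Return the data-loss window: time between the recovery point and the incident."""
    if recovery_point > simulated_incident:
        raise ValueError("recovery point cannot be later than the incident time")
    return simulated_incident - recovery_point


# Hypothetical drill values.
recovery_point = datetime(2024, 3, 10, 22, 0)      # timestamp of the chosen restore point
simulated_incident = datetime(2024, 3, 11, 9, 30)  # moment the drill treats as the outage
rpo_target = timedelta(hours=24)                   # assumed documented RPO

gap = achieved_rpo(recovery_point, simulated_incident)
print(gap, gap <= rpo_target)  # 11:30:00 True
```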
Restore into an isolated VLAN or sandbox vCenter cluster, never into production. Block egress to production AD / DNS so a restored host can't register in production DNS, register a duplicate SPN, or pull production GPOs. Veeam SureBackup, Datto Virtualization, and Rubrik Live Mount all support isolated networks.
List upstream dependencies — DNS, AD, certificate authority, license server, SQL backend — and either spin up isolated copies or stub them. App servers booting without a reachable DC will hang at login and skew the RTO measurement.
Execute the Restore
Note the wall-clock start time. RTO measurement starts here, not when the job was queued. Watch for early failures — repository connection, credential prompt, dedupe rehydrate stalls — and resolve in-line.
Track throughput in the backup console. If the projected completion time exceeds the documented RTO, escalate now — do not wait for the post-restore review. Common culprits: cold cloud-tier rehydrate, network bottleneck between repo and target, undersized restore proxy.
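If the console shows bytes restored but no reliable estimate, the projected completion time can be approximated from average throughput so far. This is a rough sketch that assumes a constant transfer rate, which cold-tier rehydration often violates; the function names and sample figures are hypothetical:

```python
from datetime import datetime, timedelta


def projected_finish(start, now, bytes_done, bytes_total):
    """Estimate the completion time from the average throughput observed so far."""
    elapsed = (now - start).total_seconds()
    if bytes_done <= 0 or elapsed <= 0:
        return None  # nothing transferred yet, so no estimate is possible
    rate = bytes_done / elapsed  # average bytes per second since the start
    remaining_seconds = (bytes_total - bytes_done) / rate
    return now + timedelta(seconds=remaining_seconds)


def breaches_rto(start, finish, rto):
    """True if the projected total restore duration exceeds the documented RTO."""
    return finish - start > rto


# Hypothetical drill values: 200 GB of 1 TB restored after one hour.
start = datetime(2024, 3, 11, 9, 0)
now = datetime(2024, 3, 11, 10, 0)
finish = projected_finish(start, now, 200 * 10**9, 1000 * 10**9)

if finish is not None and breaches_rto(start, finish, timedelta(hours=4)):
    print("Projected restore exceeds RTO, escalate now")
```

In this example the restore is on pace to take about five hours against an assumed four-hour RTO, so the check triggers the escalation.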
Post-Restoration Validation
Check the Windows Service Control Manager or systemd for failed services. SQL Server, IIS app pools, and scheduled tasks are the usual offenders — they often depend on a service account whose password has rotated since the recovery point.
Don't stop at "the VM booted." Log in as a test user, open the application, run a transaction, and validate against the application owner's pass criteria. For SQL: run DBCC CHECKDB. For file shares: spot-check files against a checksum manifest.
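The file-share spot check can be scripted by hashing restored files and comparing them with the manifest. A minimal sketch, assuming the manifest is available as a mapping of relative path to SHA-256 hex digest; the manifest format and function names are assumptions, not a standard:

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large restores do not exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(restore_root, manifest):
    """Return (relative_path, reason) for each file that is missing or differs."""
    problems = []
    for rel_path, expected in manifest.items():
        target = Path(restore_root) / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256_of(target) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

An empty result means every sampled file matched. The manifest has to be generated from production before the drill, and files that changed after the recovery point will legitimately differ.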
If the smoke test fails, file a P2 in the PSA / ITSM tool with the console screenshot, the validation notes, and the affected backup job. Assign it to the backup engineering lead and tag it for the next change advisory board meeting so the fix is tracked, not lost in a Slack thread.
Restored file shares often come back with broken inheritance or SIDs that no longer resolve. Run icacls or Get-Acl against a sample of folders, and confirm AD security group membership matches the production source-of-truth.
Power off and delete the sandbox VMs, release the isolated VLAN, and unmount any Live Mount / Instant Recovery sessions. Forgotten sandbox VMs accumulate license cost and clutter the inventory before the next drill.
Drill Reporting and Sign-Off
Edit the recovery runbook in IT Glue / Hudu / Confluence with anything that surprised the operator — undocumented dependency, missing credential, unexpected duration. The next drill (or the next real incident) is run from this document.
Capture the IT manager's or vCIO's sign-off. SOC 2, HIPAA, and PCI auditors will ask for evidence that restore drills happen on a defined cadence with named approvers — the signature plus the captured RTO/RPO data is that evidence.
