Backup and Restore Checklist
Pre-Backup Preparation
Pull the current RPO/RTO targets from the BCP document or MSA. Note any client tier or regulatory drivers — HIPAA, SOC 2, PCI DSS — that dictate retention or immutability requirements. Mismatched targets between the BCP and the actual backup job schedule are the most common audit finding.
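The comparison between documented targets and actual job schedules can be sketched in a few lines. This is a minimal illustration, not a vendor integration: the `BackupJob` record, its field names, and the sample jobs are hypothetical, and it assumes you have already exported each job's schedule interval and looked up its RPO target by hand.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupJob:
    # Hypothetical record; populate it from your own job export.
    name: str
    interval: timedelta      # time between scheduled runs
    rpo_target: timedelta    # target taken from the BCP document or MSA


def rpo_violations(jobs):
    """Return names of jobs whose schedule interval exceeds the RPO target."""
    return [job.name for job in jobs if job.interval > job.rpo_target]


# Illustrative data only.
jobs = [
    BackupJob("sql-prod", timedelta(hours=24), timedelta(hours=4)),
    BackupJob("file-share", timedelta(hours=12), timedelta(hours=24)),
]
print(rpo_violations(jobs))
```

A job that runs every 24 hours cannot meet a 4-hour RPO regardless of how reliably it completes, which is why the check compares the interval rather than job success.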
Cross-check the Veeam / Datto / Rubrik job list against the CMDB or RMM asset list. Flag any production VM, file share, SQL instance, or M365 tenant not covered by a job. New workloads added since the last cycle are the typical source of unprotected data.
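The cross-check is essentially a set difference between two name lists. A rough sketch, assuming both lists have been exported as plain names (the function name and sample hostnames are hypothetical, and real exports may need more normalization than lowercasing):

```python
def unprotected_assets(asset_names, protected_names):
    """Return assets that have no backup job, compared case-insensitively."""
    protected = {name.strip().lower() for name in protected_names}
    return sorted(
        asset for asset in asset_names
        if asset.strip().lower() not in protected
    )


# Illustrative data only: CMDB/RMM export vs. backup job export.
assets = ["SQL-PROD-01", "FS-01", "APP-02", "m365-tenant"]
protected = ["sql-prod-01", "fs-01"]
print(unprotected_assets(assets, protected))
```

Anything the function returns is a candidate for a new job or a documented exclusion.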
Confirm at least 20% headroom on the primary repository and the offsite / immutable copy. Review dedupe and compression ratios for drift; sudden ratio drops usually mean a new workload is writing incompressible data (encrypted volumes, media files) and will blow the capacity plan.
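Both checks reduce to simple arithmetic. The sketch below assumes capacity and usage figures read off the repository dashboard; the 15% drift tolerance is an illustrative assumption, not a vendor-recommended threshold.

```python
def headroom_pct(capacity_tb, used_tb):
    """Free space as a percentage of total repository capacity."""
    return 100.0 * (capacity_tb - used_tb) / capacity_tb


def ratio_dropped(previous_ratio, current_ratio, tolerance=0.15):
    """True when the data-reduction ratio fell by more than `tolerance`.

    The 0.15 default is an assumption chosen for illustration.
    """
    return current_ratio < previous_ratio * (1 - tolerance)


# Hypothetical figures for one repository.
print(headroom_pct(100, 85) < 20)    # below the 20% headroom line?
print(ratio_dropped(2.0, 1.5))       # dedupe/compression ratio drift?
```

Run the same calculation for the offsite or immutable copy, since it often fills on a different curve than the primary repository.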
Confirm 3-2-1 posture: 3 copies, 2 media types, 1 offsite, with at least one immutable or air-gapped copy (S3 Object Lock, Veeam hardened repo, LTO tape). A backup repository that is writable from production is the single most common reason ransomware encrypts the backups along with everything else.
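The posture check can be expressed as four rules over an inventory of copies. This is a sketch under stated assumptions: the `BackupCopy` record and its fields are hypothetical, the inventory is filled in by hand, and the production data itself counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    # Hypothetical record describing one copy of the data.
    location: str
    media: str          # e.g. "disk", "object", "tape"
    offsite: bool
    immutable: bool     # immutable or air-gapped


def check_321(copies):
    """Return a list of 3-2-1 gaps; an empty list means the posture holds."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({copy.media for copy in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(copy.offsite for copy in copies):
        problems.append("no offsite copy")
    if not any(copy.immutable for copy in copies):
        problems.append("no immutable or air-gapped copy")
    return problems


# Illustrative inventory only.
copies = [
    BackupCopy("production", "disk", offsite=False, immutable=False),
    BackupCopy("local repo", "disk", offsite=False, immutable=False),
    BackupCopy("cloud bucket", "object", offsite=True, immutable=True),
]
print(check_321(copies))
```

Dropping the cloud bucket from the sample inventory trips all four rules at once, which is the situation the ransomware warning above describes.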
For application-consistent backups requiring brief service quiesce (SQL, Exchange, file servers with VSS), send the change notice through PSA / ITSM 48 hours ahead. Include start time, expected duration, and rollback contact.
Backup Execution
For scheduled jobs, confirm the run kicked off at the configured time in Veeam B&R / Datto / Commvault. For ad-hoc runs, document the trigger reason in the PSA ticket. Verify VSS writers are healthy on Windows targets before the snapshot phase.
Watch the job dashboard for warnings: VSS quiesce failures, network throughput drops, target unreachable, credential errors. Most overnight job failures trace back to a service account whose password rotated without the stored credential in the backup software being updated.
Native Microsoft and Google retention is not a backup. Confirm the third-party SaaS backup (Datto SaaS Protection, Veeam for M365, AvePoint, Spanning) ran for Exchange Online mailboxes, OneDrive, SharePoint, and Teams chat. New users added since the last run are typically not auto-licensed for protection.
Verify the secondary copy job to the cloud / offsite repo finished within the WAN window. For Datto SIRIS / Veeam Cloud Connect / AWS S3 with Object Lock, confirm the immutable retention flag is set on the new restore points.
Failure Triage
Create the incident in ConnectWise / Autotask / ServiceNow with the failed VM list and error codes. Tag the affected client and assign per the on-call schedule. SLA clock starts at job-failure detection, not at ticket creation.
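The difference between the two clock start points is easy to show with timestamps. A minimal sketch, assuming a hypothetical 4-hour response window and made-up times; the real window comes from the client's SLA.

```python
from datetime import datetime, timedelta


def sla_deadline(detected_at, response_window):
    """Deadline measured from job-failure detection, not ticket creation."""
    return detected_at + response_window


# Hypothetical incident: failure detected overnight, ticket opened at start of shift.
detected_at = datetime(2024, 3, 5, 2, 10)
ticket_created_at = datetime(2024, 3, 5, 7, 45)
window = timedelta(hours=4)   # assumed SLA window for illustration

deadline = sla_deadline(detected_at, window)
print(deadline)
print(ticket_created_at > deadline)   # was the SLA already missed when the ticket was opened?
```

In this example the deadline passes before the ticket exists, which is why detection time should be recorded in the ticket rather than inferred from its creation timestamp.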
Common fixes: update the cached service account password on the backup proxy, clear stale VSS shadow copies, free or add space on a near-full repository, restart a hung backup agent. Rerun and confirm the restore point lands before the next scheduled cycle.
Restore Verification Drill
Rotate test scope each cycle: a file-level restore one month, a full VM Instant Recovery the next, a SQL point-in-time restore the next. A backup that reports green for 18 months and then fails on the first restore is the canonical disaster scenario; rotation is the discipline that prevents it.
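The rotation can be made deterministic so that nobody has to remember which drill ran last. A small sketch, assuming drills are numbered by a running cycle counter (the counter and function name are hypothetical):

```python
# Drill types taken from the rotation described above.
DRILL_TYPES = [
    "file-level restore",
    "full VM Instant Recovery",
    "SQL point-in-time restore",
]


def drill_for_cycle(cycle_index):
    """Pick the drill type for a given cycle number (0, 1, 2, ...)."""
    return DRILL_TYPES[cycle_index % len(DRILL_TYPES)]


for cycle in range(4):
    print(cycle, drill_for_cycle(cycle))
```

Using months elapsed since a fixed start date as the cycle index keeps the rotation stable across staff changes.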
Mount the restore in a fenced VLAN or Veeam SureBackup virtual lab — never into production. Restoring a domain controller into the live domain has caused multiple all-hands outages from USN rollback.
Boot the restored VM, log in, run application smoke tests (SQL DBCC CHECKDB, Exchange mailbox open, file checksum spot check). For databases, confirm the recovery model and last LSN match expectations.
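The file checksum spot check can be sketched with the standard library. This assumes the sampled files have not changed since the backup ran; for active shares, comparing against checksums captured at backup time would be more reliable. The function names and parameters are illustrative.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source_dir, restored_dir, sample_size=5, seed=None):
    """Compare a random sample of files; return paths that are missing or differ."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    files = sorted(
        p.relative_to(source_dir) for p in source_dir.rglob("*") if p.is_file()
    )
    sample = random.Random(seed).sample(files, min(sample_size, len(files)))
    mismatches = []
    for rel in sample:
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(source_dir / rel) != sha256_of(restored):
            mismatches.append(rel.as_posix())
    return sorted(mismatches)
```

Calling `spot_check("/mnt/source", "/mnt/restored", sample_size=20)` with your own mount points (the paths here are placeholders) returns an empty list when every sampled file matches.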
Power off and delete the test VMs from the isolated lab. Leaving restored production data sitting on the recovery network is a quiet data-residency and access-control violation that auditors find on the next walkthrough.
Documentation and Reporting
Record the restore drill date, scope, RTO measured, and any remediation in the client's documentation platform. The vCIO will pull from this for the QBR; auditors will pull from this for SOC 2 evidence.
Export the job log and restore drill record into the GRC tool (Vanta, Drata, Tugboat) for the backup and BCP control families (CC9.1, CC7.5). Missing evidence at audit time, not failed backups, is the typical SOC 2 finding.
