Disaster Recovery Plan Checklist
Preparation and Planning
Pull the application inventory from the CMDB or RMM and tag each system with its RTO, RPO, and dependency tree. Common gotcha: shadow SaaS tools the business owns directly never make it onto the DR list, then become the loudest complaint during an outage. Attach the spreadsheet or export.
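As a minimal sketch of the tagging step, the exported inventory can be checked for systems that are missing an RTO or RPO or that point at a dependency not on the list. The `System` record and the sample entries are hypothetical; a real CMDB or RMM export will have its own field names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class System:
    # Hypothetical record shape; adapt to the columns in the real export.
    name: str
    rto_hours: Optional[float] = None  # recovery time objective
    rpo_hours: Optional[float] = None  # recovery point objective
    depends_on: list[str] = field(default_factory=list)


def inventory_gaps(systems):
    """Map each incomplete system to the reasons it is not DR-ready."""
    known = {s.name for s in systems}
    gaps = {}
    for s in systems:
        problems = []
        if s.rto_hours is None:
            problems.append("missing RTO")
        if s.rpo_hours is None:
            problems.append("missing RPO")
        for dep in s.depends_on:
            if dep not in known:
                problems.append(f"unknown dependency: {dep}")
        if problems:
            gaps[s.name] = problems
    return gaps


inventory = [
    System("active-directory", rto_hours=1, rpo_hours=0.25),
    System("erp", rto_hours=4, rpo_hours=1,
           depends_on=["active-directory", "sql-cluster"]),
    System("marketing-saas"),  # shadow SaaS tool nobody tagged
]
gaps = inventory_gaps(inventory)
for name, problems in gaps.items():
    print(name, "->", ", ".join(problems))
```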
Verify each DR role has a primary and backup named — incident commander, comms lead, infrastructure lead, identity lead, vendor liaison. Cross-check PagerDuty or Opsgenie schedules for gaps. People leave; rosters drift.
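The roster check can be sketched as follows, assuming the roster has been exported into a plain dictionary and the list of active staff comes from HR or the identity provider. The names and data shapes are illustrative only; this does not call the PagerDuty or Opsgenie APIs.

```python
REQUIRED_ROLES = [
    "incident commander",
    "comms lead",
    "infrastructure lead",
    "identity lead",
    "vendor liaison",
]


def roster_gaps(roster, active_staff):
    """Return human-readable problems found in the DR roster."""
    problems = []
    for role in REQUIRED_ROLES:
        entry = roster.get(role, {})
        primary, backup = entry.get("primary"), entry.get("backup")
        for slot, person in (("primary", primary), ("backup", backup)):
            if not person:
                problems.append(f"{role}: no {slot} named")
            elif person not in active_staff:
                problems.append(f"{role}: {slot} {person} is no longer active")
        if primary and primary == backup:
            problems.append(f"{role}: primary and backup are the same person")
    return problems


# Hypothetical roster: dave has left, and vendor liaison was never filled.
roster = {
    "incident commander": {"primary": "alice", "backup": "bob"},
    "comms lead": {"primary": "carol", "backup": "carol"},
    "infrastructure lead": {"primary": "dave", "backup": "erin"},
    "identity lead": {"primary": "frank"},
}
active_staff = {"alice", "bob", "carol", "erin", "frank"}
problems = roster_gaps(roster, active_staff)
print("\n".join(problems))
```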
Record an out-of-band contact path for every DR team member: personal phone numbers, an SMS group, a Signal channel, and a non-corporate email address. Use anything that does not depend on M365 or the corporate network being up. The first failure mode of in-band comms is that the platform itself is the incident.
Walk the BIA with department leads to recalibrate revenue impact per hour, regulatory exposure (HIPAA, PCI, SOX), and customer-facing SLAs. Numbers older than 12 months are stale.
Confirm active contracts with the backup vendor (Veeam, Datto, Rubrik), the firewall vendor, the ISP, and any colo or DR-as-a-Service provider. Capture support phone numbers, account IDs, and named escalation contacts in the runbook — not in someone's inbox.
Backup and Recovery Verification
Open Veeam, Datto, or whichever backup platform is in use and confirm last-30-day success rate per job. Investigate every yellow and red. A green dashboard with quietly failing jobs is the most common DR failure mode.
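One way to make the 30-day review repeatable is to compute the success rate per job from an exported run history. The sketch below assumes the history has been exported as `(job, date, status)` tuples; that shape is hypothetical and is not the output format of Veeam, Datto, or any other product. Jobs that were expected to run but have no runs in the window are flagged too, since a job that silently stopped never turns red.

```python
from collections import defaultdict
from datetime import date, timedelta


def success_rates(runs, today, window_days=30):
    """Return {job: fraction of successful runs} within the window."""
    cutoff = today - timedelta(days=window_days)
    counts = defaultdict(lambda: [0, 0])  # job -> [successes, total runs]
    for job, run_date, status in runs:
        if run_date < cutoff:
            continue  # older than the review window
        counts[job][1] += 1
        if status == "success":
            counts[job][0] += 1
    return {job: ok / total for job, (ok, total) in counts.items()}


def needs_review(rates, expected_jobs, threshold=1.0):
    """Jobs below the threshold, including jobs with no runs at all."""
    return sorted(j for j in expected_jobs if rates.get(j, 0.0) < threshold)


runs = [
    ("sql-nightly", date(2024, 6, 1), "success"),
    ("sql-nightly", date(2024, 6, 2), "failed"),
    ("fileserver", date(2024, 6, 2), "success"),
    ("fileserver", date(2024, 4, 1), "failed"),  # outside the window
]
rates = success_rates(runs, today=date(2024, 6, 10))
flagged = needs_review(rates, ["sql-nightly", "fileserver", "exchange"])
print(rates, flagged)
```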
Confirm the 3-2-1 rule: three copies, two media types, one offsite, and at least one immutable copy (object lock, write-once tape, or a separate cloud account that production credentials cannot reach). Ransomware that finds the backup share encrypts the backups too; immutability is the most durable defense against that.
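The rule can be expressed as a small check over a list of copies. This is a sketch under the assumption that whoever fills in the list decides what counts as a copy (some teams count the production data itself, some do not); the `BackupCopy` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str
    media: str       # e.g. "disk", "tape", "object storage"
    offsite: bool
    immutable: bool


def check_3_2_1(copies):
    """Evaluate each part of the rule separately so failures are visible."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_immutable": any(c.immutable for c in copies),
    }


copies = [
    BackupCopy("primary NAS", "disk", offsite=False, immutable=False),
    BackupCopy("local backup repo", "disk", offsite=False, immutable=False),
    BackupCopy("cloud bucket", "object storage", offsite=True, immutable=False),
]
result = check_3_2_1(copies)
failed = [rule for rule, ok in result.items() if not ok]
print(failed)
```

In this sample the layout satisfies the classic 3-2-1 counts but still fails the immutability requirement, which is the case the step is meant to catch.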
Pick one tier-1 system and restore it end-to-end into a network-isolated recovery VLAN — VM, application, database, dependencies. Time the restore against documented RTO. The point is not that the backup file exists; the point is that the system boots and the data is consistent.
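Timing the drill against the RTO is simple arithmetic, sketched here with hypothetical timestamps. The assumption is that the clock stops when the restored system has been validated (it boots and the data is consistent), not when the restore job reports completion.

```python
from datetime import datetime, timedelta


def evaluate_drill(started, validated, rto):
    """Compare elapsed restore time with the documented RTO."""
    elapsed = validated - started
    return {
        "elapsed": elapsed,
        "met_rto": elapsed <= rto,
        "overrun": max(elapsed - rto, timedelta(0)),
    }


drill = evaluate_drill(
    started=datetime(2024, 6, 15, 9, 0),
    validated=datetime(2024, 6, 15, 14, 30),
    rto=timedelta(hours=4),
)
print(drill)
```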
Run this step only when the restore drill failed or missed its RTO. File a P1 with the vendor referencing the job ID and restore log. Do not close this loop until the drill passes. A quarterly check-the-box drill that fails and is shrugged off is worse than no drill, because it manufactures false confidence.
Walk the asset inventory against actual deployed counts: VMs in vCenter, endpoints in Intune or Jamf, M365 license seats, EDR agents. A vendor audit (Microsoft, Oracle, VMware) that finds 80 unlicensed VMs during a recovery is a six-figure surprise on top of the disaster.
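The reconciliation itself is a per-product subtraction. The sketch below assumes the deployed and licensed counts have already been collected by hand or from exports; the product keys and numbers are made up for illustration.

```python
def license_shortfalls(deployed, licensed):
    """Return {product: units deployed beyond what is licensed}."""
    return {
        product: count - licensed.get(product, 0)
        for product, count in deployed.items()
        if count > licensed.get(product, 0)
    }


deployed = {"vm": 480, "m365_seat": 300, "edr_agent": 290}
licensed = {"vm": 400, "m365_seat": 320, "edr_agent": 290}
shortfalls = license_shortfalls(deployed, licensed)
print(shortfalls)
```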
Emergency Response Readiness
Issue a wallet-sized card with the IT incident hotline, the out-of-band channel address, and the first three things to do (do not power down, do not reconnect, call the hotline). Email-only distribution fails when email is the incident.
Send a test message to the Signal or SMS group and confirm receipt from each DR team member. Channels that nobody has opened in six months are channels nobody will see during an incident.
Walk a realistic scenario: domain admin credential compromised, backup share encrypted, EDR alerts arriving at 2am Saturday. Force decisions on isolation, comms, and ransom posture. Capture every place the playbook hits an unanswered question.
Run this step only when the tabletop surfaces gaps. File each gap as a tracked action item with a named owner and due date. Schedule role-specific training: KnowBe4 for end users, vendor sessions for tier-2 engineers, and an updated runbook walkthrough for the IC pool.
For physical-site disasters (fire, flood, extended power loss), confirm that building security contacts, the fire department non-emergency line, and utility-provider account numbers are in the runbook. For cyber incidents, confirm the FBI field office and CISA reporting paths.
Business Continuity
Write a runbook entry per application covering the failover trigger, DNS or load-balancer change, dependency order, validation tests, and rollback plan. Procedures that live in one engineer's head are single points of failure.
Validate that the FortiGate or Palo Alto concentrator and the ZTNA broker can handle the full workforce concurrently. March 2020 caught many organizations with VPN sized for 20% of headcount. Confirm conditional access policies still block legacy auth.
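The sizing question reduces to comparing the concurrent-session limit with headcount. A rough sketch, where `max_concurrent_sessions` is whatever limit the appliance datasheet or license states (an input you supply, not something read from the device) and `headcount` is assumed to be greater than zero:

```python
import math


def vpn_headroom(max_concurrent_sessions, headcount, remote_fraction=1.0):
    """Compare session capacity with the number of users expected remote."""
    needed = math.ceil(headcount * remote_fraction)
    return {
        "needed": needed,
        "coverage": max_concurrent_sessions / needed,
        "shortfall": max(0, needed - max_concurrent_sessions),
    }


# Hypothetical shop: 500 staff, concentrator licensed for 100 sessions.
headroom = vpn_headroom(max_concurrent_sessions=100, headcount=500)
print(headroom)
```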
Whether the alternate site is a hot site, warm site, or DRaaS, confirm that circuits are up, replication is current, AD and DNS are reachable, and at least one technician knows how to badge in. An alternate site nobody has visited in a year is theoretical.
Prepare a pre-approved template for executive, customer, and regulator updates so comms during an incident are not drafted from scratch under pressure. Include placeholders for impact, ETA, workaround, and next-update time.
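A template with the four placeholders can be sketched with Python's standard `string.Template`. The wording and the extra `audience` field are illustrative, not an approved text. `substitute()` raises `KeyError` when a placeholder is left unfilled, so an update cannot go out with a missing next-update time.

```python
from string import Template

# Illustrative wording only; the real text should be pre-approved.
STATUS_TEMPLATE = Template(
    "Audience: $audience\n"
    "Impact: $impact\n"
    "ETA: $eta\n"
    "Workaround: $workaround\n"
    "Next update: $next_update\n"
)


def render_status(**fields):
    """Fill the template; raises KeyError if any placeholder is missing."""
    return STATUS_TEMPLATE.substitute(**fields)


message = render_status(
    audience="customers",
    impact="Order portal unavailable",
    eta="under investigation",
    workaround="Phone orders via account manager",
    next_update="15:00 UTC",
)
print(message)
```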
Walk the test results, gaps, and remediation plan with the vCIO or CIO. Tie outstanding items to budget — DR investments routinely lose to feature work unless leadership is forced to choose explicitly.
Post-Incident Recovery
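Run the severity rubric: systems affected, users affected, data exposure, regulatory triggers (HIPAA breach, PCI cardholder data, GDPR personal data). Severity drives notification clocks: HIPAA breach notification is due without unreasonable delay and no later than 60 days after discovery, GDPR requires notifying the supervisory authority within 72 hours of becoming aware of the breach, and state laws vary.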
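The notification clocks can be turned into concrete deadlines as soon as the discovery time is known. This sketch encodes only the two outer limits named above and treats them as measured from the moment of discovery; state laws, contractual terms, and other regimes are deliberately left out and should come from counsel.

```python
from datetime import datetime, timedelta

# Outer limits only; both regimes also expect notice without undue delay.
NOTIFICATION_WINDOWS = {
    "HIPAA": timedelta(days=60),
    "GDPR": timedelta(hours=72),
}


def notification_deadlines(discovered_at, regimes):
    """Return {regime: latest notification time} for the triggered regimes."""
    return {r: discovered_at + NOTIFICATION_WINDOWS[r] for r in regimes}


deadlines = notification_deadlines(
    discovered_at=datetime(2024, 3, 1, 2, 0),
    regimes=["HIPAA", "GDPR"],
)
for regime, due in sorted(deadlines.items(), key=lambda item: item[1]):
    print(regime, due.isoformat())
```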
Run this step for catastrophic incidents only. Cut DNS and traffic to the DR site per the documented failover runbook, in dependency order. Confirm AD, DNS, and identity providers come up first; the application tier follows. Notify the vendor and DRaaS provider in parallel.
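Dependency order is a topological sort of the dependency tree. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the service names and edges are hypothetical and should come from the documented runbook.

```python
from graphlib import CycleError, TopologicalSorter

# service -> set of services that must be up first (hypothetical graph)
DEPENDS_ON = {
    "dns": set(),
    "active-directory": {"dns"},
    "identity-provider": {"active-directory"},
    "database": {"active-directory"},
    "app-tier": {"database", "identity-provider"},
}


def failover_order(depends_on):
    """Return a start-up order in which each service follows its dependencies."""
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # args[1] holds the nodes that form the detected cycle
        raise ValueError(f"circular dependency in runbook: {exc.args[1]}") from exc


order = failover_order(DEPENDS_ON)
print(" -> ".join(order))
```

A circular dependency raises an error instead of producing an order, which is worth discovering in a drill and not during a failover.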
Work the inventory in RTO order: tier-1 first (revenue, safety, regulated), then tier-2, then tier-3. Resist scope creep from loud-but-low-priority requesters; the BIA is the tiebreaker.
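The work queue can be derived mechanically so that nobody reorders it under pressure. A minimal sketch, assuming each inventory row carries a numeric tier and an RTO in hours (hypothetical field names), sorted by tier first and by shortest RTO within a tier:

```python
def recovery_queue(systems):
    """Order systems by tier, then by shortest RTO within each tier."""
    return sorted(systems, key=lambda s: (s["tier"], s["rto_hours"]))


systems = [
    {"name": "wiki", "tier": 3, "rto_hours": 72},
    {"name": "payments", "tier": 1, "rto_hours": 2},
    {"name": "email", "tier": 2, "rto_hours": 8},
    {"name": "ehr", "tier": 1, "rto_hours": 1},
]
queue = [s["name"] for s in recovery_queue(systems)]
print(queue)
```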
Use the pre-approved status template. Send updates every 60 minutes for major or catastrophic events, every 4 hours for moderate. Always include next-update time, even if the substantive update is 'no change.' Silence is what generates the executive escalation.
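The cadence can be encoded so the next-update time is computed and not guessed. A small sketch using the intervals stated above; the severity labels are assumed to match whatever the severity rubric produces.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = {
    "catastrophic": timedelta(minutes=60),
    "major": timedelta(minutes=60),
    "moderate": timedelta(hours=4),
}


def next_update_time(sent_at, severity):
    """Return when the following status update is due."""
    return sent_at + UPDATE_INTERVAL[severity]


sent = datetime(2024, 6, 15, 2, 15)
print(next_update_time(sent, "major"))
print(next_update_time(sent, "moderate"))
```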
Hold a blameless postmortem within five business days of recovery. Capture what actually happened versus what the playbook said would happen. Every gap becomes a tracked runbook edit with a named owner — otherwise the next incident reproduces this one.
