Business Continuity Plan Checklist
Annual BCP/DR review run by IT operations to validate that critical systems, recovery objectives, and incident playbooks still match the business. Covers BIA refresh, DR testing, incident response, alternate-site readiness, and post-test remediation.
Business Impact Analysis
-
Refresh the critical systems inventory
Pull the current asset list from the RMM (NinjaOne, Datto, ConnectWise Automate) and reconcile against IT Glue or Hudu. Tag each system Tier 0 (identity, DNS, DHCP, AD/Entra), Tier 1 (line-of-business apps), or Tier 2 (supporting). Stale inventories are the most common reason DR tests fail — a system gets restored that nobody uses anymore while a new SaaS dependency goes unrecovered.
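The reconciliation itself can be sketched in a few lines of Python. This is a minimal illustration, assuming both tools can export hostnames as plain lists; the function name `reconcile_inventory` and the sample hostnames are hypothetical and not part of any RMM or documentation-tool API.

```python
def reconcile_inventory(rmm_hosts, documented_hosts):
    """Compare hostnames exported from the RMM with those in the documentation tool.

    Hostnames are trimmed and lowercased so that casing differences between
    exports do not show up as false mismatches.
    """
    rmm = {h.strip().lower() for h in rmm_hosts}
    docs = {h.strip().lower() for h in documented_hosts}
    return {
        "undocumented": sorted(rmm - docs),  # live in the RMM, missing from the docs
        "stale": sorted(docs - rmm),         # documented, no longer in the RMM
    }


# Hypothetical sample exports
result = reconcile_inventory(
    ["DC01", "fs01", "sql01"],
    ["dc01", "fs01", "legacy-app01"],
)
print(result)
```

Anything in `undocumented` needs a tier assigned; anything in `stale` is a candidate for removal from the DR scope.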
-
Set RTO and RPO per Tier 0 and Tier 1 system
Work with system owners to confirm Recovery Time Objective and Recovery Point Objective for each Tier 0 and Tier 1 system. RTOs that the current backup posture cannot meet should be flagged for remediation rather than quietly accepted.
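One way to surface objectives the backup posture cannot meet is to compare each agreed objective with an estimate of what the current backups can deliver. The sketch below is illustrative only: the `RecoveryObjective` fields, the sample figures, and the rule that worst-case data loss equals one backup interval are all assumptions, not values from any backup product.

```python
from dataclasses import dataclass


@dataclass
class RecoveryObjective:
    system: str
    tier: int
    rto_minutes: int              # agreed Recovery Time Objective
    rpo_minutes: int              # agreed Recovery Point Objective
    est_restore_minutes: int      # estimated restore time with current backups
    backup_interval_minutes: int  # how often a restore point is created


def objectives_needing_remediation(objectives):
    """Return (system, reasons) pairs where the backup posture cannot meet the objective."""
    flagged = []
    for o in objectives:
        reasons = []
        if o.est_restore_minutes > o.rto_minutes:
            reasons.append("RTO")
        # Assumption: worst-case data loss is roughly one full backup interval.
        if o.backup_interval_minutes > o.rpo_minutes:
            reasons.append("RPO")
        if reasons:
            flagged.append((o.system, reasons))
    return flagged


# Hypothetical figures for illustration
objectives = [
    RecoveryObjective("AD/Entra", 0, 60, 15, 45, 15),
    RecoveryObjective("LOB database", 1, 240, 60, 360, 1440),
]
flagged = objectives_needing_remediation(objectives)
print(flagged)
```

Each flagged system should end up with a remediation ticket or a renegotiated objective.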
-
Map upstream and downstream dependencies
Document SaaS, API, and on-prem dependencies for each critical app — SSO (Entra ID, Okta), DNS, certificate authority, payment processor, MX provider. A DR plan that restores the app but not its identity provider doesn't restore service.
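Once the dependencies are written down, a restore order falls out of a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later); the service names and the dependency map are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the services in its set.
dependencies = {
    "lob-app": {"sso", "dns", "lob-db"},
    "lob-db": {"dns"},
    "sso": {"dns"},
    "dns": set(),
}

# static_order() yields each service only after everything it depends on.
restore_order = list(TopologicalSorter(dependencies).static_order())
print(restore_order)
```

If the map contains a circular dependency, `static_order()` raises `graphlib.CycleError`, which is itself a finding worth recording in the BIA.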
-
Identify single points of failure
Look for the classics: one DC running both DNS and DHCP, a single hypervisor host with no HA peer, an MFA service account with hardcoded credentials, a backup target writable from the production AD account.
Backup and Recovery Posture
-
Verify 3-2-1 with an immutable copy
Confirm three copies on two media with one offsite, and that at least one copy is immutable — Veeam hardened repo, S3 Object Lock, write-once tape, or a separate cloud account the production AD cannot reach. Ransomware-encrypted backups are the failure mode this exists to prevent.
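The 3-2-1 rule with an immutable copy reduces to four checks that can be run per protected system. This is a minimal sketch: the `BackupCopy` fields and sample locations are hypothetical, and it assumes the list of copies includes the production data itself, as the traditional 3-2-1 count does.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    location: str
    media: str       # e.g. "disk", "object-storage", "tape"
    offsite: bool
    immutable: bool


def check_3_2_1(copies):
    """Return the unmet 3-2-1 and immutability conditions (empty list means pass)."""
    gaps = []
    if len(copies) < 3:
        gaps.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        gaps.append("fewer than two media types")
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy")
    if not any(c.immutable for c in copies):
        gaps.append("no immutable copy")
    return gaps


# Hypothetical posture: 3-2-1 is met, but nothing is immutable.
copies = [
    BackupCopy("production SAN", "disk", offsite=False, immutable=False),
    BackupCopy("local repository", "disk", offsite=False, immutable=False),
    BackupCopy("cloud bucket", "object-storage", offsite=True, immutable=False),
]
gaps = check_3_2_1(copies)
print(gaps)
```

Any non-empty result is the input for the remediation ticket in the next step.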
-
Open a remediation ticket for immutability gaps
File a P2 in the PSA (ConnectWise, Autotask, Jira Service Management) naming the affected systems, the proposed control (object lock, hardened repo, separate cloud account), and the target close date. Do not sign off the BCP review while an immutability gap remains open without a documented remediation ticket.
-
Review backup job success rates over 90 days
Pull the Veeam / Datto / Cohesity / Rubrik report. A green dashboard with a job that's been silently skipping a VM for six weeks is the canonical failure pattern.
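Job-level success rates hide a skipped VM, so it helps to check the last successful restore point per VM against the inventory. The sketch below assumes the backup report can be reduced to a mapping of VM name to last-success date; the function name, the seven-day threshold, and the sample data are hypothetical.

```python
from datetime import date, timedelta


def stale_vms(expected_vms, last_success, today, max_age_days=7):
    """Return VMs with no successful backup within max_age_days.

    A VM missing from last_success entirely (never backed up, or dropped
    from the job) is reported too, which a per-job success rate would hide.
    """
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        vm for vm in expected_vms
        if last_success.get(vm) is None or last_success[vm] < cutoff
    )


# Hypothetical report data
today = date(2024, 6, 30)
last_success = {"dc01": date(2024, 6, 29), "sql01": date(2024, 5, 15)}
stale = stale_vms(["dc01", "sql01", "fs02"], last_success, today)
print(stale)
```

Driving the check from the inventory rather than from the job list is the point: a VM that was never added to a job cannot fail it.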
-
Validate offsite replication lag
Confirm that replication to the offsite or cloud target is keeping up with the RPO. Lag exceeding the stated RPO means the offsite copy is not what the BCP claims it is.
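The lag check is a single subtraction: the age of the newest restore point at the offsite target compared with the RPO. A minimal sketch, assuming the replication tool reports the timestamp of the last replicated restore point; the function name and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def replication_within_rpo(last_replicated_point, now, rpo):
    """Return (ok, lag), where lag is the age of the newest offsite restore point."""
    lag = now - last_replicated_point
    return lag <= rpo, lag


# Hypothetical values: the offsite copy is six hours behind a four-hour RPO.
ok, lag = replication_within_rpo(
    last_replicated_point=datetime(2024, 6, 30, 6, 0),
    now=datetime(2024, 6, 30, 12, 0),
    rpo=timedelta(hours=4),
)
print(ok, lag)
```

Sampling this at several points in the day, including during the backup window, gives a fairer picture than a single reading.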
DR Test Execution
-
Schedule the isolated restore drill
File the change request through CAB. Build the restore target in an isolated VLAN or sandbox tenant — never restore Tier 0 systems into production for a drill.
-
Restore a Tier 0 system end-to-end
Pick AD/Entra, the file server, or the LOB database. Walk the full path: locate backup, decrypt, restore to isolated environment, validate application start, validate user authentication, validate data integrity against a known checkpoint.
-
Record the actual recovery time and data loss
-
File a P1 remediation plan for missed RTO or RPO
If the drill missed RTO or RPO, the BCP is wrong — either the objective or the architecture has to change. Open the remediation ticket with named owner, target date, and the architecture change required (warm standby, additional replica, faster restore tier).
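Whether the drill met its objectives can be computed directly from the timestamps recorded during the restore. The sketch below is one possible convention, not a standard: it measures recovery time from the start of the restore to the moment the service was validated, and data loss from the restore point to the simulated failure time. All names and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def evaluate_drill(restore_started, service_validated, restore_point, failure_time, rto, rpo):
    """Compute achieved recovery time and data loss for a drill and compare to objectives.

    Data written between restore_point and failure_time counts as lost.
    """
    actual_recovery = service_validated - restore_started
    actual_data_loss = failure_time - restore_point
    return {
        "actual_recovery": actual_recovery,
        "actual_data_loss": actual_data_loss,
        "rto_met": actual_recovery <= rto,
        "rpo_met": actual_data_loss <= rpo,
    }


# Hypothetical drill: 2h30m to recover against a 2h RTO, 15m lost against a 30m RPO.
outcome = evaluate_drill(
    restore_started=datetime(2024, 6, 30, 9, 0),
    service_validated=datetime(2024, 6, 30, 11, 30),
    restore_point=datetime(2024, 6, 30, 8, 45),
    failure_time=datetime(2024, 6, 30, 9, 0),
    rto=timedelta(hours=2),
    rpo=timedelta(minutes=30),
)
print(outcome["rto_met"], outcome["rpo_met"])
```

A `False` in either field is the trigger for the P1 remediation ticket.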
Incident Response Readiness
-
Update the on-call rotation in PagerDuty
Confirm primary, secondary, and escalation tiers in PagerDuty / Opsgenie / xMatters match current staffing. Test that a synthetic alert pages the right person.
-
Refresh the incident commander runbook
Update the SEV1 runbook in IT Glue / Hudu / Confluence: who declares, bridge call number, status page owner, executive notification, legal/PR triggers, customer communication template.
-
Verify the break-glass account works
Test the emergency-access account in Entra ID / Okta. Confirm the credentials are sealed in two physical locations, MFA is excluded per policy, and sign-in is monitored. A real outage is the worst time to discover that the break-glass account is locked out.
-
Run a tabletop exercise with the IR team
Walk through a ransomware scenario or a Tier 0 outage with the named IR team. Capture decision points where the runbook was unclear; those become the post-tabletop edits.
Alternate Site and Workforce Continuity
-
Confirm VPN and ZTNA capacity for full remote
Verify the FortiGate / Palo Alto / Meraki concentrator (or Cloudflare / Zscaler ZTNA) can sustain 100% of staff remote. License headroom and tunnel limits are common surprise constraints.
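The capacity check is simple arithmetic: headcount plus a safety margin against each hard limit. A minimal sketch, assuming the limits are read off the license page and the appliance datasheet; the limit names, the 10% headroom, and the sample numbers are hypothetical.

```python
def remote_capacity_gaps(staff_count, limits, headroom_percent=10):
    """Return {limit_name: shortfall} for every limit below staff count plus headroom."""
    # Integer ceiling of staff_count * (1 + headroom) avoids float rounding surprises.
    required = (staff_count * (100 + headroom_percent) + 99) // 100
    return {
        name: required - value
        for name, value in limits.items()
        if value < required
    }


# Hypothetical environment: 250 staff, 200 VPN licenses, 500 concurrent tunnels.
capacity_gaps = remote_capacity_gaps(250, {"vpn_licenses": 200, "max_tunnels": 500})
print(capacity_gaps)
```

Concurrent-session limits and throughput under load still need a live test; this only catches the gaps visible on paper.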
-
Validate the alternate communication channel
If M365 / Teams is the primary channel, the BCP needs an out-of-band fallback — Signal group, personal-email tree, mass-notification service (Everbridge, AlertMedia). Test it; don't just document it.
-
Review vendor and SaaS escalation contacts
Update the after-hours phone numbers and support tier for the ISP, M365, the EDR vendor (CrowdStrike, SentinelOne), and the backup vendor. A general support queue at 2am is not an escalation path.
Plan Sign-Off and Maintenance
-
Capture lessons learned from the drill
Run a blameless retro covering what worked, what didn't, and which runbook steps were ambiguous. Feed each item into the IT Glue / Hudu BCP doc as a tracked edit.
-
Schedule the next quarterly restore drill
-
Director of IT signs off on the BCP