Business Continuity Plan Checklist

Business Impact Analysis

    Pull the current asset list from the RMM (NinjaOne, Datto, ConnectWise Automate) and reconcile against IT Glue or Hudu. Tag each system Tier 0 (identity, DNS, DHCP, AD/Entra), Tier 1 (line-of-business apps), or Tier 2 (supporting). Stale inventories are the most common reason DR tests fail — a system gets restored that nobody uses anymore while a new SaaS dependency goes unrecovered.
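
    The reconciliation step can be sketched as a set comparison. This is a minimal illustration, assuming both tools can export a flat list of hostnames; the `reconcile` function and the sample hostnames are hypothetical and do not correspond to any RMM or IT Glue / Hudu API.

```python
def reconcile(rmm_assets: set[str], documented_assets: set[str]) -> dict[str, set[str]]:
    """Compare hostnames seen by the RMM with hostnames in the documentation tool."""
    return {
        # Live in the RMM but missing from documentation: recovery would miss these.
        "undocumented": rmm_assets - documented_assets,
        # Documented but no longer reporting to the RMM: likely stale entries.
        "stale_docs": documented_assets - rmm_assets,
    }


# Hypothetical exports, hostnames lowercased before comparison.
rmm = {"dc01", "fs01", "erp-app01"}
docs = {"dc01", "fs01", "legacy-fax01"}

drift = reconcile(rmm, docs)
print(drift)
```

    Anything in either bucket is worth resolving before the tiering pass, since a system missing from one source cannot be tiered reliably.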

    Work with system owners to confirm Recovery Time Objective and Recovery Point Objective for each Tier 0 and Tier 1 system. RTOs that the current backup posture cannot meet should be flagged for remediation rather than quietly accepted.
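
    One way to make the flagging step mechanical is to compare each stated objective against what the backup posture actually delivers. The sketch below assumes the backup interval bounds achievable data loss and that a measured restore time is available from a previous drill; the `SystemObjective` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemObjective:
    name: str
    rto: timedelta               # agreed Recovery Time Objective
    rpo: timedelta               # agreed Recovery Point Objective
    backup_interval: timedelta   # time between successful backups
    measured_restore: timedelta  # restore duration from the last drill


def unmet_objectives(systems: list[SystemObjective]) -> list[tuple[str, str]]:
    """Return (system, objective) pairs the current posture cannot meet."""
    findings = []
    for s in systems:
        # Worst-case data loss is roughly one full backup interval.
        if s.backup_interval > s.rpo:
            findings.append((s.name, "RPO"))
        if s.measured_restore > s.rto:
            findings.append((s.name, "RTO"))
    return findings


# Hypothetical systems and values, for illustration only.
systems = [
    SystemObjective("identity-sync", timedelta(hours=4), timedelta(hours=1),
                    timedelta(hours=24), timedelta(hours=2)),
    SystemObjective("erp-db", timedelta(hours=8), timedelta(hours=24),
                    timedelta(hours=12), timedelta(hours=6)),
]

findings = unmet_objectives(systems)
print(findings)
```

    Each finding maps to a remediation item: either the objective is renegotiated with the system owner or the backup design changes.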

    Document SaaS, API, and on-prem dependencies for each critical app — SSO (Entra ID, Okta), DNS, certificate authority, payment processor, MX provider. A DR plan that restores the app but not its identity provider doesn't restore service.
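
    Once dependencies are written down, a restore order follows from a topological sort. This sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9+); the dependency map itself is a hypothetical example, not a recommended architecture.

```python
from graphlib import TopologicalSorter

# Each key depends on the services in its set (hypothetical example).
dependencies = {
    "erp": {"entra-id", "dns"},
    "entra-id": {"dns"},
    "dns": set(),
}

# static_order() yields every service after all of its dependencies,
# and raises graphlib.CycleError if the map contains a circular dependency.
restore_order = list(TopologicalSorter(dependencies).static_order())
print(restore_order)
```

    A `CycleError` here is a useful finding in its own right: two services that each need the other to start cannot be recovered from a cold state without a documented workaround.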

    Look for the classic single points of failure: one DC running both DNS and DHCP, a single hypervisor host with no HA peer, an MFA service account with hardcoded credentials, a backup target writable from the production AD account.

Backup and Recovery Posture

    Confirm three copies on two media with one offsite, and that at least one copy is immutable — Veeam hardened repo, S3 Object Lock, write-once tape, or a separate cloud account the production AD cannot reach. Ransomware-encrypted backups are the failure mode this exists to prevent.
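
    The 3-2-1 rule with an immutable copy can be expressed as four checks over a list of copies. This is a sketch under stated assumptions: the `BackupCopy` record is hypothetical, the production data counts as one of the three copies, and the `offsite` and `immutable` flags are filled in by hand after verifying each target.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    location: str
    media: str       # e.g. "disk", "object storage", "tape"
    offsite: bool
    immutable: bool


def check_3_2_1(copies: list[BackupCopy]) -> list[str]:
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than two media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    if not any(c.immutable for c in copies):
        problems.append("no immutable copy")
    return problems


# Hypothetical posture: three copies, two media, one offsite, nothing immutable.
copies = [
    BackupCopy("production storage", "disk", offsite=False, immutable=False),
    BackupCopy("local backup repo", "disk", offsite=False, immutable=False),
    BackupCopy("cloud bucket", "object storage", offsite=True, immutable=False),
]

problems = check_3_2_1(copies)
print(problems)
```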

    If no immutable copy exists, file a P2 in the PSA (ConnectWise, Autotask, Jira Service Management) naming the affected systems, the proposed control (object lock, hardened repo, separate cloud account), and the target close date. Do not pass the BCP review while this gap is open and undocumented.

    Pull the Veeam / Datto / Cohesity / Rubrik job report and check the last successful run for each protected system, not just the overall job status. A green dashboard with a job that's been silently skipping a VM for six weeks is the canonical failure pattern.
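
    The skipped-VM pattern is easiest to catch by checking each protected system against its own last successful backup rather than trusting job status. The sketch below assumes the report can be reduced to a hostname-to-timestamp mapping; `stale_vms` and the sample data are hypothetical and not tied to any vendor's export format.

```python
from datetime import datetime, timedelta


def stale_vms(
    protected_vms: set[str],
    last_success: dict[str, datetime],
    now: datetime,
    max_age: timedelta,
) -> list[str]:
    """Return VMs with no successful backup within max_age, sorted by name."""
    stale = []
    for vm in sorted(protected_vms):
        last = last_success.get(vm)
        # A VM absent from the report has never been backed up successfully.
        if last is None or now - last > max_age:
            stale.append(vm)
    return stale


# Hypothetical report data.
now = datetime(2024, 6, 1)
last_success = {
    "dc01": datetime(2024, 5, 31, 23, 0),
    "fs01": datetime(2024, 4, 15, 2, 0),  # silently skipped for weeks
}
protected = {"dc01", "fs01", "sql01"}     # sql01 never appears in the report

stale = stale_vms(protected, last_success, now, timedelta(days=2))
print(stale)
```

    The list of protected systems should come from the reconciled inventory, not from the backup tool, so that a VM never added to a job still shows up as stale.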

    Confirm that replication to the offsite or cloud target is keeping up with the RPO. Lag exceeding the stated RPO means the offsite copy is not what the BCP claims it is.

DR Test Execution

    File the change request through CAB. Build the restore target in an isolated VLAN or sandbox tenant — never restore Tier 0 systems into production for a drill.

    Pick one system for the restore drill: AD/Entra, the file server, or the LOB database. Walk the full path: locate the backup, decrypt it, restore to the isolated environment, then validate application start, user authentication, and data integrity against a known checkpoint.

    If the drill missed RTO or RPO, the BCP is wrong — either the objective or the architecture has to change. Open the remediation ticket with named owner, target date, and the architecture change required (warm standby, additional replica, faster restore tier).

Incident Response Readiness

    Confirm primary, secondary, and escalation tiers in PagerDuty / Opsgenie / xMatters match current staffing. Test that a synthetic alert pages the right person.

    Update the SEV1 runbook in IT Glue / Hudu / Confluence: who declares, bridge call number, status page owner, executive notification, legal/PR triggers, customer communication template.

    Test the emergency-access account in Entra ID / Okta. Confirm the credentials are sealed in two physical locations, the account is excluded from MFA per policy, and sign-in is monitored. A real outage is the worst time to discover that the admin account is locked out.

    Walk through a ransomware scenario or a Tier 0 outage with the named IR team. Capture decision points where the runbook was unclear; those become the post-tabletop edits.

Alternate Site and Workforce Continuity

    Verify the FortiGate / Palo Alto / Meraki concentrator (or Cloudflare / Zscaler ZTNA) can sustain 100% of staff remote. License headroom and tunnel limits are common surprise constraints.
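
    The headroom check is simple arithmetic once the limits are known. This sketch assumes one license seat and one tunnel per remote user, which may not match a given vendor's licensing model; `remote_capacity_gaps` and the numbers are hypothetical.

```python
def remote_capacity_gaps(staff_count: int, license_seats: int, tunnel_limit: int) -> dict[str, int]:
    """Return the shortfall per constraint if 100% of staff connect remotely."""
    gaps = {}
    if license_seats < staff_count:
        gaps["license_seats"] = staff_count - license_seats
    if tunnel_limit < staff_count:
        gaps["tunnel_limit"] = staff_count - tunnel_limit
    return gaps


# Hypothetical figures: hardware supports 500 tunnels, but only 200 seats are licensed.
gaps = remote_capacity_gaps(staff_count=250, license_seats=200, tunnel_limit=500)
print(gaps)
```

    An empty result only confirms the nominal limits; a load test with real concurrent sessions is still needed to confirm throughput.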

    If M365 / Teams is the primary channel, the BCP needs an out-of-band fallback — Signal group, personal-email tree, mass-notification service (Everbridge, AlertMedia). Test it; don't just document it.

    Update the after-hours phone numbers and support tier for the ISP, M365, the EDR vendor (CrowdStrike, SentinelOne), and the backup vendor. A general support queue at 2am is not an escalation path.

Plan Sign-Off and Maintenance

    Run a blameless retro covering what worked, what didn't, and which runbook steps were ambiguous. Feed each item into the IT Glue / Hudu BCP doc as a tracked edit.
