Data Backup Verification Checklist

Backup Coverage and Configuration Review

    Pull the current asset list from the RMM or CMDB (NinjaOne, Datto RMM, IT Glue) and cross-check against the backup console (Veeam, Datto, Rubrik, AWS Backup). Flag any production VMs, endpoints, or M365/Google Workspace tenants that are not enrolled in a backup job.
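A minimal sketch of the coverage cross-check. It assumes both the CMDB/RMM and the backup console can export CSV with a shared `hostname` column; that column name is hypothetical, so adjust it to your exports. It normalizes names and returns every asset that no backup job covers.

```python
import csv
import io


def unprotected_assets(cmdb_csv: str, backup_csv: str, key: str = "hostname") -> list:
    """Return asset names in the CMDB export that are missing from the backup export."""

    def names(text: str) -> set:
        # Lower-case and strip so "SQL01 " in one export matches "sql01" in the other.
        return {
            row[key].strip().lower()
            for row in csv.DictReader(io.StringIO(text))
            if row.get(key)
        }

    return sorted(names(cmdb_csv) - names(backup_csv))


cmdb = "hostname,type\nSQL01,vm\nDC01,vm\nLAPTOP-17,endpoint\n"
backup = "hostname,job\nsql01,Daily-VM\ndc01,Daily-VM\n"
print(unprotected_assets(cmdb, backup))  # ['laptop-17']
```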

    Verify the 3-2-1 rule: three copies of the data, on two different media types, with at least one copy offsite. For cloud-native workloads, confirm cross-region replication (AWS Backup cross-region copy to a second vault, Azure Backup GRS) plus an immutable copy.
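One way to make the 3-2-1 check repeatable is to record each copy in a small inventory and test it mechanically. This is a sketch under assumptions: the `BackupCopy` fields and labels are hypothetical, and the inventory is assumed to include the production copy, since the 3-2-1 rule counts it as one of the three.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    location: str    # hypothetical label, e.g. "prod-dc", "aws-us-west-2"
    media: str       # e.g. "disk", "object-storage", "tape"
    offsite: bool
    immutable: bool


def check_321(copies: list) -> list:
    """Return 3-2-1 violations (plus the immutable-copy requirement); empty means compliant."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies (need 3)")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than 2 media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    if not any(c.immutable for c in copies):
        problems.append("no immutable copy")
    return problems
```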

    Match each system's backup frequency to its documented RPO and the restore SLA to its RTO. Tier-1 systems (production DB, identity provider) typically need RPO ≤ 1 hour; tier-3 (developer workstations) may tolerate 24 hours.
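The RPO/RTO comparison reduces to two inequalities per system. A minimal sketch, assuming you can pull each system's documented targets, its schedule interval, and the duration of its last test restore; the `SystemTargets` record is hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class SystemTargets:
    name: str
    rpo: timedelta                # documented recovery point objective
    rto: timedelta                # documented recovery time objective
    backup_interval: timedelta    # time between scheduled backups
    last_restore_time: timedelta  # duration of the most recent test restore


def target_gaps(s: SystemTargets) -> list:
    gaps = []
    # A failure just before the next run loses roughly one full interval of data.
    if s.backup_interval > s.rpo:
        gaps.append(f"{s.name}: backup every {s.backup_interval} exceeds RPO {s.rpo}")
    if s.last_restore_time > s.rto:
        gaps.append(f"{s.name}: test restore took {s.last_restore_time}, RTO is {s.rto}")
    return gaps
```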

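    Confirm GFS retention (daily 14, weekly 8, monthly 12, yearly 7) aligns with policy and compliance minimums. Compliance floors override default vendor settings: PCI DSS requires at least 12 months of audit-log history with the most recent 3 months immediately available, HIPAA requires its mandated documentation to be retained for 6 years, and SOC 2 holds you to whatever retention your own policy commits to.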
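To sanity-check what a GFS policy actually keeps, the sketch below models it in a simplified way: for each period type it keeps the newest restore point in each of the N most recent periods that contain one. Real products let you choose which day becomes the weekly or monthly (for example Sunday or the first of the month), so treat this as an approximation, not any vendor's algorithm.

```python
from datetime import date, timedelta


def gfs_keep(points, daily=14, weekly=8, monthly=12, yearly=7):
    """Return the set of restore-point dates a simple GFS scheme would keep."""
    period_keys = [
        (daily, lambda d: d),                       # one period per calendar day
        (weekly, lambda d: d.isocalendar()[:2]),    # (ISO year, ISO week)
        (monthly, lambda d: (d.year, d.month)),
        (yearly, lambda d: d.year),
    ]
    newest_first = sorted(set(points), reverse=True)
    keep = set()
    for count, key in period_keys:
        representatives = {}
        for p in newest_first:
            # The first point seen for a period is the newest one in that period.
            representatives.setdefault(key(p), p)
        # Dicts keep insertion order, so the first `count` entries are the most recent periods.
        keep.update(list(representatives.values())[:count])
    return keep


daily_points = [date(2024, 1, 1) + timedelta(days=i) for i in range(60)]
kept = gfs_keep(daily_points)
```

Comparing `len(kept)` against the restore points the console actually shows is a quick way to spot a job whose retention was changed from the documented policy.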

Job Health and Encryption

    Export the job report from Veeam ONE, Datto Status, or Rubrik Polaris. Investigate any job whose success rate over the reporting period is below 98%; repeated VSS errors and stale agents are the usual culprits.
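A sketch of the 98% threshold applied to an exported report. The `job` and `result` column names and the `Success` value are assumptions about a generic CSV export, not any vendor's schema; warnings count as non-successes here, which you may want to relax.

```python
import csv
import io
from collections import defaultdict


def jobs_below_threshold(report_csv: str, threshold: float = 0.98) -> dict:
    """Return {job_name: success_rate} for jobs whose success rate is under threshold."""
    runs = defaultdict(lambda: [0, 0])  # job -> [successes, total runs]
    for row in csv.DictReader(io.StringIO(report_csv)):
        counts = runs[row["job"]]
        counts[1] += 1
        if row["result"].strip().lower() == "success":
            counts[0] += 1
    return {job: ok / total for job, (ok, total) in runs.items() if ok / total < threshold}
```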

    Create a P2 ticket in the PSA (ConnectWise, HaloPSA, Jira Service Management) for each failing job with the asset name, last successful run, error code, and assigned owner. Do not close until two consecutive successful runs are confirmed.
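The ticket fields and the closing rule can be encoded directly. A minimal sketch: `FailedJobTicket` is a hypothetical record (mapping it to your PSA's fields is up to you), and `can_close` enforces the two-consecutive-successes rule on a run history ordered oldest first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FailedJobTicket:
    asset: str
    last_success: Optional[datetime]
    error_code: str
    owner: str
    priority: str = "P2"


def can_close(run_results: List[str], required: int = 2) -> bool:
    """run_results: job outcomes oldest first; close only after `required` straight successes."""
    recent = run_results[-required:]
    return len(recent) == required and all(r == "Success" for r in recent)
```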

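    Confirm AES-256 encryption at rest on the backup repository and TLS 1.2 or later in transit. For cloud vaults, confirm the vault is encrypted with customer-managed keys (for example, an AWS KMS customer managed key or an Azure Key Vault key) rather than vendor- or platform-managed default keys.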
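For AWS Backup, both checks can be scripted. The sketch assumes you already have the `KeyMetadata` dict from KMS `DescribeKey` (the boto3 calls are shown in a comment; the vault name is hypothetical). `KeyManager` is `CUSTOMER` for customer managed keys and `AWS` for AWS managed keys. The TLS helper only proves the endpoint can negotiate TLS 1.2 or newer; it does not prove the server refuses older protocols.

```python
import socket
import ssl


def uses_customer_managed_key(key_metadata: dict) -> bool:
    """key_metadata is the "KeyMetadata" dict returned by AWS KMS DescribeKey."""
    return (
        key_metadata.get("KeyManager") == "CUSTOMER"
        and key_metadata.get("KeyState") == "Enabled"
    )

# Fetching the metadata with boto3 (vault name is hypothetical):
#   vault = boto3.client("backup").describe_backup_vault(BackupVaultName="prod-vault")
#   meta = boto3.client("kms").describe_key(KeyId=vault["EncryptionKeyArn"])["KeyMetadata"]


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with a client that refuses anything older than TLS 1.2 and report the version."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()  # e.g. "TLSv1.2" or "TLSv1.3"
```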

    Verify that object lock (S3 Object Lock, Azure Blob immutable storage) or a hardened repository is enforced on at least one copy. Ransomware actors target backup consoles first; a copy that an admin account can delete is not a recovery copy.
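For an S3 target, the response of boto3's `get_object_lock_configuration` can be checked as below. This is a sketch: the 30-day minimum is a hypothetical policy value. It flags GOVERNANCE mode because principals holding `s3:BypassGovernanceRetention` can override it, whereas COMPLIANCE-mode retention cannot be shortened or removed by any user, including root.

```python
def object_lock_findings(resp: dict, min_days: int = 30) -> list:
    """resp is the return value of boto3 s3.get_object_lock_configuration(Bucket=...).

    If the bucket has no Object Lock configuration at all, boto3 raises a ClientError
    instead; callers should catch that and treat the bucket as unprotected.
    """
    cfg = resp.get("ObjectLockConfiguration", {})
    if cfg.get("ObjectLockEnabled") != "Enabled":
        return ["Object Lock is not enabled on the bucket"]
    retention = cfg.get("Rule", {}).get("DefaultRetention")
    if not retention:
        # Some backup products set retention per object and expect no bucket default.
        return ["no default retention rule; confirm the backup product sets per-object retention"]
    findings = []
    if retention.get("Mode") != "COMPLIANCE":
        findings.append("GOVERNANCE mode can be bypassed with s3:BypassGovernanceRetention")
    days = retention.get("Days") or 365 * retention.get("Years", 0)
    if days < min_days:
        findings.append(f"default retention of {days} days is under the {min_days}-day minimum")
    return findings
```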

    Pull the backup service account credential from the vault (HashiCorp Vault, Azure Key Vault, 1Password Secrets) and confirm it has been rotated within the policy window. Service accounts exempted from rotation are a classic finding in SOC 2 and ISO 27001 audits.
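Once you have the last-rotation timestamp from the vault's secret metadata, the policy check is a date comparison. A minimal sketch, assuming a hypothetical 90-day policy window and timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def rotation_overdue(last_rotated: datetime, max_age_days: int = 90,
                     now: Optional[datetime] = None) -> bool:
    """True if the credential is older than the rotation window (aware datetimes only)."""
    now = now or datetime.now(timezone.utc)
    return now - last_rotated > timedelta(days=max_age_days)
```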

Restore Testing

    Pick one file, one VM, and one SaaS object (M365 mailbox or SharePoint site) at random for a test restore. Spot restores catch silent corruption that job-success metrics miss.
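A sketch of a reproducible random pick, assuming you can list candidates per category (the inventory layout is hypothetical). Recording the seed with the restore evidence lets an auditor confirm the selection was not hand-picked.

```python
import random
from typing import Dict, List, Optional


def pick_restore_targets(inventory: Dict[str, List[str]],
                         seed: Optional[int] = None) -> Dict[str, str]:
    """Choose one item per non-empty category using a seeded generator."""
    rng = random.Random(seed)
    # Sort first so the same seed and inventory always produce the same picks.
    return {cat: rng.choice(sorted(items)) for cat, items in inventory.items() if items}


inventory = {
    "file": ["\\\\fs01\\finance\\q3.xlsx", "\\\\fs01\\hr\\handbook.pdf"],
    "vm": ["sql01", "dc01", "app02"],
    "saas": ["mailbox:jdoe", "site:/sites/legal"],
}
picks = pick_restore_targets(inventory, seed=20240601)
```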

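    Restore to a quarantine share; never overwrite production. Where possible, verify that the restored file's hash matches the source, choosing a file that has not changed since the restore point, or the hashes will legitimately differ.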
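The hash comparison can be done with the standard library alone. A minimal sketch using SHA-256, reading in chunks so large files do not have to fit in memory; the paths you pass (source share and quarantine share) are up to you.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def restore_matches(source: Path, restored: Path) -> bool:
    return sha256_of(source) == sha256_of(restored)
```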

    Boot the VM in an isolated network (Veeam SureBackup, Datto Screenshot Verification, Rubrik Live Mount). Confirm OS boots, services start, and application heartbeat responds.
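If the vendor's verification does not cover application checks, a minimal heartbeat can be scripted. This sketch assumes the host running it can reach the sandboxed VM's IP and that the application exposes a TCP port or an HTTP health URL; both targets are yours to supply.

```python
import socket
import urllib.request


def port_responds(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_healthy(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers 2xx; urlopen raises on 4xx/5xx and network errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```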

    Walk through the documented runbook end to end as a tabletop exercise with the IR commander, the infra lead, legal and communications representatives, and an exec sponsor as observer. Leaving legal and comms out of the tabletop is a common gap, and a real outage is the worst possible time to discover there is no press contact.

Remediation and Escalation

    Page the on-call engineering lead via PagerDuty or Opsgenie when a restore fails. A failed restore is a P1 finding — the system has no recovery path until it's fixed and re-tested.
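A sketch of the page for PagerDuty, using its Events API v2 (`POST https://events.pagerduty.com/v2/enqueue`, which answers 202 when the event is accepted). The routing key comes from the service's Events API v2 integration; the `dedup_key` format is a hypothetical convention so repeated failures for one system update a single incident. Opsgenie uses its own Alert API, which is not shown.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def restore_failure_event(routing_key: str, system: str, detail: str) -> dict:
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"restore-failure-{system}",  # one incident per system
        "payload": {
            "summary": f"Restore test failed: {system}",
            "source": system,
            "severity": "critical",
            "custom_details": {"detail": detail},
        },
    }


def send_event(event: dict) -> int:
    req = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status  # 202 when PagerDuty accepts the event
```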

    If the primary repository is corrupt, restore from the offsite or immutable copy, re-seeding the primary repository from it if needed. Document the chain of custody for any copy moved across regions or accounts.

    Force a fresh full backup, then a follow-up incremental, and confirm both succeed before closing the incident ticket.

Capacity and Documentation

    Pull capacity metrics from the backup console, project 90-day growth, and order capacity now if current or projected free space falls below 20%. A full repository is a common cause of backup jobs failing silently.
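A sketch of the 90-day projection, assuming daily used-space samples exported from the console and a straight-line growth trend fitted by least squares. Growth is often step-shaped (a new tenant or a large VM onboarded), so treat the projection as a floor, not a forecast.

```python
from typing import Sequence


def projected_free_fraction(used_tb: Sequence[float], capacity_tb: float,
                            horizon_days: int = 90) -> float:
    """Fit a least-squares line to daily samples (oldest first) and return the
    free fraction it predicts `horizon_days` after the last sample."""
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb))
    slope = sxy / sxx  # TB per day
    projected_used = mean_y + slope * (n - 1 + horizon_days - mean_x)
    return 1 - projected_used / capacity_tb


def should_order_capacity(used_tb: Sequence[float], capacity_tb: float,
                          horizon_days: int = 90, floor: float = 0.20) -> bool:
    """True if either current or projected free space is under the floor."""
    current_free = 1 - used_tb[-1] / capacity_tb
    projected = projected_free_fraction(used_tb, capacity_tb, horizon_days)
    return min(current_free, projected) < floor
```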

    Expire backups past retention except those under legal hold. Confirm with GRC or legal that no active hold blocks disposal before purging.
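The disposal rule can be expressed as a filter, assuming GRC or legal supplies the set of restore-point IDs currently under hold; the `RestorePoint` record is hypothetical. Anything past its expiry date and not on the hold list becomes a purge candidate, and nothing else does.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Set


@dataclass(frozen=True)
class RestorePoint:
    id: str
    expires: date


def purge_candidates(points: List[RestorePoint], held_ids: Set[str], today: date) -> List[str]:
    """IDs of restore points past retention that are not under legal hold."""
    return [p.id for p in points if p.expires < today and p.id not in held_ids]
```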

    Refresh the backup runbook in IT Glue, Hudu, or Confluence with any changes from this cycle — new asset coverage, schedule changes, repository targets, contact escalation paths.

    Upload job reports, restore evidence, and capacity snapshots to the GRC platform (Vanta, Drata, Secureframe). Continuous evidence collection beats scrambling at audit time — auditors flag the gaps in last-minute submissions.