Data Backup Verification Checklist

Recurring workflow for IT and MSP teams to verify backup health, test restores against RPO/RTO targets, and confirm 3-2-1 coverage across endpoints, servers, and SaaS data. Run monthly with a quarterly DR test embedded.

5 sections 21 steps Collects data

Backup Coverage and Configuration Review

Reconcile protected assets against the CMDB
- Pull the current asset list from the RMM or CMDB (NinjaOne, Datto RMM, IT Glue) and cross-check against the backup console (Veeam, Datto, Rubrik, AWS Backup). Flag any production VMs, endpoints, or M365/Google Workspace tenants that are not enrolled in a backup job.
Collects list
Confirm 3-2-1 coverage for critical systems
- Verify three copies of data, on two media types, with at least one offsite copy. For cloud-native workloads confirm cross-region replication (AWS Backup vault, Azure Backup GRS) plus an immutable copy.
Validate RPO and RTO targets per tier
- Match each system's backup frequency to its documented RPO and the restore SLA to its RTO. Tier-1 systems (production DB, identity provider) typically need RPO ≤ 1 hour; tier-3 (developer workstations) may tolerate 24 hours.
Review backup job schedules and retention
- Confirm GFS retention (daily 14, weekly 8, monthly 12, yearly 7) aligns with policy and compliance minimums — SOC 2, HIPAA, and PCI-DSS each have log/data retention floors that override default vendor settings.

Job Health and Encryption

Review last 30 days of backup job results
- Export the job report from Veeam One, Datto Status, or Rubrik Polaris. Investigate any job with a success rate below 98% — repeated VSS errors and stale agents are the usual culprits.
Collects list
Open tickets for failing backup jobs
- Create a P2 ticket in the PSA (ConnectWise, HaloPSA, Jira Service Management) for each failing job with the asset name, last successful run, error code, and assigned owner. Do not close until two consecutive successful runs are confirmed.
Verify encryption at rest and in transit
- Confirm AES-256 at rest on the backup repository and TLS 1.2+ in transit. For cloud vaults, confirm KMS or customer-managed keys are in use, not vendor-default keys.
Confirm immutability and air-gap controls
- Verify object lock (S3 Object Lock, Azure Blob immutable storage) or hardened repository is enforced on at least one copy. Ransomware actors target backup consoles first — an admin-deletable copy is not a recovery copy.
Rotate backup service account credentials
- Pull the backup service account credential from the vault (HashiCorp Vault, Azure Key Vault, 1Password Secrets) and confirm it has rotated within policy. Service accounts skipped from rotation are a classic finding in SOC 2 and ISO 27001 audits.

Restore Testing

Select random sample for spot-restore
- Pick one file-level, one VM-level, and one SaaS object (M365 mailbox or SharePoint site) at random. Spot-restores catch silent corruption that job-success metrics miss.
Perform file-level restore to isolated location
- Restore to a quarantine share, never overwriting production. Verify file hash matches the source where possible.
Perform VM-level instant recovery test
- Boot the VM in an isolated network (Veeam SureBackup, Datto Screenshot Verification, Rubrik Live Mount). Confirm OS boots, services start, and application heartbeat responds.
Record restore times against RTO targets

Collects list Collects paragraph Collects file
Run quarterly full DR failover tabletop
- Walk through the documented runbook end to end with the IR commander, infra lead, and an exec sponsor as observer. Tabletop without legal/comms invited is a common gap — discovering the press-contact gap during a real outage is the worst possible time.

Remediation and Escalation

Escalate restore failures to engineering lead
- Page the on-call engineering lead via PagerDuty or Opsgenie when a restore fails. A failed restore is a P1 finding — the system has no recovery path until it's fixed and re-tested.
Trigger seed reload from offsite copy
- If the primary repository is corrupt, initiate seed restore from the offsite/immutable copy. Document the chain-of-custody for any copy moved across regions or accounts.
Re-run failed jobs after remediation
- Force a fresh full backup, then a follow-up incremental, and confirm both succeed before closing the incident ticket.

Capacity and Documentation

Review repository capacity and growth
- Pull capacity metrics from the backup console. Project 90-day growth and order capacity now if free space drops below 20% — repository fill is the #1 cause of silent backup-job failure.
Collects number
Apply retention and legal-hold policies
- Expire backups past retention except those under legal hold. Confirm with GRC or legal that no active hold blocks disposal before purging.
Update runbooks and asset documentation
- Refresh the backup runbook in IT Glue, Hudu, or Confluence with any changes from this cycle — new asset coverage, schedule changes, repository targets, contact escalation paths.
File evidence package for SOC 2 or ISO audit
- Upload job reports, restore evidence, and capacity snapshots to the GRC platform (Vanta, Drata, Secureframe). Continuous evidence collection beats scrambling at audit time — auditors flag the gaps in last-minute submissions.
Collects signature Collects file

Use this template

Copy it to your account, customize the steps, and run it with your team in minutes.

Use this workflow Start free trial

Sections 5

Steps 21

Category Information Technology

Price Free to start

Need a different process

Browse hundreds of free templates across every team and industry.

Back to template library

Related templates

More workflows your team can run.

Information Technology

Run Data Backup Verification Checklist with your team

Customize the steps, assign roles, set a schedule, and keep a complete record for every run.