Data Backup Verification Checklist
Backup Coverage and Configuration Review
Pull the current asset list from the RMM or CMDB (NinjaOne, Datto RMM, IT Glue) and cross-check against the backup console (Veeam, Datto, Rubrik, AWS Backup). Flag any production VMs, endpoints, or M365/Google Workspace tenants that are not enrolled in a backup job.
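The cross-check above comes down to a set difference between two exports. Here is a minimal sketch, assuming both tools can export asset names and that the inventory export carries a production flag; the field names (`name`, `type`, `production`) and the sample data are hypothetical, not any vendor's actual export format.

```python
def find_unprotected(assets, protected_names):
    """Return production assets that do not appear in the backup console export."""
    # Normalize names so "SQL-01" and "sql-01 " compare equal.
    protected = {name.strip().lower() for name in protected_names}
    return [
        a for a in assets
        if a["production"] and a["name"].strip().lower() not in protected
    ]

# Hypothetical sample data standing in for the RMM/CMDB and backup console exports.
assets = [
    {"name": "SQL-01", "type": "vm", "production": True},
    {"name": "WEB-02", "type": "vm", "production": True},
    {"name": "LAB-09", "type": "vm", "production": False},
]
protected_names = ["sql-01"]

for asset in find_unprotected(assets, protected_names):
    print(f"NOT ENROLLED: {asset['name']} ({asset['type']})")
```

Matching on hostname alone is fragile where names are reused; a stable asset ID, if both systems expose one, is likely a better join key.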
Verify the 3-2-1 rule: three copies of data, on two media types, with at least one copy offsite. For cloud-native workloads, confirm cross-region replication (AWS Backup vault, Azure Backup GRS) plus an immutable copy.
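The three-copies, two-media, one-offsite check (often called the 3-2-1 rule) can be expressed as a small predicate. This is a sketch under stated assumptions: the copy records and their `media` and `offsite` fields are hypothetical, and the production copy is counted as one of the three.

```python
def meets_3_2_1(copies):
    """True if there are >= 3 copies, on >= 2 media types, with >= 1 copy offsite."""
    media_types = {c["media"] for c in copies}
    has_offsite = any(c["offsite"] for c in copies)
    return len(copies) >= 3 and len(media_types) >= 2 and has_offsite

# Hypothetical inventory of copies for one system (production copy included).
copies = [
    {"location": "production SAN", "media": "disk", "offsite": False},
    {"location": "local backup repository", "media": "disk", "offsite": False},
    {"location": "cloud object storage", "media": "object", "offsite": True},
]
print(meets_3_2_1(copies))
```

The immutable-copy requirement for cloud-native workloads is not covered here; it would need an additional flag per copy.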
Match each system's backup frequency to its documented RPO and the restore SLA to its RTO. Tier-1 systems (production DB, identity provider) typically need RPO ≤ 1 hour; tier-3 (developer workstations) may tolerate 24 hours.
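As an illustration of the RPO comparison, the sketch below flags systems whose backup interval is longer than the RPO for their tier. The tier targets come from the examples in the step above; the system records and field names are hypothetical.

```python
from datetime import timedelta

# Tier targets from the examples in the text. Tier 2 is omitted because the text gives no figure.
RPO_BY_TIER = {1: timedelta(hours=1), 3: timedelta(hours=24)}

def rpo_violations(systems):
    """Return names of systems whose backup interval exceeds the RPO for their tier."""
    return [
        s["name"] for s in systems
        if s["backup_interval"] > RPO_BY_TIER[s["tier"]]
    ]

# Hypothetical systems and schedules.
systems = [
    {"name": "prod-db", "tier": 1, "backup_interval": timedelta(minutes=15)},
    {"name": "idp", "tier": 1, "backup_interval": timedelta(hours=4)},
    {"name": "dev-laptop-17", "tier": 3, "backup_interval": timedelta(hours=24)},
]
print(rpo_violations(systems))
```

An interval at or under the RPO is necessary but not sufficient: a job that takes a long time to complete, or that fails intermittently, can still leave the newest restore point older than the target.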
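Confirm grandfather-father-son (GFS) retention (daily 14, weekly 8, monthly 12, yearly 7) aligns with policy and compliance minimums. Frameworks such as SOC 2, HIPAA, and PCI DSS can impose retention requirements on logs or data (PCI DSS, for example, requires at least 12 months of audit log history), and those requirements override default vendor settings.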
Job Health and Encryption
Export the job report from Veeam ONE, Datto Status, or Rubrik Polaris. Investigate any job with a success rate below 98%; repeated VSS errors and stale agents are the usual culprits.
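The 98% threshold can be applied to an exported run history with a few lines of code. This sketch assumes the report can be reduced to (job name, status) pairs and that any status other than "Success" counts against the job; real report formats differ by vendor, and the sample rows are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.98  # success-rate floor from the step above

def jobs_below_threshold(runs, threshold=THRESHOLD):
    """Return {job_name: success_rate} for jobs whose success rate is under the threshold."""
    totals = defaultdict(lambda: [0, 0])  # job -> [successes, total runs]
    for job, status in runs:
        totals[job][1] += 1
        if status == "Success":
            totals[job][0] += 1
    return {
        job: ok / total
        for job, (ok, total) in totals.items()
        if ok / total < threshold
    }

# Hypothetical run history: one job with a failure, one that is clean.
runs = (
    [("sql-nightly", "Success")] * 9
    + [("sql-nightly", "Failed")]
    + [("fs-hourly", "Success")] * 100
)
for job, rate in jobs_below_threshold(runs).items():
    print(f"{job}: {rate:.0%} success, investigate")
```

Whether "Warning" results should count as failures is a policy decision; this sketch treats them as non-success.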
Create a P2 ticket in the PSA (ConnectWise, HaloPSA, Jira Service Management) for each failing job with the asset name, last successful run, error code, and assigned owner. Do not close until two consecutive successful runs are confirmed.
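The closure rule is easy to misapply when a success is followed by another failure, so it may help to state it as code. A minimal sketch, assuming the job's run statuses are available oldest first; the function name and status strings are hypothetical.

```python
def can_close_ticket(statuses_oldest_first, required=2):
    """True only if the most recent `required` runs all succeeded."""
    recent = statuses_oldest_first[-required:]
    return len(recent) == required and all(s == "Success" for s in recent)

print(can_close_ticket(["Failed", "Success", "Success"]))
print(can_close_ticket(["Success", "Failed", "Success"]))
```

The first call prints True and the second prints False, because only an unbroken run of successes at the end of the history qualifies.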
Confirm AES-256 at rest on the backup repository and TLS 1.2+ in transit. For cloud vaults, confirm that customer-managed keys (for example, in AWS KMS or Azure Key Vault) are in use rather than vendor-default keys.
Verify object lock (S3 Object Lock, Azure Blob immutable storage) or hardened repository is enforced on at least one copy. Ransomware actors target backup consoles first — an admin-deletable copy is not a recovery copy.
Pull the backup service account credential from the vault (HashiCorp Vault, Azure Key Vault, 1Password Secrets) and confirm it has been rotated within policy. Service accounts excluded from rotation are a classic finding in SOC 2 and ISO 27001 audits.
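Once the last-rotated timestamp has been read from the vault's metadata, the policy check is a date comparison. The sketch below assumes a 90-day rotation window purely for illustration (the checklist does not state one) and does not call any vault API; the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_CREDENTIAL_AGE = timedelta(days=90)  # assumed policy window, not from the checklist

def rotation_overdue(last_rotated, now=None, max_age=MAX_CREDENTIAL_AGE):
    """Return True if the credential is older than the policy allows."""
    now = now or datetime.now(timezone.utc)
    return (now - last_rotated) > max_age

# Hypothetical timestamps: the credential was last rotated 143 days before the check.
last_rotated = datetime(2024, 1, 10, tzinfo=timezone.utc)
checked_at = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(rotation_overdue(last_rotated, now=checked_at))
```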
Restore Testing
Pick one file-level restore, one VM-level restore, and one SaaS object restore (M365 mailbox or SharePoint site) at random. Spot restores catch silent corruption that job-success metrics miss.
Restore to a quarantine share, never overwriting production. Verify that the restored file's hash matches the source where possible.
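A hash comparison between the source file and the restored copy might look like the following sketch. It uses SHA-256 from Python's standard library and reads in chunks so large files do not need to fit in memory; the function names are illustrative.

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def restore_matches(source_path, restored_path):
    """True if the restored file is byte-identical to the source file."""
    return sha256_of(source_path) == sha256_of(restored_path)
```

The comparison is only meaningful if the source has not changed since the restore point was taken. For files that change often, comparing against a hash recorded at backup time is the more reliable check.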
Boot the VM in an isolated network (Veeam SureBackup, Datto Screenshot Verification, Rubrik Live Mount). Confirm that the OS boots, services start, and the application heartbeat responds.
Walk through the documented runbook end to end with the IR commander, infra lead, and an exec sponsor as observer. Running the tabletop without legal and comms is a common gap, and a real outage is the worst time to discover that nobody owns press contact.
Remediation and Escalation
Page the on-call engineering lead via PagerDuty or Opsgenie when a restore fails. A failed restore is a P1 finding — the system has no recovery path until it's fixed and re-tested.
If the primary repository is corrupt, initiate a seed restore from the offsite or immutable copy. Document the chain of custody for any copy moved across regions or accounts.
Force a fresh full backup, then a follow-up incremental, and confirm both succeed before closing the incident ticket.
Capacity and Documentation
Pull capacity metrics from the backup console. Project 90-day growth and order capacity now if current or projected free space drops below 20%. A full repository is a common cause of silent backup-job failure.
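The 90-day projection can be approximated with a straight-line extrapolation from two usage samples. This is a rough sketch under stated assumptions: growth is linear, sizes are in TB, and the step is read as "flag when either current or projected free space falls under 20%". The function names and sample figures are hypothetical.

```python
def free_fraction(total_tb, used_tb):
    """Fraction of the repository that is free, clamped at zero."""
    return max(0.0, (total_tb - used_tb) / total_tb)

def projected_used_tb(used_now_tb, used_prior_tb, days_between, horizon_days=90):
    """Extrapolate usage linearly from two samples taken days_between days apart."""
    daily_growth = (used_now_tb - used_prior_tb) / days_between
    return used_now_tb + daily_growth * horizon_days

def should_order_capacity(total_tb, used_now_tb, used_prior_tb, days_between, floor=0.20):
    """True if current or 90-day projected free space is under the floor."""
    current = free_fraction(total_tb, used_now_tb)
    projected = free_fraction(
        total_tb, projected_used_tb(used_now_tb, used_prior_tb, days_between)
    )
    return current < floor or projected < floor

# Hypothetical repository: 100 TB total, 70 TB used now, 64 TB used 30 days ago.
print(should_order_capacity(total_tb=100, used_now_tb=70, used_prior_tb=64, days_between=30))
```

In this example the repository is 30% free today but is projected to reach 88 TB used in 90 days (12% free), so the check returns True. Linear extrapolation understates growth when new workloads are being onboarded, so treat the result as a floor rather than a forecast.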
Expire backups past retention except those under legal hold. Confirm with GRC or legal that no active hold blocks disposal before purging.
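The expiry rule with its legal-hold exception can be sketched as a filter over restore points. The record fields (`id`, `asset`, `created`), the hold list, and the 14-day retention value used in the example are hypothetical; in practice the hold list should come from GRC or legal rather than from the backup console.

```python
from datetime import date, timedelta

def purge_candidates(restore_points, retention_days, today, held_assets):
    """Return IDs of restore points past retention whose asset is not under legal hold."""
    cutoff = today - timedelta(days=retention_days)
    return [
        rp["id"] for rp in restore_points
        if rp["created"] < cutoff and rp["asset"] not in held_assets
    ]

# Hypothetical restore points; mail-01 is under legal hold.
restore_points = [
    {"id": "rp-001", "asset": "fs-01", "created": date(2024, 1, 1)},
    {"id": "rp-002", "asset": "mail-01", "created": date(2024, 1, 1)},
    {"id": "rp-003", "asset": "fs-01", "created": date(2024, 5, 25)},
]
held_assets = {"mail-01"}

print(purge_candidates(restore_points, retention_days=14,
                       today=date(2024, 6, 1), held_assets=held_assets))
```

Only `rp-001` is returned: `rp-002` is old enough to expire but is held, and `rp-003` is still inside retention.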
Refresh the backup runbook in IT Glue, Hudu, or Confluence with any changes from this cycle — new asset coverage, schedule changes, repository targets, contact escalation paths.
Upload job reports, restore evidence, and capacity snapshots to the GRC platform (Vanta, Drata, Secureframe). Continuous evidence collection beats scrambling at audit time — auditors flag the gaps in last-minute submissions.
