Server Backup Checklist

Weekly server backup runbook covering pre-backup planning, job configuration, execution monitoring, and restore verification. Run by the sysadmin or MSP backup engineer responsible for the protected workloads.

Pre-Backup Planning

Inventory protected workloads and RPO/RTO
- List every protected workload (VMs, file servers, SQL/Oracle DBs, M365 tenants, application servers) with its RPO and RTO. Cross-check against the CMDB or IT Glue; the usual gap is an untracked workload, such as a test VM that quietly became production. Flag anything new since the last cycle.
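The cross-check above amounts to a set difference plus a completeness check. A minimal sketch, assuming the CMDB export and the backup protection set have already been reduced to plain names; `Workload` and `inventory_gaps` are hypothetical helpers, not part of any backup product's API:

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Workload:
    name: str
    rpo_hours: Optional[float] = None  # None means no RPO has been recorded
    rto_hours: Optional[float] = None  # None means no RTO has been recorded


def inventory_gaps(
    cmdb_names: Iterable[str], protected: Iterable[Workload]
) -> Tuple[List[str], List[str]]:
    """Return (unprotected, missing_targets).

    unprotected: names present in the CMDB export but absent from the backup set.
    missing_targets: protected workloads that lack an RPO or an RTO.
    """
    protected = list(protected)
    protected_names = {w.name for w in protected}
    unprotected = sorted(set(cmdb_names) - protected_names)
    missing_targets = sorted(
        w.name for w in protected if w.rpo_hours is None or w.rto_hours is None
    )
    return unprotected, missing_targets
```

Anything returned in the first list is a candidate for the "test VM that became prod" gap; anything in the second needs an RPO/RTO decision before the cycle is signed off.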
Confirm 3-2-1 coverage per workload
- Three copies, two media types, one offsite — and at least one immutable or air-gapped copy (object lock, hardened Linux repo, LTO, separate cloud account). Backups writable from production are the ransomware failure mode.
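The rule above can be checked mechanically per workload. This is a rough sketch under stated assumptions: the copy list includes the production copy itself (so three copies means production plus two backups), and `BackupCopy` and `coverage_gaps` are illustrative names rather than anything a backup console exposes:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BackupCopy:
    media: str       # e.g. "disk", "tape", "object"
    offsite: bool    # stored outside the primary site
    immutable: bool  # object lock, hardened repo, or air-gapped


def coverage_gaps(copies: Sequence[BackupCopy]) -> List[str]:
    """Return the 3-2-1 (plus immutability) rules this workload fails."""
    gaps = []
    if len(copies) < 3:
        gaps.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        gaps.append("fewer than two media types")
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy")
    if not any(c.immutable for c in copies):
        gaps.append("no immutable or air-gapped copy")
    return gaps
```

An empty list means the workload meets all four conditions; a non-empty list can go straight into the planning notes.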
Verify retention against compliance policy
- Match retention to the HIPAA, PCI DSS, SOX, or SOC 2 requirements that apply to the data class. Long-term archive tiers (Glacier, Coldline, LTO) differ in cost; document the tier per workload so the next true-up isn't a surprise.
Review storage capacity and headroom
- Check Veeam/Datto/Rubrik repository utilization and forecast 90-day growth. Below 20% headroom triggers a capacity ticket — backup jobs that fail mid-run because the repo filled are the most common P2 in this workflow.
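The headroom check is simple arithmetic. The sketch below assumes linear growth over the 90-day horizon and, as an extra assumption beyond the rule stated above, also raises the ticket when the projected headroom falls under the threshold. The function names are hypothetical, and the capacity figures would come from whatever the repository console reports:

```python
def headroom_pct(capacity_tb: float, used_tb: float) -> float:
    """Free space as a percentage of total repository capacity."""
    return 100.0 * (capacity_tb - used_tb) / capacity_tb


def forecast_used_tb(used_tb: float, daily_growth_tb: float, days: int = 90) -> float:
    """Projected usage, assuming growth stays linear over the horizon."""
    return used_tb + daily_growth_tb * days


def needs_capacity_ticket(
    capacity_tb: float,
    used_tb: float,
    daily_growth_tb: float,
    threshold_pct: float = 20.0,
    days: int = 90,
) -> bool:
    """True if current or projected headroom is below the threshold."""
    current = headroom_pct(capacity_tb, used_tb)
    projected = headroom_pct(
        capacity_tb, forecast_used_tb(used_tb, daily_growth_tb, days)
    )
    return current < threshold_pct or projected < threshold_pct
```

For example, a 100 TB repository at 60 TB used and growing 0.25 TB per day has 40% headroom today but only 17.5% in 90 days, so the sketch would flag it now rather than after a mid-run failure.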

Backup Job Configuration

Validate backup software version and patches
- Confirm Veeam, Datto, Commvault, Cohesity, or Rubrik is on a supported version with current CVE patches applied. Backup servers are a high-value attacker target, and an unpatched backup server or repository is a common way for ransomware to reach copies that were meant to be immutable.
Configure full, incremental, and synthetic-full jobs
- Set the backup chain (weekly full, daily incremental, monthly synthetic full) and its GFS (grandfather-father-son) retention to fit the backup window. Long incremental chains without a synthetic full inflate restore time, so verify the chain length against the workload's RTO.
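One rough way to sanity-check chain length against RTO is a linear estimate: the time to restore the full plus a fixed cost per incremental that must be replayed. This is a deliberately simplified model, not how any particular product computes restore time, and the per-incremental cost is an assumption that should come from measured test restores:

```python
def estimated_restore_hours(
    full_restore_hours: float,
    incremental_count: int,
    hours_per_incremental: float,
) -> float:
    """Linear estimate: one full restore plus each incremental in the chain."""
    return full_restore_hours + incremental_count * hours_per_incremental


def chain_fits_rto(
    full_restore_hours: float,
    incremental_count: int,
    hours_per_incremental: float,
    rto_hours: float,
) -> bool:
    """True if the estimated restore time is within the workload's RTO."""
    estimate = estimated_restore_hours(
        full_restore_hours, incremental_count, hours_per_incremental
    )
    return estimate <= rto_hours
```

With a 2-hour full and 0.25 hours per incremental, a 6-incremental chain estimates to 3.5 hours and fits a 4-hour RTO, while a 30-incremental chain estimates to 9.5 hours and does not.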
Set repository targets and offsite copy job
- Write the primary copy to an on-prem repo and the secondary copy to a repo on different media, then send the offsite copy to S3 with Object Lock, Azure Blob with an immutability policy, or a tenant-isolated MSP cloud. Confirm the offsite copy job is chained to the primary job rather than optional, and that the storage account sits behind a separate credential boundary.
Wire alerts to PagerDuty and the PSA
- Job failures route to PagerDuty / Opsgenie at P2; warnings open a ticket in ConnectWise PSA or ServiceNow. Email-only alerting goes unread — a job that's been failing for three weeks because nobody reads the digest is the canonical gotcha here.
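The routing rule above can be written down as a small decision function so it is reviewable alongside the runbook. This is only a sketch: the status strings and the `target` labels are hypothetical, and the actual delivery to PagerDuty, Opsgenie, ConnectWise, or ServiceNow is left to whatever integration is already in place:

```python
from typing import Dict, Optional


def route_alert(status: str) -> Dict[str, Optional[str]]:
    """Map a job status to a destination and priority.

    failed  -> page the on-call at P2
    warning -> open a PSA ticket
    success -> no alert
    """
    normalized = status.strip().lower()
    if normalized == "failed":
        return {"target": "pager", "priority": "P2"}
    if normalized == "warning":
        return {"target": "ticket", "priority": None}
    if normalized == "success":
        return {"target": None, "priority": None}
    # Unknown statuses are surfaced rather than silently dropped.
    raise ValueError(f"unrecognized job status: {status!r}")
```

Raising on an unrecognized status is a design choice in this sketch: a silently ignored status is the same failure mode as the unread email digest.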

Execution and Monitoring

Run the scheduled backup window
- Kick off jobs inside the maintenance window so production I/O isn't impacted. For VMware, confirm CBT (changed block tracking) is healthy; reset CBT on any VM that's been failing incrementals.
Review job results and capture status
- Check the backup console after the window closes. Record the result for this cycle — green means every job in the protection set succeeded with no warnings.
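The definition of green above can be made explicit. In this sketch only "green" comes from the runbook; the "yellow", "red", and "unknown" labels, and the status strings themselves, are assumptions added for illustration:

```python
from typing import Mapping


def cycle_status(job_results: Mapping[str, str]) -> str:
    """Summarize a cycle from per-job results ("success", "warning", "failed").

    green   -> every job succeeded with no warnings
    yellow  -> no failures, at least one warning (assumed label)
    red     -> at least one failure (assumed label)
    unknown -> no results were captured (assumed label)
    """
    if not job_results:
        return "unknown"
    statuses = {s.lower() for s in job_results.values()}
    if "failed" in statuses:
        return "red"
    if statuses == {"success"}:
        return "green"
    # Warnings, or any status this sketch does not recognize, are not green.
    return "yellow"
```

Treating an empty result set as "unknown" rather than green avoids recording a clean cycle when the console export simply returned nothing.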
Triage failed jobs and open tickets
- For each failure: capture the job log, identify the proximate cause (VSS writer, snapshot stuck, credential rotation, agent offline), and open a ticket with named owner and SLA. Do not re-run the job blindly — VSS-related failures often repeat until the source-side issue is fixed.
Confirm offsite copy replication caught up
- Local backup green doesn't mean the offsite copy completed — bandwidth or repo issues stall the copy job silently. Verify the latest restore point exists in the offsite tier with the expected timestamp.
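The timestamp comparison above can be sketched as follows, assuming the latest restore-point times for the local and offsite tiers have already been pulled from the console or storage listing as timezone-aware datetimes. `offsite_caught_up` and the `max_lag` tolerance are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def offsite_caught_up(
    local_latest: datetime,
    offsite_latest: Optional[datetime],
    max_lag: timedelta = timedelta(0),
) -> bool:
    """True if the newest offsite restore point is within max_lag of the local one.

    offsite_latest is None when no restore point exists offsite at all.
    """
    if offsite_latest is None:
        return False
    return local_latest - offsite_latest <= max_lag
```

With the default `max_lag` of zero, the offsite tier must hold the same restore point as the local repo; a non-zero tolerance would suit copy jobs that are scheduled to trail the primary.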

Restore Verification

Run a sandboxed test restore
- Use Veeam SureBackup, Datto's screenshot verification, or a manual mount into an isolated network. The point is to prove the restore path end-to-end — credentials, encryption keys, network — not just that the file opens.
Escalate to backup engineer for failed restore
- A failed test restore is a P1 — the backup is not actually a backup. Page the senior backup engineer, open an incident, and notify the service owner. Do not move on until the restore path is validated against an alternate restore point.
Verify encryption keys are escrowed
- Confirm BitLocker recovery keys, repository encryption passwords, and KMS key references are escrowed in the password vault (Keeper, 1Password, Hudu) — not only on the backup server. Encrypted backups with lost keys are the worst-case outcome of a real disaster.
Sign off the cycle and update the runbook
- Record sign-off, attach the verification evidence, and note any runbook changes (new workload added, retention adjusted, repo migrated). The signed cycle is the SOC 2 / HIPAA evidence that the backup control operates as designed.