Server Backup Checklist
Weekly server backup runbook covering pre-backup planning, job configuration, execution monitoring, and restore verification. Run by the sysadmin or MSP backup engineer responsible for the protected workloads.
Pre-Backup Planning
-
Inventory protected workloads and RPO/RTO
List every protected workload (VMs, file servers, SQL/Oracle DBs, M365 tenants, application servers) with its RPO and RTO. Cross-check against the CMDB or IT Glue; untracked workloads, such as test VMs that quietly became prod, are the usual gap. Flag anything new since the last cycle.
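The cross-check can be sketched as a set comparison. This is a minimal sketch, assuming both the CMDB and the backup console can export workload names as plain lists; the function name and the sample hostnames are hypothetical.

```python
def find_coverage_gaps(cmdb_workloads, protected_workloads):
    """Compare the CMDB inventory against the backup protection set."""
    # Normalize names so "SQL01" and "sql01 " compare equal.
    cmdb = {name.strip().lower() for name in cmdb_workloads}
    protected = {name.strip().lower() for name in protected_workloads}
    return {
        # In the CMDB but not backed up: the dangerous gap.
        "unprotected": sorted(cmdb - protected),
        # Backed up but missing from the CMDB: inventory drift.
        "untracked": sorted(protected - cmdb),
    }


# Hypothetical sample data for illustration only.
gaps = find_coverage_gaps(
    cmdb_workloads=["sql01", "fs01", "app-test-03"],
    protected_workloads=["SQL01", "fs01", "old-vm-07"],
)
print(gaps)
```

Anything under "unprotected" needs an RPO/RTO and a job before the cycle continues; anything under "untracked" needs a CMDB entry or a decommission decision.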
-
Confirm 3-2-1 coverage per workload
Three copies, two media types, one offsite — and at least one immutable or air-gapped copy (object lock, hardened Linux repo, LTO, separate cloud account). Backups writable from production are the ransomware failure mode.
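The rule above can be expressed as a per-workload check. This is a sketch under stated assumptions: the `BackupCopy` record and its fields are hypothetical, and whether the production copy counts toward the three is a policy decision left to whoever builds the list.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    media: str       # e.g. "disk", "object-storage", "tape"
    offsite: bool
    immutable: bool  # object lock, hardened repo, or air-gapped


def check_321(copies):
    """Return the unmet 3-2-1 conditions; an empty list means covered."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.media for c in copies}) < 2:
        problems.append("fewer than two media types")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    if not any(c.immutable for c in copies):
        problems.append("no immutable or air-gapped copy")
    return problems


# Hypothetical workload: two local disk copies plus one locked cloud copy.
copies = [
    BackupCopy("disk", offsite=False, immutable=False),
    BackupCopy("disk", offsite=False, immutable=False),
    BackupCopy("object-storage", offsite=True, immutable=True),
]
print(check_321(copies))
```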
-
Verify retention against compliance policy
Match retention to HIPAA, PCI DSS, SOX, or SOC 2 requirements applicable to the data class. Long-term archive (Glacier, Coldline, LTO) costs differ; document the tier per workload so the next true-up isn't a surprise.
-
Review storage capacity and headroom
Check Veeam/Datto/Rubrik repository utilization and forecast 90-day growth. Below 20% headroom triggers a capacity ticket — backup jobs that fail mid-run because the repo filled are the most common P2 in this workflow.
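A rough way to apply the 20% rule and the 90-day forecast together is shown here. It is only a sketch: it assumes linear growth at a fixed daily rate, and the function and parameter names are hypothetical rather than anything a backup product exposes. Flagging on the forecast as well as on current headroom is an added assumption; the rule as stated only covers current headroom.

```python
def capacity_report(total_tb, used_tb, daily_growth_tb,
                    horizon_days=90, min_headroom=0.20):
    """Report current and projected free-space fraction for a repository."""
    headroom_now = (total_tb - used_tb) / total_tb
    # Assumption: growth is linear over the forecast horizon.
    projected_used_tb = used_tb + daily_growth_tb * horizon_days
    headroom_projected = (total_tb - projected_used_tb) / total_tb
    return {
        "headroom_now": headroom_now,
        "headroom_projected": headroom_projected,
        "ticket_now": headroom_now < min_headroom,
        "ticket_forecast": headroom_projected < min_headroom,
    }


# Hypothetical repo: 100 TB total, 70 TB used, growing 0.15 TB per day.
report = capacity_report(total_tb=100, used_tb=70, daily_growth_tb=0.15)
print(report)
```

In this example the repo has 30% headroom today but is forecast to drop below 20% within the horizon, so the ticket is worth opening before a job fails mid-run.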
Backup Job Configuration
-
Validate backup software version and patches
Confirm Veeam, Datto, Commvault, Cohesity, or Rubrik is on a supported version with current CVE patches applied. Backup servers are a high-value attacker target: an unpatched backup server or hardened repository is how ransomware reaches copies that were supposed to be immutable.
-
Configure full, incremental, and synthetic-full jobs
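Set the backup chain and GFS (grandfather-father-son) retention, for example weekly full, daily incremental, and monthly synthetic full, to fit the backup window. Long incremental chains without a synthetic full inflate restore time, because every incremental since the last full has to be applied; verify the chain length against the workload's RTO.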
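The chain-length check can be approximated with back-of-envelope arithmetic. The model below is a crude upper bound, not how any particular product restores: it assumes a restore reads the full plus every incremental in the chain at one fixed throughput. All names and figures are hypothetical.

```python
def estimated_restore_hours(full_gb, incremental_gb, chain_length,
                            throughput_gb_per_hour):
    """Upper-bound restore time: read the full plus each incremental."""
    data_to_read_gb = full_gb + incremental_gb * chain_length
    return data_to_read_gb / throughput_gb_per_hour


def chain_fits_rto(full_gb, incremental_gb, chain_length,
                   throughput_gb_per_hour, rto_hours):
    """True when the estimated restore time is within the RTO."""
    hours = estimated_restore_hours(
        full_gb, incremental_gb, chain_length, throughput_gb_per_hour
    )
    return hours <= rto_hours


# Hypothetical workload: 2 TB full, 50 GB incrementals, 500 GB/h restore rate.
print(estimated_restore_hours(2000, 50, 6, 500))
print(chain_fits_rto(2000, 50, 6, 500, rto_hours=4))
```

If the check fails, shortening the chain with a more frequent synthetic full is usually cheaper than renegotiating the RTO.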
-
Set repository targets and offsite copy job
Use a primary on-prem repo, a secondary repo on different media, and an offsite copy to S3/Azure Blob with object lock or to a tenant-isolated MSP cloud. Confirm the offsite copy job is chained to the primary job so it runs after every backup rather than as an optional manual step, and that the storage account sits behind a separate credential boundary.
-
Wire alerts to PagerDuty and the PSA
Job failures route to PagerDuty / Opsgenie at P2; warnings open a ticket in ConnectWise PSA or ServiceNow. Email-only alerting goes unread — a job that's been failing for three weeks because nobody reads the digest is the canonical gotcha here.
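The routing rule reduces to a small decision function. This sketch only decides where an alert goes; it does not call PagerDuty, Opsgenie, ConnectWise, or ServiceNow, and the status strings and return shape are assumptions made for illustration.

```python
def route_alert(job_name, status):
    """Decide where a backup job result should be sent.

    Returns a routing dict, or None when no alert is needed.
    """
    status = status.strip().lower()
    if status == "failed":
        # Failures page the on-call engineer at P2.
        return {"job": job_name, "target": "pager", "priority": "P2"}
    if status == "warning":
        # Warnings open a ticket in the PSA instead of paging.
        return {"job": job_name, "target": "ticket", "priority": None}
    if status == "success":
        return None
    # Unknown statuses should be loud, not silently dropped.
    raise ValueError(f"unrecognized job status: {status!r}")


print(route_alert("sql01-nightly", "Failed"))
print(route_alert("fs01-nightly", "warning"))
```

Raising on an unrecognized status is a deliberate choice: a silently ignored status is the same failure mode as an unread email digest.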
Execution and Monitoring
-
Run the scheduled backup window
Kick off jobs inside the maintenance window so production I/O isn't impacted. For VMware, confirm CBT (changed block tracking) is healthy; reset CBT on any VM that's been failing incrementals.
-
Review job results and capture status
Check the backup console after the window closes. Record the result for this cycle — green means every job in the protection set succeeded with no warnings.
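The definition of green can be made mechanical. This is a minimal sketch, assuming job results are available as a mapping of job name to "success", "warning", or "failed"; those status strings and the function name are hypothetical.

```python
def cycle_is_green(protection_set, job_results):
    """Return (is_green, details) for one backup cycle.

    Green requires every job in the protection set to have run
    and finished as "success" with no warnings.
    """
    # Jobs that never reported a result are as bad as failures.
    missing = sorted(set(protection_set) - set(job_results))
    not_clean = sorted(
        name for name in protection_set
        if name in job_results and job_results[name] != "success"
    )
    is_green = not missing and not not_clean
    return is_green, {"missing": missing, "not_clean": not_clean}


# Hypothetical cycle: one warning and one job with no result at all.
is_green, details = cycle_is_green(
    protection_set=["sql01", "fs01", "m365"],
    job_results={"sql01": "success", "fs01": "warning"},
)
print(is_green, details)
```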
-
Triage failed jobs and open tickets
For each failure: capture the job log, identify the proximate cause (VSS writer, snapshot stuck, credential rotation, agent offline), and open a ticket with named owner and SLA. Do not re-run the job blindly — VSS-related failures often repeat until the source-side issue is fixed.
-
Confirm offsite copy replication caught up
Local backup green doesn't mean the offsite copy completed — bandwidth or repo issues stall the copy job silently. Verify the latest restore point exists in the offsite tier with the expected timestamp.
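The timestamp comparison can be sketched as follows. It assumes the latest restore-point timestamps for the local and offsite tiers have already been pulled from the backup console; the function name and the one-hour tolerance are hypothetical and should be tuned to the copy job's schedule.

```python
from datetime import datetime, timedelta, timezone


def offsite_caught_up(local_latest, offsite_latest,
                      max_lag=timedelta(hours=1)):
    """True when the offsite tier holds a restore point close enough
    to the latest local one."""
    if offsite_latest is None:
        # No offsite restore point at all: the copy job never completed.
        return False
    return local_latest - offsite_latest <= max_lag


# Hypothetical timestamps: offsite copy is a full day behind.
local = datetime(2024, 1, 7, 2, 0, tzinfo=timezone.utc)
stale = local - timedelta(days=1)
print(offsite_caught_up(local, local))
print(offsite_caught_up(local, stale))
```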
Restore Verification
-
Run a sandboxed test restore
Use Veeam SureBackup, Datto's screenshot verification, or a manual mount into an isolated network. The point is to prove the restore path end-to-end — credentials, encryption keys, network — not just that the file opens.
-
Escalate to backup engineer for failed restore
A failed test restore is a P1 — the backup is not actually a backup. Page the senior backup engineer, open an incident, and notify the service owner. Do not move on until the restore path is validated against an alternate restore point.
-
Verify encryption keys are escrowed
Confirm BitLocker recovery keys, repository encryption passwords, and KMS key references are escrowed in the password vault (Keeper, 1Password, Hudu) — not only on the backup server. Encrypted backups with lost keys are the worst-case outcome of a real disaster.
-
Sign off the cycle and update the runbook
Record sign-off, attach the verification evidence, and note any runbook changes (new workload added, retention adjusted, repo migrated). The signed cycle is the SOC 2 / HIPAA evidence that the backup control operates as designed.