Database Backup Checklist

Pre-Backup Preparations

    Check free capacity on the backup destination (SAN LUN, NAS share, S3 bucket, Azure Blob, or Veeam repository) against the size of the last full plus expected log growth. A common gotcha: the repo is fine but the staging volume the backup tool writes to first is not — check both.
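The dual capacity check (repo and staging volume) can be scripted as a pre-flight gate. A minimal sketch; the paths, sizes, and 20% headroom are placeholders for your environment, not any tool's defaults:

```python
import shutil

def has_capacity(path: str, last_full_bytes: int, expected_log_growth_bytes: int,
                 headroom: float = 1.2) -> bool:
    """True if the volume at `path` can hold the next full plus expected log
    growth, with a safety headroom (20% by default)."""
    required = int((last_full_bytes + expected_log_growth_bytes) * headroom)
    return shutil.disk_usage(path).free >= required

# Check BOTH the repository and the tool's staging volume, e.g.:
# ok = all(has_capacity(p, 500 * 2**30, 50 * 2**30)
#          for p in ("/backup/repo", "/backup/staging"))
```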

    For SQL Server, run DBCC CHECKDB on the target databases or confirm the most recent run was clean. For PostgreSQL, check pg_stat_activity for long-running transactions that would bloat WAL during the backup. For MySQL/InnoDB, confirm no orphan transactions in SHOW ENGINE INNODB STATUS.
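For the PostgreSQL case, the check reduces to scanning pg_stat_activity for old xact_start values. A sketch of the query and the filtering logic; the one-hour threshold and the (pid, xact_start) row shape are assumptions to adapt:

```python
from datetime import datetime, timedelta

# Query to run against pg_stat_activity (via psql or your driver of choice):
LONG_TXN_SQL = """
SELECT pid, xact_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
"""

def long_running(rows, now, threshold=timedelta(hours=1)):
    """Return pids of transactions open longer than `threshold`; these will
    pin WAL (and bloat the archive) for the duration of the backup."""
    return [pid for pid, started in rows if now - started > threshold]
```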

    Post in the #db-ops channel and email app owners with the planned start, expected end, and any application impact (read-only, brief I/O latency, none). Skip this for routine nightly jobs; required for ad-hoc or extended-window backups.

    Index rebuilds, statistics updates, ETL loads, and replication reseeds can collide with the backup window. Review the SQL Agent / cron / Airflow schedule for the host and any downstream replicas. Patch Tuesday and quarterly DR tests are common collision points.
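The schedule review can be partly automated with a simple interval-overlap check against whatever window data you can export from SQL Agent, cron, or Airflow. A sketch; the job names and times are illustrative:

```python
from datetime import datetime

def collides(a, b):
    """True if two (start, end) windows overlap."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start < b_end and b_start < a_end

def collisions(backup_window, scheduled_jobs):
    """Names of jobs whose windows overlap the backup window.
    `scheduled_jobs` maps job name -> (start, end)."""
    return [name for name, win in scheduled_jobs.items()
            if collides(backup_window, win)]
```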

    Open the active job definition in Veeam, Commvault, Rubrik, or your scripted pg_dump/mysqldump wrapper. Confirm the schedule and retention match the documented RPO and retention policy, and that the immutable / object-lock copy is still configured — ransomware-resilient backup requires that the offsite copy be unmodifiable using production credentials.
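A pre-flight script can diff the exported job definition against policy. A sketch only; the field names here are illustrative, not any real tool's API:

```python
def retention_problems(job: dict, policy: dict) -> list:
    """Compare a backup job's configured retention and immutability against
    the documented policy. Returns a list of problems (empty = compliant)."""
    problems = []
    if job["retention_days"] < policy["retention_days"]:
        problems.append("retention shorter than documented policy")
    if not job.get("object_lock"):
        problems.append("immutable/object-lock copy not configured")
    return problems
```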

Backup Execution

    Trigger the configured job in the backup tool (Veeam, Commvault, Rubrik, native SQL Agent, pgBackRest, mysqldump). For ad-hoc runs, use the documented runbook command — do not invent flags at the prompt. Capture the job ID for the audit trail.

    Watch the tool's live job view for read/write MB/s, retry counts, and warnings. A throughput drop usually means the source disk is under contention or the network path to the repo is saturated. Page on errors; warnings get noted in the run log.

    Full backups alone do not meet a sub-day RPO. Verify the log-chain job is also running on its schedule (SQL Server transaction log backups, PostgreSQL WAL archive to archive_command, MySQL binlog shipping). A broken log chain is silent until restore day.
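One way to surface a broken chain before restore day is to scan the log-backup timestamps for gaps wider than the schedule interval. A minimal sketch, assuming a 15-minute log backup schedule:

```python
from datetime import datetime, timedelta

def chain_gaps(log_backup_times, max_interval=timedelta(minutes=15)):
    """Return (prev, next) pairs where consecutive log backups are further
    apart than the schedule allows; each gap is a potential broken chain."""
    times = sorted(log_backup_times)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_interval]
```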

    Confirm the backup file or chunk set appears in the primary repo, the secondary copy, and the immutable / offsite tier (3-2-1: 3 copies, 2 media, 1 offsite). Spot-check file size against the prior night's run; a 10x size delta usually means a misconfigured filter or full vs. incremental confusion.
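The size spot-check is easy to automate; this sketch flags deviation in either direction, using the 10x rule of thumb above as the default:

```python
def size_anomaly(current_bytes: int, previous_bytes: int, factor: float = 10) -> bool:
    """Flag a run whose size differs from the prior night's by more than
    `factor`x either way; usually a filter misconfig or full-vs-incremental
    confusion. Zero-byte runs are always anomalous."""
    if previous_bytes == 0 or current_bytes == 0:
        return True
    ratio = current_bytes / previous_bytes
    return ratio >= factor or ratio <= 1 / factor
```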

    Log start time, end time, total bytes, dedup ratio, and the final status code in the run sheet or PSA ticket. This timing data is what feeds the next quarter's window-sizing decision.

Post-Backup Verification

    Run RESTORE VERIFYONLY for SQL Server, pg_verifybackup for PostgreSQL, or the equivalent SureBackup / Live Mount verification in Veeam. This catches checksum corruption that a successful job-status code can hide.

    Restore the most recent backup into the DR lab VLAN, mount the database, and run a smoke query (row count on a known table, latest timestamp on the audit log). This is the step that catches the silent failures — a rotated credential the script depends on, a vendor format change, an encryption key the team no longer holds. Cadence: at least quarterly per the DR policy.
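The smoke query's pass/fail logic can be scripted so the drill is repeatable across operators. A sketch; the row-count floor and staleness window are assumptions to tune per database:

```python
from datetime import datetime, timedelta

def drill_passes(row_count: int, min_rows: int,
                 latest_audit_ts: datetime, backup_taken_at: datetime,
                 max_lag: timedelta = timedelta(hours=24)) -> bool:
    """A restored copy passes if the known table has at least `min_rows` rows
    and the audit log's newest entry is within `max_lag` of the backup point,
    i.e. we restored fresh data rather than a stale or truncated set."""
    return row_count >= min_rows and (backup_taken_at - latest_audit_ts) <= max_lag
```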

    A failed restore drill is a P2 — the backups are not proven usable. Open a ticket in ServiceNow / Jira Service Management / ConnectWise PSA, page the on-call DBA, and do not close this run until a successful restore is demonstrated against an alternate backup point.

    Skim the job log for VSS writer failures, snapshot quiesce timeouts, deduplication errors, or skipped objects. Warnings that recur across runs become tomorrow's failed restore — file a low-priority ticket rather than letting them accrete.
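The recurring-warning scan can be automated across a window of run logs. A sketch; the patterns listed are examples and should be extended for your tool's actual log format:

```python
import re
from collections import Counter

WARN_RE = re.compile(
    r"(VSS writer|quiesce timeout|dedup(?:lication)? error|skipped)", re.I)

def recurring_warnings(run_logs, min_runs: int = 2):
    """Count how many runs each warning pattern appears in; anything seen in
    `min_runs` or more runs deserves a ticket before it becomes a failed restore."""
    seen = Counter()
    for log in run_logs:
        # de-duplicate within a single run so one noisy run doesn't dominate
        for pat in {m.group(1).lower() for m in WARN_RE.finditer(log)}:
            seen[pat] += 1
    return [p for p, n in seen.items() if n >= min_runs]
```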

    Record the run in IT Glue / Hudu / Confluence with backup set ID, retention expiry, restore-drill date, and any deviations. This is the artifact a SOC 2 or HIPAA auditor asks for — without it, the controls are not demonstrable.

    Close the loop with app owners and the on-call rotation. For MSP-managed clients, push the result into the monthly QBR report so the customer has visible evidence the RPO/RTO commitments are being met.
