Database Backup Checklist
Operational runbook a DBA or sysadmin runs for a scheduled database backup — pre-flight checks, backup execution against the configured tool, integrity verification, and a periodic restore drill that proves the backup is actually usable.
Pre-Backup Preparations
-
Confirm free space on the backup target
Check free capacity on the backup destination (SAN LUN, NAS share, S3 bucket, Azure Blob, or Veeam repository) against the size of the last full plus expected log growth. A common gotcha: the repo is fine but the staging volume the backup tool writes to first is not — check both.
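The free-space check can be scripted for targets that are mounted as a filesystem. The following is a minimal sketch, not a definitive implementation: the function names and the 20% headroom factor are assumptions, and object stores such as S3 or Azure Blob need their own quota check instead.

```python
import shutil

def has_room(path, last_full_bytes, expected_log_bytes, headroom=1.2):
    """Return True if `path` has room for the last full plus expected log growth.

    `headroom` is an assumed safety factor, not a value from the runbook.
    """
    required = int((last_full_bytes + expected_log_bytes) * headroom)
    free = shutil.disk_usage(path).free
    return free >= required

def check_targets(paths, last_full_bytes, expected_log_bytes):
    """Check every path (staging volume and repository) and map it to pass/fail."""
    return {p: has_room(p, last_full_bytes, expected_log_bytes) for p in paths}
```

Passing both the staging volume and the repository mount point to check_targets covers the gotcha described above in one call.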
-
Verify the database is in a consistent state
For SQL Server, run DBCC CHECKDB on the target databases or confirm the most recent run was clean. For PostgreSQL, check pg_stat_activity for long-running or idle-in-transaction sessions, which hold back vacuum and can add load during the backup. For MySQL/InnoDB, check the TRANSACTIONS section of SHOW ENGINE INNODB STATUS for long-running or orphaned transactions.
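For the PostgreSQL check, a sketch like the one below can flag transactions that have been open too long. It assumes you run the query with your own client and pass the rows in; the 30-minute threshold and the function name are illustrative assumptions. The columns pid, state, and xact_start are part of the standard pg_stat_activity view.

```python
from datetime import datetime, timedelta, timezone

# Query to run with your own PostgreSQL client.
LONG_TXN_SQL = """
SELECT pid, state, xact_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;
"""

def long_running(rows, now, max_age=timedelta(minutes=30)):
    """Return pids whose transaction has been open longer than max_age.

    rows: iterable of (pid, state, xact_start) tuples from LONG_TXN_SQL.
    max_age: assumed threshold; tune it to your backup window.
    """
    return [pid for pid, _state, xact_start in rows if now - xact_start > max_age]
```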
-
Notify stakeholders of the backup window
Post in the #db-ops channel and email app owners with the planned start, expected end, and any application impact (read-only, brief I/O latency, or none). This step is optional for routine nightly jobs and required for ad-hoc or extended-window backups.
-
Check for conflicting maintenance jobs
Index rebuilds, statistics updates, ETL loads, and replication reseeds can collide with the backup window. Review the SQL Agent / cron / Airflow schedule for the host and any downstream replicas. Patch Tuesday and quarterly DR tests are common collision points.
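Once the schedules are exported, finding collisions is an interval-overlap test. The sketch below assumes each job is reduced to a start and end time; the job names and the dictionary layout are hypothetical and do not match any particular scheduler's export format.

```python
from datetime import datetime

def overlaps(a_start, a_end, b_start, b_end):
    """Two half-open intervals overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def conflicts(window, jobs):
    """Return the names of jobs whose run time overlaps the backup window.

    window: (start, end) of the backup.
    jobs: dict mapping job name to (start, end).
    """
    w_start, w_end = window
    return sorted(
        name for name, (start, end) in jobs.items()
        if overlaps(w_start, w_end, start, end)
    )
```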
-
Review backup scripts and retention policy
Open the active job definition in Veeam, Commvault, Rubrik, or your scripted pg_dump/mysqldump wrapper. Confirm that the schedule meets the documented RPO, that retention matches the documented policy, and that the immutable / object-lock copy is still configured. A ransomware-resilient backup requires that the offsite copy cannot be modified with production credentials.
Backup Execution
-
Initiate the backup job
Trigger the configured job in the backup tool (Veeam, Commvault, Rubrik, native SQL Agent, pgBackRest, mysqldump). For ad-hoc runs, use the documented runbook command — do not invent flags at the prompt. Capture the job ID for the audit trail.
-
Monitor throughput and error counters
Watch the tool's live job view for read/write MB/s, retry counts, and warnings. A throughput drop usually means the source disk is under contention or the network path to the repo is saturated. Page on errors; warnings get noted in the run log.
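If the tool exposes throughput samples, a drop can be detected by comparing a short rolling average against a known baseline. This is a sketch under assumptions: the five-sample window and the 50% threshold are illustrative, and how the samples are collected depends on your tool.

```python
from statistics import mean

def throughput_dropped(samples_mbps, baseline_mbps, window=5, threshold=0.5):
    """Return True when the recent average falls below threshold * baseline.

    samples_mbps: throughput readings in MB/s, oldest first.
    window and threshold are assumed defaults, not vendor values.
    """
    if len(samples_mbps) < window:
        return False  # not enough data to judge yet
    recent = mean(samples_mbps[-window:])
    return recent < baseline_mbps * threshold
```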
-
Confirm transaction logs are captured
Full backups alone do not meet a sub-day RPO. Verify the log-chain job is also running on its schedule (SQL Server transaction log backups, PostgreSQL WAL archiving via archive_command, MySQL binlog shipping). A broken log chain is silent until restore day.
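One way to make a stalled log job visible before restore day is to look for gaps between log backup completion times that exceed the RPO. The sketch below checks timing only; it does not validate LSN or WAL segment continuity, and the function name and inputs are assumptions.

```python
from datetime import datetime, timedelta

def chain_gaps(log_backup_times, rpo, now):
    """Return (start, end) pairs where no log backup completed within `rpo`.

    log_backup_times: completion timestamps of log backups, in any order.
    now: current time, so a stalled job shows up as a trailing gap.
    """
    if not log_backup_times:
        raise ValueError("no log backups recorded")
    points = sorted(log_backup_times) + [now]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > rpo]
```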
-
Verify files landed at the expected target
Confirm the backup file or chunk set appears in the primary repo, the secondary copy, and the immutable / offsite tier (3-2-1: 3 copies, 2 media, 1 offsite). Spot-check file size against the prior night's run; a 10x size delta usually means a misconfigured filter or full vs. incremental confusion.
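The size spot-check is easy to automate. This sketch flags a run whose size differs from the prior run by the 10x factor mentioned above, in either direction; treating an empty or missing backup as suspicious is an added assumption.

```python
def size_anomaly(current_bytes, previous_bytes, factor=10.0):
    """Return True if the backup size changed by `factor` or more either way."""
    if previous_bytes <= 0 or current_bytes <= 0:
        return True  # an empty or missing backup is always worth a look
    ratio = current_bytes / previous_bytes
    return ratio >= factor or ratio <= 1 / factor
```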
-
Record run timing and outcome
Log start time, end time, total bytes, dedup ratio, and the final status code in the run sheet or PSA ticket. This timing data is what feeds the next quarter's window-sizing decision.
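If the run sheet is kept in a structured form, the fields listed above map onto a small record. The class and field names below are hypothetical; adapt them to whatever your run sheet or ticketing system expects.

```python
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class BackupRun:
    job_id: str
    start: datetime
    end: datetime
    total_bytes: int
    dedup_ratio: float
    status_code: int

    @property
    def duration_seconds(self):
        """Elapsed wall-clock time, the figure that feeds window sizing."""
        return (self.end - self.start).total_seconds()

    def to_row(self):
        """Flatten to a dict with ISO 8601 timestamps for a sheet or ticket."""
        row = asdict(self)
        row["start"] = self.start.isoformat()
        row["end"] = self.end.isoformat()
        row["duration_seconds"] = self.duration_seconds
        return row
```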
Post-Backup Verification
-
Run the tool's integrity check on the backup set
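Run RESTORE VERIFYONLY for SQL Server (page checksums are only validated if the backup was taken WITH CHECKSUM), pg_verifybackup for PostgreSQL base backups that include a manifest, or the equivalent verification in your backup tool (for example, SureBackup in Veeam or Live Mount in Rubrik). This catches checksum corruption that a successful job-status code can hide.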
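For scripted runs, the two native verification commands can be assembled as shown below. This is a sketch: the helper names are hypothetical, the paths are supplied by the caller, and executing the statement or command is left to your existing tooling. Note that RESTORE VERIFYONLY ... WITH CHECKSUM fails if the backup was taken without checksums, so the option is switchable.

```python
def sqlserver_verify_sql(backup_path, with_checksum=True):
    """Build a RESTORE VERIFYONLY statement for a disk backup file."""
    escaped = backup_path.replace("'", "''")  # escape quotes for the T-SQL literal
    sql = f"RESTORE VERIFYONLY FROM DISK = N'{escaped}'"
    if with_checksum:
        sql += " WITH CHECKSUM"
    return sql + ";"

def pg_verifybackup_argv(backup_dir):
    """Build the argument list for pg_verifybackup on a base backup directory."""
    return ["pg_verifybackup", backup_dir]
```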
-
Perform a test restore into the isolated lab
Restore the most recent backup into the DR lab VLAN, mount the database, and run a smoke query (row count on a known table, latest timestamp on the audit log). This is the step that catches the silent failures: a rotated credential the script depends on, a vendor format change, or an encryption key the team no longer holds. Cadence: at least quarterly per the DR policy.
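The smoke query can be wrapped in a small function that works with any DB-API connection. The sketch below uses an in-memory SQLite database purely as a stand-in for the restored database; the table names orders and audit_log and the column event_time are hypothetical. The identifiers are interpolated into the SQL, so they must be trusted constants and never user input.

```python
import sqlite3

def smoke_check(conn, known_table, audit_table, ts_column, min_rows=1):
    """Row count on a known table plus the latest audit timestamp."""
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {known_table}")
    rows = cur.fetchone()[0]
    cur.execute(f"SELECT MAX({ts_column}) FROM {audit_table}")
    latest = cur.fetchone()[0]
    ok = rows >= min_rows and latest is not None
    return {"rows": rows, "latest_audit": latest, "ok": ok}

# Stand-in for the restored database in the DR lab.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY);
CREATE TABLE audit_log (event_time TEXT);
INSERT INTO orders (id) VALUES (1), (2), (3);
INSERT INTO audit_log VALUES ('2024-01-01T02:00:00'), ('2024-01-01T03:30:00');
""")
result = smoke_check(conn, "orders", "audit_log", "event_time")
```

Comparing latest_audit against the backup's completion time is a useful follow-up, since a stale timestamp suggests the wrong backup point was restored.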
-
Open an incident ticket for the failed restore
If the test restore fails, treat it as a P2: the backups are not proven usable. Open a ticket in ServiceNow / Jira Service Management / ConnectWise PSA, page the on-call DBA, and do not close this run until a successful restore is demonstrated against an alternate backup point.
-
Review backup logs for warnings
Skim the job log for VSS writer failures, snapshot quiesce timeouts, deduplication errors, or skipped objects. Warnings that recur across runs become tomorrow's failed restore — file a low-priority ticket rather than letting them accrete.
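Spotting warnings that recur across runs can be done with a simple substring count over saved job logs. The patterns below are illustrative assumptions; real message text varies by tool and version, so replace them with strings taken from your own logs.

```python
from collections import Counter

# Hypothetical substrings; match these to your tool's actual log messages.
PATTERNS = ("VSS writer", "quiesce timeout", "deduplication error", "skipped")

def recurring_warnings(run_logs, min_runs=2):
    """Return patterns that appear in at least `min_runs` separate run logs.

    run_logs: list of log texts, one string per backup run.
    """
    seen = Counter()
    for text in run_logs:
        lowered = text.lower()
        for pattern in PATTERNS:
            if pattern.lower() in lowered:
                seen[pattern] += 1  # counted once per run, not per line
    return sorted(p for p, n in seen.items() if n >= min_runs)
```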
-
Update the backup register and DR runbook
Record the run in IT Glue / Hudu / Confluence with backup set ID, retention expiry, restore-drill date, and any deviations. This is the artifact a SOC 2 or HIPAA auditor asks for — without it, the controls are not demonstrable.
-
Notify stakeholders the backup is verified
Close the loop with app owners and the on-call rotation. For MSP-managed clients, push the result into the monthly QBR report so the customer has visible evidence the RPO/RTO commitments are being met.