Data Backup and Recovery Checklist
Monthly workflow for a manufacturing IT lead to validate backups of ERP, MES, PLM, CNC programs, and PLC ladder logic, and to run a recovery drill against documented RTO and RPO targets.
Scope and Recovery Objectives
-
Inventory critical production systems
List every system whose loss would stop production: ERP (NetSuite, Epicor Kinetic, Dynamics 365 BC), MES, PLM/CAD vault (SolidWorks PDM, Windchill), CMMS, QMS, label printers, and the CNC and PLC program repositories on the floor. A common gap is the standalone PC at the machine that holds the only copy of a tested G-code program.
-
Define RTO and RPO per system
ERP and MES typically need RTO under 4 hours and RPO under 1 hour to avoid a full shift of lost production. CAD vault and CMMS can usually accept 24-hour RPO. Document the targets so the drill later in this run has something concrete to measure against.
-
Flag ITAR or EAR-controlled technical data
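If any drawings, models, or specs in the PLM vault are ITAR or EAR-controlled, backup copies inherit the same access restrictions. Cloud backup targets must be US-person-only. Treat replication outside the US as an export unless your empowered official confirms that the end-to-end encryption provisions (22 CFR 120.54 for ITAR, 15 CFR 734.18 for EAR) apply to that backup path.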
-
Confirm regulatory retention requirements
AS9100 quality records, ISO 13485 device history records, electronic records governed by 21 CFR Part 11, and customer PPAP submissions carry retention obligations that commonly run 7-15 years depending on contract. Verify that the backup retention policy meets the longest applicable requirement.
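The targets and retention requirements from this section can be kept in a small machine-readable form so the later drill has something to compare against. The following is a minimal sketch, not a prescribed format: the `RecoveryTarget` class, the `retention_gap_years` helper, and the CAD vault RTO value are illustrative assumptions, and the ERP and MES figures are only the typical values quoted above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryTarget:
    """Recovery objectives for one production system (illustrative)."""
    system: str
    rto: timedelta  # maximum tolerable time to restore service
    rpo: timedelta  # maximum tolerable window of lost data


# Example targets based on the typical figures in this checklist.
# The CAD vault RTO is a placeholder assumption; set your own values.
TARGETS = [
    RecoveryTarget("ERP", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    RecoveryTarget("MES", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    RecoveryTarget("CAD vault", rto=timedelta(hours=24), rpo=timedelta(hours=24)),
]


def retention_gap_years(policy_years: int, requirements: dict[str, int]) -> int:
    """Return how many years the backup retention policy falls short of the
    longest applicable requirement (0 means the policy is sufficient)."""
    longest = max(requirements.values(), default=0)
    return max(0, longest - policy_years)
```

For example, a 7-year backup policy checked against a hypothetical 15-year customer contract and a 10-year internal requirement gives `retention_gap_years(7, {"Customer contract": 15, "Internal": 10})`, which returns 8: the policy needs to be extended by eight years.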
Backup Configuration
-
Verify the 3-2-1 backup architecture
Three copies of data, on two different media, with one off-site. Confirm the off-site copy is air-gapped or immutable — a writable cloud share that ransomware can also encrypt does not count.
-
Validate the ERP database backup schedule
Confirm the ERP nightly full plus transaction-log backups are aligned with the RPO defined earlier. For NetSuite or other SaaS ERPs, validate that the third-party backup connector ran successfully — the vendor's native export is not a backup.
-
Configure PLM vault and CNC program capture
SolidWorks PDM or Windchill vault snapshots run on the schedule set by IT. CNC controllers need a separate sweep — most shops use a DNC tool or a scripted SMB pull from the controller after every program edit. Without this, a crashed Fanuc control means re-proving every program from scratch.
-
Confirm AES-256 encryption on backup targets
Backup volumes encrypted at rest with AES-256, keys stored in the IT password vault and not on the backup server itself. Customer cybersecurity questionnaires (CMMC, TISAX, NIST 800-171) ask for this specifically.
-
Apply ITAR access controls to backup copies
Restrict backup target ACLs to US-person accounts only and confirm the cloud region is GovCloud or equivalent. Document the export classification and the personnel list as part of the empowered official's records.
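The 3-2-1 rule described in this section is easy to check mechanically once the copies of a dataset are listed. The sketch below assumes a hand-maintained inventory; the `BackupCopy` fields and the `check_3_2_1` function are hypothetical names, not part of any backup product. The production copy counts as one of the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a dataset (all field names are illustrative)."""
    location: str                 # e.g. "plant-nas", "tape-vault", "cloud-bucket"
    media: str                    # e.g. "disk", "tape", "object-storage"
    offsite: bool                 # stored away from the plant
    immutable_or_airgapped: bool  # cannot be altered from the production network


def check_3_2_1(copies: list[BackupCopy]) -> list[str]:
    """Return a list of rule violations; an empty list means the rule is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies are on the same media type")
    # An off-site copy only counts if ransomware cannot also encrypt it.
    if not any(c.offsite and c.immutable_or_airgapped for c in copies):
        problems.append("no off-site copy that is air-gapped or immutable")
    return problems
```

A writable cloud share would be entered with `offsite=True, immutable_or_airgapped=False`, so it fails the last check even though it is off-site.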
Execution and Verification
-
Run the scheduled full backup window
Trigger the monthly full from Veeam, Commvault, or the equivalent. Confirm that the production database quiesce or VSS snapshot succeeded: an inconsistent SQL backup may restore without error and still leave the ERP unable to start.
-
Review job logs and off-site replication
Walk every job in the console. Warnings are not successes — a job that completed with skipped files often means an open file lock on a CAD workstation. Confirm the off-site replication chain caught up before the next nightly window.
-
Attach verification evidence to the runbook
Auditors for ISO 27001, SOC 2, or customer cybersecurity reviews want dated screenshots of the backup console showing job status and capacity headroom. Save them where the next audit prep run can find them without a hunt.
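The log review in this section treats warnings and skipped files as items to investigate, not as successes. A minimal sketch of that triage follows, assuming job results have been exported from the console into a list of dictionaries; the keys `name`, `status`, and `skipped_files` are hypothetical and do not reflect the export format of Veeam, Commvault, or any other product.

```python
from collections import Counter

# Only a clean success counts; warnings need a human to look at them.
CLEAN_STATUSES = {"success"}


def jobs_needing_review(jobs: list[dict]) -> list[dict]:
    """Return every job that is not a clean success or that skipped files."""
    flagged = []
    for job in jobs:
        status = job["status"].strip().lower()
        if status not in CLEAN_STATUSES or job.get("skipped_files", 0) > 0:
            flagged.append(job)
    return flagged


def status_summary(jobs: list[dict]) -> Counter:
    """Count jobs per normalized status, for the evidence attached to the runbook."""
    return Counter(job["status"].strip().lower() for job in jobs)
```

A job reported as "Success" with a non-zero `skipped_files` count is still flagged, which matches the open-file-lock case on CAD workstations described above.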
Recovery Drill
-
Restore the ERP database to sandbox
Pull the most recent full plus log chain into the sandbox instance. Log in, run a known query (recent shop orders, last cycle count), and confirm the data matches production. A backup that restores but won't open is a common ransomware-recovery failure mode.
-
Restore a sample CNC program to a cell
Pick one Haas or Fanuc program at random, restore it to the controller via DNC or USB, and have the operator dry-run it against the proven setup sheet. This is the only way to catch silent corruption in the program backup pipeline.
-
Measure actual RTO against target
Time the restore with a stopwatch, from restore start to verified application login, and compare the result to the RTO defined for that system under Scope and Recovery Objectives. A drill where the team beats RTO by hours is suspect: usually the dataset was small or the network was idle. Repeat with realistic load before signing off.
-
Record the drill outcome
Pass means every restored system met its RTO and RPO and the application opened cleanly. Anything else is a fail — including the case where one CNC program was unreadable. Be strict; quiet partials are how DR programs decay.
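The strict pass rule above can be written down so the outcome does not depend on who is scoring the drill. This is a sketch under stated assumptions: `DrillResult` and `drill_passed` are hypothetical names, actual RTO is measured from restore start to verified login as in this checklist, and actual RPO is taken as the gap between the simulated failure time and the newest record present after the restore.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillResult:
    """Timings recorded for one restored system (field names are illustrative)."""
    system: str
    restore_started: datetime
    login_verified: datetime          # application login and data check confirmed
    newest_restored_record: datetime  # latest transaction present after restore
    failure_time: datetime            # simulated moment of loss
    opened_cleanly: bool

    @property
    def actual_rto(self) -> timedelta:
        return self.login_verified - self.restore_started

    @property
    def actual_rpo(self) -> timedelta:
        return self.failure_time - self.newest_restored_record


def drill_passed(results: list[DrillResult],
                 rto_targets: dict[str, timedelta],
                 rpo_targets: dict[str, timedelta]) -> bool:
    """Strict pass: at least one system tested, and every system opened
    cleanly and met both its RTO and RPO targets."""
    return bool(results) and all(
        r.opened_cleanly
        and r.actual_rto <= rto_targets[r.system]
        and r.actual_rpo <= rpo_targets[r.system]
        for r in results
    )
```

A single unreadable CNC program would be recorded with `opened_cleanly=False`, which fails the whole drill, and an empty result list is also treated as a fail so a skipped drill cannot pass by default.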
Remediation and Closeout
-
Open a corrective action for the failed restore
Quality opens the CAR in the QMS with a containment action (manual snapshot until automated restore is fixed) and a target close date. Effectiveness verification means a clean drill in the following month — not just a re-run of the same scenario.
-
Update the DR runbook with findings
Capture every undocumented step the team had to figure out live: where service account credentials are stored in the password vault, license server IPs, vendor support numbers. The runbook should let a new IT lead execute recovery during a 2 AM ransomware event.
-
Brief the plant manager and IT lead
Fifteen-minute readout: drill outcome, actual RTO vs target, open CARs, and any capacity or licensing issues that affect next month's run. Plant manager owns the production-impact call; IT lead owns the technical remediation.