IT Systems Maintenance Checklist
Monthly maintenance pass for the IT and OT systems that keep a manufacturing plant running — ERP, MES, shop-floor networks, historians, PLCs, and the rugged hardware on the floor. Run by the plant IT lead with input from controls engineering and the ERP/MES admins.
Network Infrastructure
-
Patch IT-side switches and firewalls
Apply firmware updates to office-side switches, firewalls, and wireless controllers during the approved Saturday maintenance window. OT-side switches (Stratix, Hirschmann, Moxa) follow a separate change-control process — do not push IT vendor firmware to controls cabinets without controls engineering sign-off.
-
Review OT network segmentation boundaries
Walk the Purdue Model boundary: enterprise systems at Level 4 (ERP, office IT) should reach Level 3 (MES, historian) only through the documented DMZ, and nothing above Level 3 should have a direct route to Level 1 (PLCs). Any new firewall rule punched in the last 30 days for a vendor remote-support session should already be closed; verify in the firewall change log.
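The change-log check can be scripted once the log is exported. A minimal sketch, assuming each exported row carries hypothetical fields `rule_id`, `purpose`, `added`, and `enabled`; real firewall exports use different names and will need mapping first.

```python
from datetime import date, timedelta

def open_vendor_rules(change_log, today, window_days=30):
    """Return IDs of vendor remote-support rules added within the window that are still enabled."""
    cutoff = today - timedelta(days=window_days)
    return [
        rule["rule_id"]
        for rule in change_log
        if rule["purpose"] == "vendor-remote-support"
        and rule["added"] >= cutoff
        and rule["enabled"]
    ]

# Hypothetical export rows; the field names are assumptions, not a vendor schema.
change_log = [
    {"rule_id": "FW-101", "purpose": "vendor-remote-support", "added": date(2024, 5, 20), "enabled": True},
    {"rule_id": "FW-102", "purpose": "vendor-remote-support", "added": date(2024, 5, 22), "enabled": False},
    {"rule_id": "FW-090", "purpose": "erp-interface", "added": date(2024, 5, 25), "enabled": True},
]
print(open_vendor_rules(change_log, today=date(2024, 6, 1)))  # ['FW-101']
```

Anything this returns is a rule to close or to justify in the change record.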
-
Monitor shop-floor traffic for anomalies
Pull the last 30 days from the OT monitoring tool (Claroty, Nozomi, or Dragos depending on site). Flag any new MAC addresses on the controls VLAN, any outbound connections from PLCs, and any SMBv1 or unencrypted FTP traffic. Most of these are integrators leaving test rigs plugged in.
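Flagging new MAC addresses comes down to a set difference against a known-good baseline. The sketch below assumes both lists have been exported from the monitoring tool as plain strings; the function names and sample addresses are illustrative only and do not reflect any vendor's API.

```python
def normalize_mac(mac):
    """Lower-case the address and use ':' separators so formats compare equal."""
    return mac.strip().lower().replace("-", ":")

def new_macs(baseline, observed):
    """Return MAC addresses observed on the controls VLAN that are not in the baseline."""
    known = {normalize_mac(m) for m in baseline}
    seen = {normalize_mac(m) for m in observed}
    return sorted(seen - known)

# Illustrative addresses, not real devices.
baseline = ["00:1D:9C:AA:BB:01", "00:1D:9C:AA:BB:02"]
observed = ["00-1d-9c-aa-bb-01", "00:0E:8C:11:22:33"]
print(new_macs(baseline, observed))  # ['00:0e:8c:11:22:33']
```

Normalizing first matters because switch exports and monitoring tools rarely agree on case or separator.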
-
Capture config backups for network gear
Export running-config from each switch, firewall, and wireless controller via the automation tool (RANCID, Oxidized, or vendor equivalent). Attach the dated archive — this is the artifact production needs if a switch dies during second shift.
-
Test failover on redundant ERP paths
Coordinate with production scheduling — failover testing during a live shift will pause label printing and scan transactions. Drop the primary uplink for the ERP and MES VLANs and confirm the secondary path comes up within the documented RTO. Restore primary, log results.
Software and Applications
-
Apply ERP patches in the test environment
NetSuite, Dynamics 365 BC, Epicor Kinetic, and Plex roll updates on different cadences — check the vendor release notes for any work-order, label, or tax-engine changes. Apply to TEST first, run the regression scenarios (WO release, kit pull, label print, ASN), and only then schedule production cutover.
-
Review MES error logs from the past cycle
Pull failed transactions from the MES (Plex, Tulip, MachineMetrics, FactoryLogix). The common ones: scan-to-WO mismatches, label-print timeouts, and historian writes failing on tag rename. Group by root cause — recurring failures usually trace back to a stale ECN or an unmapped tag, not the MES itself.
-
Validate ERP-to-MES integration handoffs
Push a test work order through the integration: ERP release → MES dispatch → operator scan → completion → ERP backflush. The most common silent failure: the backflush succeeds, but the cost variance posts to the wrong account because an item-master change was never synced.
-
Conduct quarterly user access review
Export role assignments from ERP, MES, QMS, and the historian. Cross-check against the HR active-employee list — terminated operators with active MES logins are a recurring audit finding. Pay attention to shared-floor accounts: if the kiosk uses one badge for the whole shift, that's a control gap to flag.
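The cross-check is easy to script once both exports are in hand. A rough sketch, assuming role assignments arrive as `(system, login, employee_id)` tuples with `employee_id` set to `None` for shared accounts; that layout is an assumption, not any ERP or MES export format.

```python
def access_review(role_assignments, active_employee_ids):
    """Split assignments into logins tied to non-active employees and shared logins."""
    active = set(active_employee_ids)
    orphaned = [
        (system, login)
        for system, login, emp_id in role_assignments
        if emp_id is not None and emp_id not in active
    ]
    shared = [
        (system, login)
        for system, login, emp_id in role_assignments
        if emp_id is None
    ]
    return orphaned, shared

# Hypothetical sample data.
assignments = [
    ("MES", "jdoe", "E1001"),
    ("MES", "asmith", "E1002"),    # E1002 is not on the active HR list
    ("MES", "kiosk-line3", None),  # shared floor account
    ("ERP", "jdoe", "E1001"),
]
orphaned, shared = access_review(assignments, active_employee_ids=["E1001"])
print(orphaned)  # [('MES', 'asmith')]
print(shared)    # [('MES', 'kiosk-line3')]
```

The orphaned list feeds deprovisioning tickets; the shared list is the control-gap finding.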
-
Check PLC firmware against vendor advisories
Review CISA ICS advisories and vendor PSIRT bulletins (Rockwell, Siemens, Schneider, Mitsubishi) for the firmware versions in your asset inventory. Do not push firmware updates from this checklist — log affected assets and route through controls engineering for the next planned downtime window.
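Matching advisories to the asset inventory can be roughed out as below. This is a sketch under simplifying assumptions: the inventory and advisory records use hypothetical field names, the model and advisory identifiers are placeholders, and firmware is matched by exact version string, whereas real advisories often give version ranges that need manual reading.

```python
def affected_assets(inventory, advisories):
    """Return (asset_tag, advisory_id) pairs where vendor, model, and firmware all match."""
    hits = []
    for asset in inventory:
        for adv in advisories:
            if (
                asset["vendor"] == adv["vendor"]
                and asset["model"] == adv["model"]
                and asset["firmware"] in adv["affected_firmware"]
            ):
                hits.append((asset["asset_tag"], adv["advisory_id"]))
    return hits

# Placeholder records; none of these identifiers refer to real products or advisories.
inventory = [
    {"asset_tag": "PLC-014", "vendor": "Rockwell", "model": "MODEL-A", "firmware": "1.2"},
    {"asset_tag": "PLC-022", "vendor": "Siemens", "model": "MODEL-B", "firmware": "4.0"},
]
advisories = [
    {"advisory_id": "EXAMPLE-ADVISORY-001", "vendor": "Rockwell", "model": "MODEL-A",
     "affected_firmware": {"1.1", "1.2"}},
]
print(affected_assets(inventory, advisories))  # [('PLC-014', 'EXAMPLE-ADVISORY-001')]
```

The output is a list to log and hand to controls engineering, not a trigger for any update.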
Data Management
-
Run backups of ERP and MES databases
Confirm last night's full backup completed and the transaction-log chain is unbroken. For SQL Server-backed ERP/MES, a broken log chain means point-in-time recovery is gone — catch it here, not when you need to restore.
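For SQL Server, one way to check the chain is to compare log sequence numbers across consecutive log backups: each log backup's `first_lsn` should equal the previous log backup's `last_lsn`. The sketch below assumes the `(first_lsn, last_lsn)` pairs have already been pulled from the backup history (for example from `msdb.dbo.backupset`) in backup order; the function name and sample numbers are illustrative.

```python
def find_log_chain_breaks(log_backups):
    """Return indexes of log backups whose first_lsn does not continue the previous last_lsn.

    log_backups: list of (first_lsn, last_lsn) tuples, ordered oldest to newest.
    """
    breaks = []
    for i in range(1, len(log_backups)):
        previous_last = log_backups[i - 1][1]
        current_first = log_backups[i][0]
        if current_first != previous_last:
            breaks.append(i)
    return breaks

# Illustrative LSNs; real values are much larger integers.
history = [(100, 200), (200, 310), (350, 420)]
print(find_log_chain_breaks(history))  # [2]
```

A non-empty result means point-in-time recovery stops at the backup before the first break until a new full or differential backup re-anchors the chain.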
-
Test restore from backup into sandbox
Restore last week's full + log chain into the sandbox instance. A backup that has never been restored is not a backup. Validate by running a known query (open WO count by work center) against both prod and sandbox — counts should match prod as of the backup timestamp.
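The validation query result can be compared mechanically. A minimal sketch, assuming the open-WO counts per work center have been collected into two dictionaries, with the prod figures taken as of the backup timestamp; the work center names are made up.

```python
def count_mismatches(prod_counts, sandbox_counts):
    """Return {work_center: (prod, sandbox)} for every work center whose counts differ."""
    centers = set(prod_counts) | set(sandbox_counts)
    mismatches = {}
    for wc in sorted(centers):
        prod = prod_counts.get(wc, 0)
        sandbox = sandbox_counts.get(wc, 0)
        if prod != sandbox:
            mismatches[wc] = (prod, sandbox)
    return mismatches

# Hypothetical counts of open work orders by work center.
prod = {"WC-PRESS": 12, "WC-WELD": 7, "WC-PAINT": 3}
sandbox = {"WC-PRESS": 12, "WC-WELD": 5}
print(count_mismatches(prod, sandbox))  # {'WC-PAINT': (3, 0), 'WC-WELD': (7, 5)}
```

An empty result is the pass condition; anything else goes in the restore test record.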
-
Escalate failed restore to IT management
Open a P1 ticket with backup-system vendor support and notify the IT manager and CFO. A failed restore on the ERP/MES tier means the plant is one outage away from a multi-day data loss event — treat it like a downed line, not a routine ticket.
-
Audit master-data integrity in the item master
Run the standard item-master audit query: items with no UoM, BOMs with no routing, routings with no work center, items active in ERP but missing from MES. Most of these are launch-day leftovers; a few will be live problems blocking work-order release.
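If no standard query exists yet, the four checks can be prototyped on exported data before being turned into SQL. The sketch below assumes a simplified shape for the exports (item dictionaries, a set of items that have a BOM, a routing-to-work-center mapping, and a set of items present in MES); every field and variable name here is hypothetical.

```python
def audit_item_master(items, bom_items, routings, mes_items):
    """Run the four integrity checks and return findings keyed by check name.

    items: list of dicts with 'item', 'uom', 'active'
    bom_items: set of item numbers that have a BOM
    routings: dict of item number -> work center (None or '' if unassigned)
    mes_items: set of item numbers present in MES
    """
    return {
        "items_without_uom": sorted(i["item"] for i in items if not i.get("uom")),
        "boms_without_routing": sorted(b for b in bom_items if b not in routings),
        "routings_without_work_center": sorted(r for r, wc in routings.items() if not wc),
        "active_items_missing_from_mes": sorted(
            i["item"] for i in items if i.get("active") and i["item"] not in mes_items
        ),
    }

# Hypothetical sample export.
items = [
    {"item": "P-100", "uom": "EA", "active": True},
    {"item": "P-200", "uom": "", "active": True},
    {"item": "P-300", "uom": "EA", "active": False},
]
findings = audit_item_master(
    items,
    bom_items={"P-100", "P-200"},
    routings={"P-100": "WC-PRESS", "P-300": None},
    mes_items={"P-100"},
)
for check, hits in findings.items():
    print(check, hits)
```

Keeping each check as its own key makes it simple to separate launch-day leftovers from findings that block work-order release.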
-
Verify historian retention against compliance windows
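Confirm the historian (PI, Ignition, Wonderware) is retaining process data for the window your customer or regulator requires. Commonly quoted figures are 3 years for IATF 16949 traceability and 7 years for sites under FDA 21 CFR Part 11, but the binding period comes from the customer-specific requirement or the applicable predicate rule, so confirm it against the contract or quality manual. Watch for tags silently dropping when archive disks fill.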
-
Review query performance on production reports
Pull the slowest-query report from the ERP and MES databases. Production schedulers running the daily WIP report at 6 AM are the canary — if their report takes 4 minutes today and took 30 seconds a quarter ago, an index needs rebuilding or a query needs tuning.
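A quarter-over-quarter comparison makes the canary check repeatable. A small sketch, assuming report runtimes in seconds have been recorded for both periods; the 2x threshold is an arbitrary starting point and the report names are illustrative.

```python
def regressed_reports(baseline_s, current_s, factor=2.0):
    """Return reports whose current runtime exceeds `factor` times the baseline runtime."""
    return sorted(
        name
        for name, current in current_s.items()
        if name in baseline_s and current > factor * baseline_s[name]
    )

# Runtimes in seconds; the daily WIP figures mirror the 30-second vs 4-minute example above.
last_quarter = {"daily_wip": 30, "open_po": 45}
today = {"daily_wip": 240, "open_po": 50}
print(regressed_reports(last_quarter, today))  # ['daily_wip']
```

Reports with no baseline are skipped on purpose; capture their runtime this cycle so they can be compared next time.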
Hardware Maintenance
-
Inspect server room and shop-floor cabinets
Walk the server room and every shop-floor IT cabinet. Look for amber alert LEDs, dust accumulation on intakes, cables draped over hot exhausts, and any cabinet someone has propped open with a folder because it runs hot. Photograph anything that needs a follow-up work order.
-
Test UPS runtime on production servers
Run the UPS self-test on every unit feeding ERP, MES, historian, and domain controllers. Batteries lose capacity gradually: a unit rated for 20 minutes that delivers 6 may not carry the plant through the time it takes the generator to come up. A battery with a manufacture date more than 4 years old is a replacement candidate even if the test passes.
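The two replacement criteria can be applied consistently across units with a short helper. A sketch only: the required runtime is site-specific and is passed in as an assumption, and the function and parameter names are invented for illustration.

```python
from datetime import date

def ups_replacement_reasons(delivered_min, required_min, battery_mfg_date, today, max_age_years=4):
    """Return the reasons a UPS should be flagged; an empty list means no flag."""
    reasons = []
    if delivered_min < required_min:
        reasons.append("runtime below required minutes")
    age_years = (today - battery_mfg_date).days / 365.25
    if age_years > max_age_years:
        reasons.append(f"battery older than {max_age_years} years")
    return reasons

# Hypothetical unit: delivers 6 minutes against an assumed 10-minute requirement.
print(ups_replacement_reasons(
    delivered_min=6,
    required_min=10,
    battery_mfg_date=date(2019, 1, 15),
    today=date(2024, 6, 1),
))
```

A unit can be flagged on age alone, which matches the rule that an old battery is a replacement candidate even when the self-test passes.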
-
Schedule UPS battery replacement
Open a CMMS work order for the failing UPS, order the replacement battery pack from the approved vendor, and coordinate the swap with production scheduling — UPS battery replacement on a single-feed cabinet requires planned downtime of the systems behind it.
-
Verify cooling in MDF, IDF, and shop panels
Check intake temperatures at the server room MDF, every IDF closet, and shop-floor cabinets housing managed switches. Shop-floor cabinets in welding or coating areas run hot in summer — a cabinet that's fine in February may be at 110°F in August. Trend the readings against last quarter.
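Trending is simpler when readings are kept in a consistent structure from one cycle to the next. A minimal sketch, assuming intake temperatures in °F are stored per location; the 110°F alert limit reuses the figure mentioned above as an illustrative threshold, not a vendor specification, and the location names are made up.

```python
def temperature_trend(last_quarter_f, current_f, limit_f=110.0):
    """Return (location, current, change_vs_last_quarter, over_limit) per location.

    The change is None when the location has no reading from last quarter.
    """
    rows = []
    for location in sorted(current_f):
        current = current_f[location]
        previous = last_quarter_f.get(location)
        delta = None if previous is None else round(current - previous, 1)
        rows.append((location, current, delta, current >= limit_f))
    return rows

# Hypothetical readings in degrees Fahrenheit.
last_quarter = {"IDF-2": 78.0, "Weld cell panel": 92.5}
current = {"IDF-2": 80.5, "Weld cell panel": 111.0}
for row in temperature_trend(last_quarter, current):
    print(row)
# ('IDF-2', 80.5, 2.5, False)
# ('Weld cell panel', 111.0, 18.5, True)
```

A large seasonal delta on a cabinet that is still under the limit is worth a note, since it shows which cabinets to watch as summer approaches.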
-
Inspect ruggedized shop-floor terminals
Spot-check the operator kiosks, barcode scanners, and rugged tablets on the floor. Common findings: cracked touchscreens taped over, scanners with worn triggers, label printers with the wrong ribbon loaded. Sticky-note workarounds at a station mean a real fix is overdue.
-
Close out the maintenance cycle
Summarize this cycle's findings, attach the updated hardware warranty register, and route to the plant manager. Anything flagged 'Fail' or 'Pass with notes' should already have a CMMS or IT ticket open — verify before closing.