IT Systems Maintenance Checklist

Network Infrastructure

    Apply firmware updates to office-side switches, firewalls, and wireless controllers during the approved Saturday maintenance window. OT-side switches (Stratix, Hirschmann, Moxa) follow a separate change-control process — do not push IT vendor firmware to controls cabinets without controls engineering sign-off.

    Walk the Purdue Model boundary: Level 3 (MES, historian) should not have direct routes to Level 1 (PLCs) except through the documented DMZ. Any new firewall rule punched in the last 30 days for a vendor remote-support session should already be closed — verify in the firewall change log.
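
    The firewall change-log check above could be scripted roughly as follows. This is a minimal sketch: the rule records, the `purpose` label, and the field names are hypothetical and would need to be mapped from whatever your firewall's change-log export actually provides.

```python
from datetime import date, timedelta

def open_vendor_rules(rules, today, window_days=30):
    """Return names of vendor remote-support rules created within the
    window that are still enabled. Field names are assumed, not vendor-specific."""
    cutoff = today - timedelta(days=window_days)
    return [
        r["name"]
        for r in rules
        if r["purpose"] == "vendor-remote-support"
        and r["created"] >= cutoff
        and r["enabled"]
    ]

# Made-up sample data for illustration only.
rules = [
    {"name": "tmp-integrator-rdp", "purpose": "vendor-remote-support",
     "created": date(2024, 5, 20), "enabled": True},
    {"name": "tmp-oem-vpn", "purpose": "vendor-remote-support",
     "created": date(2024, 5, 25), "enabled": False},
    {"name": "historian-to-dmz", "purpose": "permanent",
     "created": date(2024, 5, 22), "enabled": True},
]
still_open = open_vendor_rules(rules, today=date(2024, 6, 1))
print(still_open)
```

    Anything the function returns is a rule that should have been closed after the support session ended.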

    Pull the last 30 days from the OT monitoring tool (Claroty, Nozomi, or Dragos depending on site). Flag any new MAC addresses on the controls VLAN, any outbound connections from PLCs, and any SMBv1 or unencrypted FTP traffic. Most of these are integrators leaving test rigs plugged in.
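
    As an illustration of the new-MAC check, the sketch below compares an approved baseline against the addresses seen this period. It assumes both lists have been exported from the monitoring tool as plain strings; the function name and the sample addresses are hypothetical.

```python
def new_mac_addresses(baseline, observed):
    """Return MACs seen this period that are not in the approved baseline.
    Addresses are normalized so 'AA-BB' and 'aa:bb' styles compare equal."""
    def normalize(mac):
        return mac.strip().lower().replace("-", ":")

    known = {normalize(m) for m in baseline}
    return sorted({normalize(m) for m in observed} - known)

# Made-up sample data for illustration only.
baseline = ["00:1D:9C:AA:01:02", "00:80:63:10:20:30"]
observed = ["00-1d-9c-aa-01-02", "00:80:63:10:20:30", "F4:5C:89:12:34:56"]
unknown_macs = new_mac_addresses(baseline, observed)
print(unknown_macs)
```

    Normalizing before comparing avoids false positives when two exports format the same address differently.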

    Export running-config from each switch, firewall, and wireless controller via the automation tool (RANCID, Oxidized, or vendor equivalent). Attach the dated archive — this is the artifact production needs if a switch dies during second shift.

    Coordinate with production scheduling — failover testing during a live shift will pause label printing and scan transactions. Drop the primary uplink for the ERP and MES VLANs and confirm the secondary path comes up within the documented RTO. Restore primary, log results.

Software and Applications

    NetSuite, Dynamics 365 BC, Epicor Kinetic, and Plex roll updates on different cadences — check the vendor release notes for any work-order, label, or tax-engine changes. Apply to TEST first, run the regression scenarios (WO release, kit pull, label print, ASN), and only then schedule production cutover.

    Pull failed transactions from the MES (Plex, Tulip, MachineMetrics, FactoryLogix). The common ones: scan-to-WO mismatches, label-print timeouts, and historian writes failing on tag rename. Group by root cause — recurring failures usually trace back to a stale ECN or an unmapped tag, not the MES itself.

    Push a test work order through the integration: ERP release → MES dispatch → operator scan → completion → ERP backflush. The most common silent failure is backflush working but cost variance posting to the wrong account when an item master change wasn't synced.

    Export role assignments from ERP, MES, QMS, and the historian. Cross-check against the HR active-employee list — terminated operators with active MES logins are a recurring audit finding. Pay attention to shared-floor accounts: if the kiosk uses one badge for the whole shift, that's a control gap to flag.
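
    One way to sketch the cross-check, assuming the role export and the HR list can both be reduced to simple records. The field names (`system`, `login`, `employee_id`) are hypothetical, and a login with no employee attached is treated here as a shared account.

```python
def review_access(assignments, active_ids):
    """Split role assignments into logins tied to non-active employees
    and logins tied to no individual employee (shared accounts)."""
    active = set(active_ids)
    terminated = [
        a for a in assignments
        if a["employee_id"] is not None and a["employee_id"] not in active
    ]
    shared = [a for a in assignments if a["employee_id"] is None]
    return terminated, shared

# Made-up sample data for illustration only.
assignments = [
    {"system": "MES", "login": "jdoe", "employee_id": "E100"},
    {"system": "MES", "login": "asmith", "employee_id": "E205"},
    {"system": "MES", "login": "kiosk-line3", "employee_id": None},
]
terminated, shared = review_access(assignments, active_ids=["E100"])
print([a["login"] for a in terminated])
print([a["login"] for a in shared])
```

    The first list feeds the deprovisioning ticket; the second is the set of control gaps to flag.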

    Review CISA ICS advisories and vendor PSIRT bulletins (Rockwell, Siemens, Schneider, Mitsubishi) for the firmware versions in your asset inventory. Do not push firmware updates from this checklist — log affected assets and route through controls engineering for the next planned downtime window.

Data Management

    Confirm last night's full backup completed and the transaction-log chain is unbroken. For SQL Server-backed ERP/MES, a broken log chain means point-in-time recovery is gone — catch it here, not when you need to restore.
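
    For SQL Server, an unbroken chain means each log backup starts at the LSN where the previous one ended. The sketch below checks that condition on a list of log-backup records. The keys `first_lsn` and `last_lsn` mirror the columns in `msdb.dbo.backupset`; the sample values are made up, and the check that the first log backup ties back to the full backup is left out for brevity.

```python
def find_log_chain_breaks(log_backups):
    """Return (previous_last_lsn, next_first_lsn) pairs where consecutive
    log backups do not line up. Input should contain log backups only."""
    ordered = sorted(log_backups, key=lambda b: b["first_lsn"])
    breaks = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["first_lsn"] != prev["last_lsn"]:
            breaks.append((prev["last_lsn"], cur["first_lsn"]))
    return breaks

# Made-up LSN values for illustration only.
log_backups = [
    {"first_lsn": 100, "last_lsn": 200},
    {"first_lsn": 200, "last_lsn": 300},
    {"first_lsn": 350, "last_lsn": 400},  # gap: nothing covers 300 to 350
]
chain_breaks = find_log_chain_breaks(log_backups)
print(chain_breaks)
```

    An empty result means the chain is contiguous across the records supplied.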

    Restore last week's full + log chain into the sandbox instance. A backup that has never been restored is not a backup. Validate by running a known query (open WO count by work center) against both prod and sandbox — counts should match prod as of the backup timestamp.

    If the sandbox restore fails, open a P1 ticket with backup-system vendor support and notify the IT manager and CFO. A failed restore on the ERP/MES tier means the plant is one outage away from a multi-day data loss event, so treat it like a downed line, not a routine ticket.

    Run the standard item-master audit query: items with no UoM, BOMs with no routing, routings with no work center, items active in ERP but missing from MES. Most of these are launch-day leftovers; a few will be live problems blocking work-order release.
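
    The four audit conditions can be expressed as SQL. The sketch below runs them against an in-memory SQLite database so it is self-contained; the table and column names (`items`, `boms`, `routings`, `mes_items`) are hypothetical and will differ in any real ERP or MES schema.

```python
import sqlite3

# Hypothetical schema; real ERP/MES table names will differ.
AUDIT_QUERIES = {
    "no_uom": "SELECT item_id FROM items "
              "WHERE uom IS NULL OR uom = '' ORDER BY item_id",
    "bom_without_routing": "SELECT b.item_id FROM boms b "
              "LEFT JOIN routings r ON r.item_id = b.item_id "
              "WHERE r.item_id IS NULL ORDER BY b.item_id",
    "routing_without_work_center": "SELECT item_id FROM routings "
              "WHERE work_center IS NULL ORDER BY item_id",
    "active_missing_from_mes": "SELECT i.item_id FROM items i "
              "LEFT JOIN mes_items m ON m.item_id = i.item_id "
              "WHERE i.active = 1 AND m.item_id IS NULL ORDER BY i.item_id",
}

def run_item_master_audit(conn):
    """Run every audit query and return {finding_name: [item_id, ...]}."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in AUDIT_QUERIES.items()}

# Made-up sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id TEXT PRIMARY KEY, uom TEXT, active INTEGER);
CREATE TABLE boms (item_id TEXT);
CREATE TABLE routings (item_id TEXT, work_center TEXT);
CREATE TABLE mes_items (item_id TEXT);
INSERT INTO items VALUES ('A-100','EA',1), ('B-200',NULL,1), ('C-300','EA',1);
INSERT INTO boms VALUES ('A-100'), ('C-300');
INSERT INTO routings VALUES ('A-100','WC-10'), ('B-200',NULL);
INSERT INTO mes_items VALUES ('A-100'), ('B-200');
""")
findings = run_item_master_audit(conn)
print(findings)
```

    Keeping the queries in one dictionary makes it easy to add a site-specific check without touching the runner.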

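    Confirm the historian (PI, Ignition, Wonderware) is retaining process data for the window your customer or regulator requires. Commonly cited figures are 3 years for IATF 16949 traceability and 7 years for FDA-regulated (21 CFR Part 11) sites, but the binding period comes from the customer contract or the applicable regulation, so confirm it for your site. Watch for tags silently dropping when archive disks fill.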

    Pull the slowest-query report from the ERP and MES databases. Production schedulers running the daily WIP report at 6 AM are the canary — if their report takes 4 minutes today and took 30 seconds a quarter ago, an index needs rebuilding or a query needs tuning.
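
    A quarter-over-quarter comparison like the one described could be sketched as below. It assumes report durations in seconds have been saved per quarter under stable names; the 3x slowdown threshold and the report names are assumptions, not a recommended standard.

```python
def regressed_queries(current, baseline, factor=3.0):
    """Return (name, baseline_seconds, current_seconds) for queries that
    slowed by at least `factor`, worst first. New queries are skipped."""
    flagged = []
    for name, seconds in current.items():
        previous = baseline.get(name)
        if previous and seconds / previous >= factor:
            flagged.append((name, previous, seconds))
    return sorted(flagged, key=lambda row: row[2] / row[1], reverse=True)

# Made-up durations in seconds for illustration only.
baseline = {"daily_wip_report": 30, "open_wo_by_work_center": 12}
current = {"daily_wip_report": 240, "open_wo_by_work_center": 14,
           "new_scrap_report": 50}
slow = regressed_queries(current, baseline)
print(slow)
```

    In the sample data the 6 AM WIP report went from 30 seconds to 4 minutes, which is the kind of jump that points to an index or query-plan problem.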

Hardware Maintenance

    Walk the server room and every shop-floor IT cabinet. Look for amber alert LEDs, dust accumulation on intakes, cables draped over hot exhausts, and any cabinet someone has propped open with a folder because it runs hot. Photograph anything that needs a follow-up work order.

    Run the UPS self-test on every unit feeding ERP, MES, historian, and domain controllers. Batteries lose capacity gradually: a unit rated for 20 minutes that delivers 6 may not carry the plant through the time it takes the generator to come up. A battery with a manufacture date more than 4 years old is a replacement candidate even if the test passes.
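
    The pass/fail logic above can be written down as a small rule. In this sketch, `required_min` stands for the runtime needed to bridge to the generator, which is a site-specific assumption; the 4-year age limit comes from the checklist step, and the status labels are hypothetical.

```python
from datetime import date

def ups_status(measured_min, required_min, manufactured, today, max_age_years=4):
    """Classify a UPS from its self-test runtime and battery age.
    'fail' if runtime is short, 'replace-candidate' if only age is the issue."""
    age_years = (today - manufactured).days / 365.25
    if measured_min < required_min:
        return "fail"
    if age_years > max_age_years:
        return "replace-candidate"
    return "pass"

# Made-up sample readings for illustration only.
today = date(2024, 6, 1)
short_runtime = ups_status(6, 10, date(2022, 1, 1), today)
old_battery = ups_status(18, 10, date(2019, 3, 1), today)
healthy = ups_status(18, 10, date(2022, 1, 1), today)
print(short_runtime, old_battery, healthy)
```

    Runtime is checked first so that a unit that cannot carry the load is never reported as merely old.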

    If a unit fails its self-test, open a CMMS work order for it, order the replacement battery pack from the approved vendor, and coordinate the swap with production scheduling. UPS battery replacement on a single-feed cabinet requires planned downtime of the systems behind it.

    Check intake temperatures at the server room MDF, every IDF closet, and shop-floor cabinets housing managed switches. Shop-floor cabinets in welding or coating areas run hot in summer — a cabinet that's fine in February may be at 110°F in August. Trend the readings against last quarter.
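
    Trending against last quarter can be as simple as the comparison below. The 104°F limit and the 10°F rise threshold are placeholder assumptions to be replaced with the equipment's rated intake limits, and the cabinet names are made up.

```python
def temperature_findings(current_f, last_quarter_f, limit_f=104.0, rise_f=10.0):
    """Return {location: [reasons]} for cabinets that are over the limit
    or have risen by at least `rise_f` degrees F since last quarter."""
    findings = {}
    for location, temp in current_f.items():
        reasons = []
        if temp >= limit_f:
            reasons.append("over limit")
        previous = last_quarter_f.get(location)
        if previous is not None and temp - previous >= rise_f:
            reasons.append("rising")
        if reasons:
            findings[location] = reasons
    return findings

# Made-up intake readings in degrees F for illustration only.
last_quarter = {"MDF": 71, "IDF-2": 68, "Weld-Cab-4": 92}
current = {"MDF": 72, "IDF-2": 80, "Weld-Cab-4": 110}
hot_spots = temperature_findings(current, last_quarter)
print(hot_spots)
```

    Flagging the rise separately from the absolute limit catches a cabinet that is still within range but heading the wrong way.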

    Spot-check the operator kiosks, barcode scanners, and rugged tablets on the floor. Common findings: cracked touchscreens taped over, scanners with worn triggers, label printers with the wrong ribbon loaded. Sticky-note workarounds at a station mean a real fix is overdue.

    Summarize this cycle's findings, attach the updated hardware warranty register, and route to the plant manager. Anything flagged 'Fail' or 'Pass with notes' should already have a CMMS or IT ticket open — verify before closing.
